Querying web pages with Python

Question

I am learning web programming with Python, and one of the exercises I am working on is the following: I am writing a Python program to query the website "orbitz.com" and return the lowest airfare. The departure and arrival cities and dates are used to construct the URL.

I am doing this with the urlopen function, as follows (search_str contains the URL):

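from lxml.html import parse
from urllib2 import urlopen

# Fetch the page and parse it into an lxml document tree
parsed = parse(urlopen(search_str))
doc = parsed.getroot()

# Collect every <a> element and read its visible text
links = doc.findall('.//a')
for link in links:
    the_link = link.text_content().strip()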

The idea is to retrieve all the links from the query results, search them for strings such as "Delta", "United", etc., and read off the dollar amount next to each link.
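
The link-scanning idea described above can be sketched without any network access, assuming the link texts have already been collected into a list of strings. The function name lowest_fare, the sample strings, and the assumption that the price sits in the same text as the airline name (in a form like $312 or $1,204.50) are illustrative only and are not taken from the real Orbitz page.

```python
import re

# Assumed price format: "$312", "$1,204" or "$89.50".
PRICE_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{1,2})?)")


def lowest_fare(texts, airlines):
    """Return (price, text) for the cheapest entry naming a known airline, or None."""
    best = None
    for text in texts:
        # Skip links that do not mention any airline of interest.
        if not any(name.lower() in text.lower() for name in airlines):
            continue
        for amount in PRICE_RE.findall(text):
            price = float(amount.replace(",", ""))
            if best is None or price < best[0]:
                best = (price, text)
    return best


# Hypothetical link texts, standing in for link.text_content().strip()
sample = [
    "Delta $312 nonstop",
    "United $298.50 1 stop",
    "Sign in",
    "Delta $1,204 first class",
]
print(lowest_fare(sample, ["Delta", "United"]))
```

If the price is rendered in a neighbouring element rather than inside the link itself, the same regular expression could be applied to the text of the link's parent element instead.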

This worked until today. It looks like orbitz.com has changed its output page: when you enter the travel details on the website, a page now appears showing a wheel and a message like "looking up itineraries". This is just a filler page and contains no real information. After a few seconds, the real results page is displayed. Unfortunately, the Python code returns the links for the filler page each time, so I never obtain the real results.

How can I get around this? I am a relative beginner to web programming, so any help is greatly appreciated.

score 0 · Answer 1 · answered Oct 04 '13 at 01:48

This kind of thing is normal in the world of crawlers.

What you need to do is figure out which URL the site redirects to after the "looking up itineraries" page, and request that URL directly from your script.

Then check whether the final search results page has changed as well; if so, modify your script to accommodate those changes.
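
One way a filler page can hand off to the real results is an HTML meta refresh tag. Whether Orbitz does this is an assumption; the hand-off may instead happen in JavaScript, in which case the request to look for is the one shown in the browser's network inspector. A minimal sketch for the meta refresh case, using only the Python 3 standard library (the urllib2 module from the question became urllib.request in Python 3). The names find_meta_refresh and _RefreshFinder and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _RefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=/results?id=123"
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html, base_url):
    """Return the absolute URL a meta refresh points to, or None if absent."""
    finder = _RefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urljoin(base_url, finder.target)


# Hypothetical filler page, not the actual Orbitz markup
filler = (
    '<html><head>'
    '<meta http-equiv="refresh" content="5; url=/results?id=123">'
    '</head></html>'
)
print(find_meta_refresh(filler, "http://example.com/search"))
```

If find_meta_refresh returns None for the filler page, the redirect is probably script-driven, and the results URL has to be found by watching the requests the browser makes while the wheel is displayed.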

Srikar Appalaraju


Thanks. However, I'm still stuck. It looks like the URL of the filler is exactly the same as the URL of the results page. Or perhaps the true URL is not being displayed. Could you tell me how to obtain the URL if it doesn't show up in the browser? – Aravind Oct 04 '13 at 02:27
Unless you tell me what the website is and what the URLs are, I am not in a position to help... – Srikar Appalaraju Oct 04 '13 at 07:23
