I'd like to collect plane ticket prices from a certain website, for many dates and destinations. I can specify source, destination and dates in the URL, but the website fetches the data using AJAX, so the prices aren't available in the page's initial response. If they were, I could use any programming language to get the data.

I figured this task would be best accomplished by having a web browser load each URL in turn and render the page. I would then look up the desired element (using CSS selectors or JS, I guess), save it to a file or log, and move on to the next URL. Later I could review the data and find the best prices.

Unfortunately, I couldn't find any browser extension/add-on for this task (any Linux browser is fine; Firefox and Chrome are the most likely candidates). I'm already familiar with GreaseMonkey, and although this isn't the kind of task it was designed for, I imagine the right tool would be similar or operate in a similar manner.

Does anybody know of a tool that I can use for this task? Other approaches are welcome too!

1 Answer

I would use cURL: check the source of the pages to see the POST/GET data passed to them, then build your own GUI to display the data. You could run it off your own web server with PHP cURL very easily and quickly.
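
As a rough sketch of this approach, here is the same idea using Python's standard library rather than PHP cURL. The endpoint URL, the parameter names (`origin`, `destination`, `date`) and the shape of the JSON are all hypothetical; the real ones would have to be read off the AJAX request as shown by a tool such as Firebug.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the URL of the AJAX call the page makes.
ENDPOINT = "https://example.com/api/prices"


def build_url(origin, destination, date, endpoint=ENDPOINT):
    """Build the GET URL for one search; the parameter names are assumptions."""
    query = urllib.parse.urlencode(
        {"origin": origin, "destination": destination, "date": date}
    )
    return f"{endpoint}?{query}"


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def fetch_prices(origin, destination, date, get=http_get):
    """Request one search and decode the JSON body into Python objects."""
    return json.loads(get(build_url(origin, destination, date)))
```

Calling `fetch_prices("GRU", "LIS", "2012-12-01")` would return whatever structure the server sends back. The `get` parameter is only there so the function can be tried with a stub instead of a live request.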

jett
  • I looked at the source, and it does not contain the data I need. Using Firebug, I verified that the data arrives as a JSON response to an AJAX call. But I won't be able to perform such calls myself, as I will be blocked by the same-origin policy. That's why I thought an add-on would be the way to go. – user1775560 Oct 25 '12 at 23:19
  • @user1775560 JSON can be pulled with cURL as well, or with a language-specific JSON client. Even in JavaScript, the same-origin policy can be sidestepped for JSON requests (e.g. via JSONP, where the server supports it). The rest you would have to extract in various ways depending on how it is structured. – jett Oct 25 '12 at 23:28
  • I'm sorry, but I still can't see how cURL will be able to make the request. As I see it, same origin is a deal-breaker. As a side note, the approach I'm trying now is pulling the data using GreaseMonkey and sending it to a local Tomcat server, which saves the incoming data and replies with the next URL for GreaseMonkey to load. Terrible, and I'm still not sure it will work, but if it does, and soon, then so be it =P – user1775560 Oct 25 '12 at 23:55
  • The same-origin policy is only enforced by browsers running with their default security settings; a client such as cURL is not bound by it. Good luck with your approach. If it does not work, I encourage you to look into my suggestions. Cheers. – jett Oct 26 '12 at 00:10
  • Now that you mention it, I feel stupid. I never realized same origin was a browser policy, not a server one. Plus, the server seems to have no issues with remote calls (it doesn't even check cookie data); I could call it with cURL without any issues. Thanks, I'd upvote you now if I could. – user1775560 Oct 26 '12 at 00:55
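
Once the endpoint can be called directly, the collection loop described in the question might look like the sketch below. `fetch_price` is a hypothetical, caller-supplied function that makes the actual request and returns a single number (or `None` when there is no fare); the CSV column names are assumptions too.

```python
import csv
import itertools


def collect_prices(origins, destinations, dates, fetch_price, out_path):
    """Query every origin/destination/date combination and log it to a CSV file.

    fetch_price is a caller-supplied (hypothetical) function that returns the
    price for one combination, or None when no fare is available.
    """
    rows = []
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["origin", "destination", "date", "price"])
        for origin, destination, date in itertools.product(
            origins, destinations, dates
        ):
            if origin == destination:
                continue  # skip routes that go nowhere
            price = fetch_price(origin, destination, date)
            if price is None:
                continue  # skip combinations with no fare
            rows.append((origin, destination, date, price))
            writer.writerow(rows[-1])
    return rows


def cheapest(rows):
    """Return the row with the lowest price, or None if nothing was collected."""
    return min(rows, key=lambda row: row[3], default=None)
```

The CSV file can be reviewed later, while `cheapest(rows)` gives the best price found in the current run.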