
I am looking for a module that will capture all of the data that is displayed in a browser (like Firefox). It would need to capture all CSS/JS/AJAX data. I have tried LWP::UserAgent, which is somehow not capturing all of the data.

If you want to take a look, the web page I am working with is:

http://finance.yahoo.com/q?s=SAPE&ql=1

There is a horizontal bar under the menu bar (Home, Investing, News, Personal Finance, etc.) that contains date and time information, for example:

Wed, Feb 6, 2013, 8:10pm EST - US Markets are closed

This can be seen in any browser. However, when Perl fetches the web page, the date, the time, and whether the markets are open or closed are missing from the captured data.

Do I need to use Wireshark to sniff out what I need, or is there a module that will duplicate a browser and capture this data, or is there a better way?

I thought LWP::UserAgent captured all of the data, but evidently I am wrong.

Thanks.

DocMax
John

1 Answer


If you use "view source" on the page, that is essentially what LWP::UserAgent sees: the HTML as the server sent it, before any scripts have run. To get a page that contains the dynamically loaded AJAX data, menus built by JavaScript, etc., you need to load the page into a web browser, or into Node.js, PhantomJS, or a similar tool that actually runs the scripts and builds the page as you see it. Then use its DOM to look for the relevant data (e.g. with jQuery).
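To make the "view source" point concrete, here is a minimal offline sketch. It is written in Python rather than Perl purely for illustration, and the markup, the `market-status` id, and the helper names `TextById` and `static_text` are all hypothetical; this is not Yahoo's actual page structure. It shows that a client which only parses the served HTML, as LWP::UserAgent does, finds the status bar empty, because the text is only put there by a script.

```python
from html.parser import HTMLParser

# Hypothetical markup: the status bar is empty in the served HTML and is
# only filled in by a script once the page runs in a browser.
PAGE = """<html><body>
<div id="market-status"></div>
<script>
document.getElementById("market-status").textContent =
    "Wed, Feb 6, 2013, 8:10pm EST - US Markets are closed";
</script>
</body></html>"""


class TextById(HTMLParser):
    """Collect the text inside the element with a given id.

    Sketch only: assumes the target element contains no void tags
    such as <br>, which have no matching end tag.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def static_text(html, element_id):
    """Return the element's text as a non-scripting client would see it."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.text).strip()


if __name__ == "__main__":
    # The script is never executed, so the element's text is empty.
    print(repr(static_text(PAGE, "market-status")))
```

A browser or headless browser would execute the script first, so querying the same element through its DOM would return the filled-in text instead of an empty string.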

FtLie