Using Python mechanize on websites that use DHTML, AJAX, etc.?

Question

So, let's say I'm trying to create something that replies to tweets containing a certain hashtag on Twitter (for example "#FirstWorldProblems"). I have a script that looks like this:

# apply settings, create a mechanize.Browser, etc.

login() # log into twitter

# at this point we've logged into twitter; now we navigate to their search page and run a search query:
br.open('http://twitter.com/search?q=' + hashtag)
print(br.response().read()) # print the response

So, what I have above is sort of an abbreviated version to quickly get to the spot giving me trouble.

I set up a browser, log into twitter, all done no problemo. But, then I run a search for the hashtag (using br.open) and then I print the response.

On Twitter, the "Reply" link only appears when you hover over a specific link, and it leads to "#" (because it opens a little pop-up thing where you can enter your reply). Since that link doesn't show up in the response, how would I click on it?

You probably want to use the Twitter API: https://dev.twitter.com/docs/api – dmedvinsky Oct 26 '11 at 06:03

score 2 · Accepted Answer · answered Oct 26 '11 at 11:38


If your problem is actually just accessing Twitter, dmedvinsky is probably right.

However, if you really want to be able to scrape websites while allowing their JavaScript to run as it normally would, you'll probably want something a bit more robust.

While it's a lot of baggage, I strongly urge you to grab Qt and PySide and get familiar with QtWebKit. You can drive a 'real' web browser from Python and get all the benefits (and problems ;) one would expect. So far it's the best and cleanest method I've found to do what you're asking about.
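To illustrate why the "Reply" link never shows up in what mechanize prints: mechanize only receives the HTML the server sends and does not execute the page's scripts. Here is a minimal sketch using only the standard library's `html.parser`. The page markup, the `reply` class name, and the `LinkCollector` helper are all made up for illustration and are not Twitter's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical page: the "Reply" link is not in the markup itself;
# a script would create it once the page runs in a real browser.
STATIC_HTML = """
<html><body>
  <div class="tweet">
    <a href="/someuser/status/1">permalink</a>
    <span class="actions"></span>
  </div>
  <script>
    var a = document.createElement('a');
    a.href = '#';
    a.className = 'reply';
    a.textContent = 'Reply';
    document.querySelector('.actions').appendChild(a);
  </script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


collector = LinkCollector()
collector.feed(STATIC_HTML)

# Only the link that exists in the static markup is found; the one the
# script would add is invisible to anything that does not run JavaScript.
print(collector.hrefs)
```

A tool that only parses the response body sees just the permalink here. A browser engine that actually runs the script would also expose the "#" link, which is the reason for driving a real browser.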


synthesizerpatel


You could also check out [Selenium](http://seleniumhq.org/): it drives a web browser directly, so if you can do something on a site with your normal browser (which I hope you can :), you can automate it to do it with Selenium. – jro Oct 26 '11 at 11:43

For what it's worth, Selenium requires a bit more effort, is more 'test' oriented, and can't be run headless. It's also written in Java with a not-terribly-powerful API. For instance, you can't get HTTP headers for a transferred page, or the HTTP result code. You can do a lot of fancy things driving WebKit with Python. And I certainly don't mean to poo-poo Selenium, it's very useful! – synthesizerpatel Oct 26 '11 at 11:49

Thanks for the response! It made me look into QtWebKit in more detail... seems like a very nice alternative to Selenium indeed! – jro Oct 26 '11 at 12:11
