22

I’m writing a little private app to automatically log into my internet banking every day, and download the latest transactions. I’m writing it as a Django app, so I’m working in Python.

My internet banking doesn’t seem to work without JavaScript — I think it uses JavaScript to assign a session ID of some sort. Fetching the sign-in page via httplib gives me a page telling me JavaScript’s required.

So, I’m now looking for libraries that fetch web pages, and execute the JavaScript on them. Pretty much headless browsers.

I’m fiddling about with Selenium at the moment. I think it’ll do the job, although it is designed for testing web apps, so I was wondering if there is anything with similar capabilities designed for more general purposes than testing.

Any Python alternatives to Selenium for this sort of thing?

Paul D. Waite

6 Answers

10

Since you're using Selenium, I assume you already have Firefox installed. If so, get an extension like Firebug or Tamper Data and watch which HTTP requests the JavaScript code makes while you log in.

Once you have the URLs and the parameters needed, you can easily write a Python client with httplib or urllib2.

In Firebug, you'll find the requested URLs under the "Net" tab. Tamper Data is self-explanatory. ;-)
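As a rough illustration of replaying such a request from Python, here is a minimal sketch. It uses `urllib.request` and `http.cookiejar`, the Python 3 successors to urllib2 and cookielib. The URL and the form field names (`user`, `session_id`) are made up for the example; the real ones are whatever the Net panel shows for your bank.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that keeps cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Build (but do not send) a form-encoded POST like the one the browser made."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URL and field names: copy the real ones from the Net panel.
LOGIN_URL = "https://bank.example/login"
request = build_login_request(LOGIN_URL, {"user": "alice", "session_id": "abc123"})
opener, jar = make_opener()
# response = opener.open(request)  # uncomment to actually send the request
```

Sending every later request through the same `opener` means any session cookie set at login is passed along automatically, which is usually what the follow-up pages expect.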

  • 2
    That is indeed what I ended up doing — turns out in this case, the JavaScript wasn’t doing anything complicated. That said, after a weekend of screen-scraping, I *really* hope they don’t upgrade the website front end, at least without also providing an API. – Paul D. Waite Feb 10 '10 at 19:06
6

You can use Pywebkitgtk. There is a nice tutorial here.

Alternatively, you can use Beautiful Soup to get the page contents and something like python-spidermonkey to run the scripts.
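To give an idea of the first half of that approach (pulling the scripts out of a page), here is a small sketch. Note that it uses the standard library's `html.parser` rather than Beautiful Soup, purely so that it runs without extra installs, and it does not execute anything: the collected source would still have to be handed to a JavaScript engine. The names `ScriptCollector` and `collect_scripts` are invented for this example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect inline <script> bodies and the src of external scripts."""

    def __init__(self):
        super().__init__()
        self.inline = []      # source text of inline scripts
        self.external = []    # src attributes of external scripts
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            code = "".join(self._buffer).strip()
            if code:
                self.inline.append(code)
            self._in_script = False


def collect_scripts(html):
    """Return (inline_sources, external_srcs) found in an HTML string."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.inline, parser.external


sample = (
    '<html><script src="/static/app.js"></script>'
    '<script>var sid = "abc" + 1;</script><p>hi</p></html>'
)
inline, external = collect_scripts(sample)
```

With Beautiful Soup installed, the same collection step would be a short loop over the page's `script` tags instead of a custom parser class.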

Paul D. Waite
jbochi
4

I think a good match for your problem is Twill: a simple scripting language for Web browsing.

Another one to check out is Windmill (a kind of Selenium, but written entirely in Python).

Etienne
4

You can also use Spynner, which allows programmatic web browsing.

webaholik
Steven Matthews
1

Looks like QtWebKit is another option.

Paul D. Waite
-2

Since BeautifulSoup is no longer being actively developed, I would recommend lxml, since it does all the things that BeautifulSoup can do and a lot more.

AutomatedTester