I'm currently using urllib2 and BeautifulSoup to open and parse HTML data. However, I've run into a problem with a site that uses JavaScript to load its images after the page has been rendered (I'm trying to find the source URL of a particular image on the page).

I'm thinking Twill could be a solution, and I'm trying to open the page and use a regular expression with 'find' to return the HTML string I'm looking for. I'm having trouble getting this to work, though, and can't find any documentation or examples on how to use regular expressions with Twill.

Any help or advice on how to do this or solve this problem in general would be much appreciated.

rolling stone

2 Answers

I'd rather use CSS selectors or "real" regexps on the page source. As far as I know, Twill is no longer being actively developed. Have you tried BeautifulSoup or PyQuery with CSS selectors?
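
As a rough illustration of the "regexps on page source" idea: when a script sets the image URL, that URL often appears as a quoted string somewhere in the raw HTML, so it can sometimes be found without running any JavaScript. The sketch below uses only the standard `re` module. The sample markup, the example.com URL, and the `find_image_urls` helper are all made up for illustration; on a real site the page source would come from whatever you already fetch with urllib2, and the pattern would likely need adjusting.

```python
import re

# Hypothetical page source: the <img> tag has no src yet; a script assigns it later.
SAMPLE_HTML = """
<html><body>
<img id="hero">
<script>
  document.getElementById("hero").src = "http://example.com/images/hero.jpg";
</script>
</body></html>
"""

# Match quoted absolute URLs that end in a common image extension.
IMAGE_URL_RE = re.compile(
    r"""["'](https?://[^"']+?\.(?:jpe?g|png|gif))["']""",
    re.IGNORECASE,
)


def find_image_urls(page_source):
    """Return every quoted image URL found in the raw page source."""
    return IMAGE_URL_RE.findall(page_source)


print(find_image_urls(SAMPLE_HTML))
```

This only helps when the URL is literally present in the source. If the script builds the URL at runtime or fetches it from another request, a regex on the initial HTML will not find it.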

starenka

Twill does not work with JavaScript (see http://twill.idyll.org/browsing.html).

Use WebDriver if you want to handle JavaScript.
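
A minimal sketch of what that might look like, assuming "webdriver" here means Selenium WebDriver with its Python bindings, and that Firefox plus its driver are installed. The `get_rendered_image_src` helper, its parameters, and the selector in the usage note are hypothetical names chosen for illustration, not anything from the site in question.

```python
def get_rendered_image_src(url, css_selector, timeout=10):
    """Load `url` in a real browser and return the src of the first matching element."""
    # Imported lazily so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        # Wait (up to `timeout` seconds) until an element matching the
        # selector is present in the DOM, then read its src attribute.
        image = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return image.get_attribute("src")
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

It could be called as `get_rendered_image_src("http://example.com/gallery", "img#hero[src]")`. Including `[src]` in the selector makes the wait hold until the script has actually set the attribute, rather than returning as soon as an empty `<img>` tag exists.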

amadain