
I am using Python and urllib2 with good success. However, I have run into a website that sets DIV properties via CSS. That means the properties are set via JS and/or jQuery (I think!). In my urllib2 results the DIV is "empty", although it looks right to a user in the browser, because the JS sets things like the size and background color. I tracked down the attributes, and they are located in a CSS file (which I assume is dynamically generated server-side).
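If the declarations really live in a separate stylesheet, urllib2 will never merge them into the DIV, because it only downloads the HTML document itself. One way forward is to find the stylesheet URLs in the HTML you already have and request them yourself. Below is a minimal sketch using Python 3's standard-library `html.parser` and `urllib.parse` (in Python 2 these modules are `HTMLParser` and `urlparse`). `find_stylesheet_urls` is a hypothetical helper name, and it only sees stylesheets that appear as `<link>` tags in the raw HTML, not ones injected later by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collect the href of every <link rel="stylesheet"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "alternate stylesheet"
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "stylesheet" in rel_tokens and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_stylesheet_urls(html, base_url):
    """Return absolute URLs of the stylesheets linked from `html`."""
    parser = StylesheetLinkParser()
    parser.feed(html)
    parser.close()
    # Relative hrefs are resolved against the page URL, as a browser would.
    return [urljoin(base_url, href) for href in parser.hrefs]
```

Each returned URL can then be fetched with the same `urlopen` call (or opener) you already use for the page.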

How can I use urllib2 to make sure that I pick up the full attribute set for the particular DIV?
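Once you have the stylesheet text, you can pull out the rules for `special_div` yourself. The sketch below is a rough regex-based reader, not a real CSS parser. It assumes the relevant rules are written as plain `.special_div { ... }` or `div.special_div { ... }` blocks; specificity, descendant selectors and `@media` conditions are ignored, and values containing `;` (such as some `url(...)` data URIs) will be split incorrectly. `declarations_for_class` is a hypothetical helper name.

```python
import re

# Strip /* ... */ comments, then match flat "selectors { body }" blocks.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.S)
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def declarations_for_class(css_text, class_name):
    """Merge declarations from rules whose selector list contains
    `.class_name` or `div.class_name` as a whole selector."""
    css_text = COMMENT_RE.sub("", css_text)
    wanted = {"." + class_name, "div." + class_name}
    result = {}
    for selectors, body in RULE_RE.findall(css_text):
        selector_list = {s.strip() for s in selectors.split(",")}
        if not wanted & selector_list:
            continue
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                # Later rules overwrite earlier ones (source order only).
                result[prop.strip().lower()] = value.strip()
    return result
```

Because later rules simply overwrite earlier ones, this only approximates the browser's cascade; for anything beyond simple class rules a dedicated CSS parser would be the safer choice.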

My code is nothing special: I fetch the page and then pass it to BeautifulSoup:

import urllib2
from bs4 import BeautifulSoup

rootPage = urllib2.urlopen(someURL)
data = rootPage.read()
soup = BeautifulSoup(data, "lxml")  # parse the HTML that was just read
specialDIV = soup.findAll("div", {"class": "special_div"})

After this code runs, the DIV in specialDIV has only 1 attribute, while the user (via the CSS) actually sees 8 attributes on it.
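The mismatch is probably expected: a browser's developer tools show the computed style, which combines the DIV's own markup with every matching stylesheet rule, while BeautifulSoup only sees the attributes literally written in the HTML. A minimal sketch of overlaying the two, assuming you already have the stylesheet declarations as a dict of property to value. It uses the standard-library `html.parser` (Python 3) rather than BeautifulSoup so it stays self-contained; `effective_style` is a hypothetical helper, and it ignores `!important` and selector specificity.

```python
from html.parser import HTMLParser


class DivAttrParser(HTMLParser):
    """Record the markup attributes of the first <div> carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "div" and self.attrs is None:
            attrs = dict(attrs)
            if self.class_name in (attrs.get("class") or "").split():
                self.attrs = attrs


def parse_inline_style(style):
    """Turn 'width: 10px; color: red' into {'width': '10px', 'color': 'red'}."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            props[prop.strip().lower()] = value.strip()
    return props


def effective_style(html, class_name, sheet_declarations):
    """Return (markup attributes, stylesheet declarations overlaid with style="")."""
    parser = DivAttrParser(class_name)
    parser.feed(html)
    parser.close()
    markup_attrs = parser.attrs or {}
    style = dict(sheet_declarations)
    # An inline style attribute overrides stylesheet rules (ignoring !important).
    style.update(parse_inline_style(markup_attrs.get("style") or ""))
    return markup_attrs, style
```

The first element of the result is what urllib2 plus a parser can ever "see" on the DIV; the second is closer to what the browser renders.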

FYI, I fully realize I could be using twill, but I'm using urllib because of the authentication scheme. I'm hoping to resolve this without having to move away from urllib.
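Staying with urllib is compatible with fetching the stylesheet too, as long as the CSS request carries the same credentials as the page request. Assuming the site's authentication is cookie based (the question does not say), one approach is to build a single opener with a shared cookie jar and use it for every request. The sketch below is Python 3 (`urllib.request`, `http.cookiejar`); the Python 2 equivalents are `urllib2.build_opener`, `urllib2.HTTPCookieProcessor` and `cookielib.CookieJar`. `build_session_opener` is a hypothetical helper name.

```python
import http.cookiejar
import urllib.request


def build_session_opener():
    """Return an opener whose CookieJar is shared by every request made through it.

    Assumption: authentication is cookie based, so a stylesheet requested
    through this opener is sent the same session cookie as the page.
    """
    jar = http.cookiejar.CookieJar()
    processor = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(processor)
    return opener, jar
```

Usage would look like `opener.open(someURL).read()` for the page and then `opener.open(stylesheet_url).read()` for the CSS, where `stylesheet_url` is a placeholder for whatever URL you track down. If the site uses HTTP Basic or Digest auth instead, the corresponding auth handler would be passed to `build_opener` in the same way.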

Unknown Coder
  • If it's dynamically loaded via JavaScript, you'll have to track down what it's loading and how it's generating those URLs. There is not enough information in the question for us to help you with that. – icktoofay Jan 19 '14 at 02:22
  • @icktoofay As far as I can tell, the page generates a dynamic CSS stylesheet containing the attributes, which it then applies to the DIV. Is that enough info? – Unknown Coder Jan 19 '14 at 02:34
  • @JimBeam icktoofay is right. For more details, check out this post: http://stackoverflow.com/questions/12466900/why-urllib-urlopen-read-does-not-correspond-to-source-code – Anthony Kong Jan 19 '14 at 05:11
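If the stylesheet URL is built by JavaScript, as icktoofay's comment suggests, it will not appear as a `<link>` tag in the HTML that urllib2 returns. A rough starting point for tracking it down is to list the external scripts the page loads and look for quoted `.css` URLs inside inline scripts. Stdlib-only Python 3 sketch (Python 2: `HTMLParser` and `urlparse` modules); `scan_page` is a hypothetical helper, and a URL assembled from pieces at runtime will not match the regex, so you may still need to read the scripts or watch the browser's network panel.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Quoted string literals ending in .css, optionally followed by a query string.
CSS_LITERAL_RE = re.compile(r"""["']([^"'\s]+\.css(?:\?[^"'\s]*)?)["']""")


class ScriptScanner(HTMLParser):
    """Collect external script URLs and the text of each inline script."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []
        self.inline_scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)
            self._in_script = True
            self.inline_scripts.append("")  # buffer for this script's text

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.inline_scripts[-1] += data


def scan_page(html, base_url):
    """Return (absolute script URLs, absolute .css URLs quoted in inline scripts)."""
    scanner = ScriptScanner()
    scanner.feed(html)
    scanner.close()
    scripts = [urljoin(base_url, s) for s in scanner.script_srcs]
    css_hints = [urljoin(base_url, m)
                 for text in scanner.inline_scripts
                 for m in CSS_LITERAL_RE.findall(text)]
    return scripts, css_hints
```

The script URLs can be downloaded with urllib2 and searched the same way, since the code that builds the stylesheet URL is often in an external file rather than inline.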

0 Answers