I have been using Python and urllib2 with good success. However, I have run into a web site that sets a div's properties via CSS, and possibly via JS and/or jQuery as well (I think!). The HTML I get back from urllib2 contains an "empty" DIV; it only makes sense to a user in the browser because the size, background color, and so on are applied there. I tracked down the attributes, and they are located in a CSS file (which I assume is dynamically generated server-side).
How can I use urllib2 to make sure that I pick up the full attribute set for the particular DIV?
My code is nothing special: I fetch the page and then pass the result to BeautifulSoup.
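import urllib2
from bs4 import BeautifulSoup

rootPage = urllib2.urlopen(someURL)  # someURL is the page address
data = rootPage.read()
# Parse the bytes that were just read (this was the undefined name myHTML).
soup = BeautifulSoup(data, "lxml")
specialDIV = soup.findAll("div", {"class": "special_div"})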
After this code runs, the specialDIV object has only 1 attribute, while the user (and the CSS) actually sees 8 attributes on the DIV.
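To illustrate what I mean, here is a minimal, self-contained sketch. The markup and the stylesheet rule in it are made up (they are not from the real site), and it uses only the standard-library HTML parser so that it runs without network access or lxml. It shows that the parsed tag carries just its class attribute, while the properties a user sees live in the CSS text.

```python
# SAMPLE_HTML and SAMPLE_CSS are hypothetical stand-ins for the real site.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2

SAMPLE_HTML = '<html><body><div class="special_div"></div></body></html>'
SAMPLE_CSS = ".special_div { width: 200px; height: 100px; background-color: #336699; }"


class DivAttrCollector(HTMLParser):
    """Collect the attribute dict of every <div> carrying a given class."""

    def __init__(self, wanted_class):
        HTMLParser.__init__(self)
        self.wanted_class = wanted_class
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_dict = dict(attrs)
        classes = (attr_dict.get("class") or "").split()
        if tag == "div" and self.wanted_class in classes:
            self.found.append(attr_dict)


collector = DivAttrCollector("special_div")
collector.feed(SAMPLE_HTML)
html_attrs = collector.found[0]

# Naive split of the single sample rule into property/value pairs.
# This is only for illustration; it is not a general CSS parser.
body = SAMPLE_CSS[SAMPLE_CSS.index("{") + 1:SAMPLE_CSS.index("}")]
css_props = {}
for decl in body.split(";"):
    if ":" in decl:
        name, value = decl.split(":", 1)
        css_props[name.strip()] = value.strip()

print(html_attrs)         # attributes present in the HTML itself
print(sorted(css_props))  # properties that exist only in the stylesheet
```

In the sketch, the dictionary built from the HTML contains only the class, and everything else comes from the separate stylesheet string. That is the same gap I see between my urllib2 result and what the browser shows.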
FYI, I fully realize I could be using twill, but I'm using urllib2 because of the authentication scheme. I'm hoping I can resolve this without having to move away from urllib2.