I was trying to scrape "span" tags using BeautifulSoup. Here's my code:
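```python
# Python 2 code: urllib.urlopen (in Python 3 this is urllib.request.urlopen)
import urllib
from bs4 import BeautifulSoup

url = "someurl"
res = urllib.urlopen(url)
html = res.read()
soup = BeautifulSoup(html, "html.parser")
soup.findAll("span")  # returns a list of the <span> Tag objects found
```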
But on some specific web pages, it doesn't list all the spans; it only returns a limited number of them. However, when I call soup.prettify(), its output contains all the spans. What might be the reason? Am I missing something? Some answers I found suggest using headless browsers such as HtmlUnit, but I'm not sure what exactly they are. Can I integrate them into my Django project?
The output of soup.prettify() is here: https://drive.google.com/file/d/0BxhTzDujWhPVTzdIS2VWd1pZcHM/view?usp=sharing
Expected output of soup.findAll("span"): a list of all the spans on the page.

Output I'm getting:
```
[<span class="ssc-ftpl ssc_ga_tag" data-gaa="Opened" data-gac="Footer" data-gal="Responsible Gambling" tabindex="0"> Responsible Gambling</span>, <span class="ssc-ftpl ssc_ga_tag" data-gaa="Opened" data-gac="Footer" data-gal="About Betfair" tabindex="0"> About Betfair</span>, <span class="ssc-ftpl ssc-ftls " tabindex="0">English - UK</span>, <span class="ssc-ftpl" tabindex="0">\xa9 \xae</span>]
```
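One possible explanation for findAll returning fewer spans than prettify() appears to show is that the missing spans are not real markup at all, but text inside script blocks (for example, client-side templates that JavaScript only turns into elements when a browser runs the page). Python's HTML parser treats script contents as raw text, so such "spans" would be printed by prettify() yet never become Tag objects; a headless browser executes that JavaScript, which is presumably why it was suggested. The minimal sketch below, assuming Python 3 and the page's raw HTML already decoded to a str, counts spans two ways using only the standard library. SpanCounter and span_counts are hypothetical helper names for illustration, not part of BeautifulSoup.

```python
import re
from html.parser import HTMLParser


class SpanCounter(HTMLParser):
    """Hypothetical helper: counts <span> start tags parsed as real markup.

    HTMLParser treats the contents of <script> and <style> as raw text,
    so spans that only appear inside JavaScript strings are not counted.
    """

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "span":
            self.count += 1


def span_counts(html):
    """Return (markup_spans, textual_spans) for a raw HTML string."""
    parser = SpanCounter()
    parser.feed(html)
    parser.close()
    # A plain-text search also matches "<span" inside scripts or templates.
    textual = len(re.findall(r"<span\b", html, flags=re.IGNORECASE))
    return parser.count, textual


if __name__ == "__main__":
    sample = """
    <html><body>
      <span class="a">real span</span>
      <script>
        var tpl = '<span class="b">built by JavaScript</span>';
      </script>
    </body></html>
    """
    print(span_counts(sample))  # (1, 2)
```

If the first number is much smaller than the second for the real page, the missing spans most likely live inside scripts, and switching parsers will not help; getting them as elements would then require rendering the page, for example with a headless browser.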