
I'm trying to get all the images from a certain URL using Python.

Using Beautiful Soup is straightforward, but my problem is that not all img tags are printed to the console. A closer look at the HTML shows that the missing images come from Angular: their tags have a data-ng-src attribute.

Is there any way to tell Beautiful Soup to wait until all scripts have finished? Or is there another way to detect all img tags?

My code so far:

import urllib2
from BeautifulSoup import BeautifulSoup

page = BeautifulSoup(urllib2.urlopen(url))
allImgs = page.findAll('img')
print allImgs
gismo
    Possible duplicate of [scrape html generated by javascript with python](http://stackoverflow.com/questions/2148493/scrape-html-generated-by-javascript-with-python) – Yevhen Kuzmovych Jan 13 '17 at 19:53

2 Answers


Images are not embedded in the HTML page; they are linked from it. For anything that needs wait or pause time I would rather use Selenium WebDriver. As far as I know, Beautiful Soup reads the page all at once and does not run its scripts. I think of it as a wrapper for the tedious chore of parsing files, not as a tool for interacting with a page.
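
To illustrate what a one-shot parser sees, here is a rough sketch using only the standard library's html.parser instead of Beautiful Soup. The RAW_HTML sample, the ImgCollector class and the collect_images helper are made up for illustration; they assume the server sends the Angular tags with only data-ng-src set. Tags that scripts create from scratch would not show up in the raw markup at all.

```python
from html.parser import HTMLParser

# Hypothetical sample of what the server might send before Angular runs.
RAW_HTML = """
<img src="/static/logo.png">
<img data-ng-src="{{item.thumbnail}}">
<img data-ng-src="/media/photo.jpg">
"""


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag found in the raw markup."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def collect_images(html):
    """Parse the markup once, without running any scripts."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images


images = collect_images(RAW_HTML)

# Images a static parser can resolve directly.
with_src = [img["src"] for img in images if img.get("src")]

# Images that only get a real src after Angular has processed the page.
angular_only = [
    img["data-ng-src"]
    for img in images
    if "data-ng-src" in img and not img.get("src")
]

print(with_src)
print(angular_only)
```

A data-ng-src value may still be an unevaluated template expression, which is why a tool that actually runs the scripts is usually needed for such pages.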

dobhareach

You can try using Selenium. Although this library is mainly used for automated testing, it is better suited to this task than BeautifulSoup because it drives a real browser, so the page's scripts run before you read it.

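from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

url = 'http://example.com/'
driver = webdriver.Firefox()
driver.get(url)

delay = 5  # seconds

try:
    # presence_of_element_located expects a (By, value) locator tuple, not a list of elements.
    # '//elementid' is a placeholder: use an element that only exists once the page is rendered.
    WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//elementid')))
    print("Page is ready!")
    # Search from the document root for every img that now has a src attribute.
    for image in driver.find_elements(By.XPATH, '//img[@src]'):
        print(image.get_attribute('src'))
except TimeoutException:
    print("Couldn't load page")
finally:
    driver.quit()  # close the browser whether or not the wait succeeded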
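
If there is no single element whose presence means the page is done, one option is a custom wait condition. The sketch below is only an assumption about what "ready" means for your page: the all_images_have_src name is made up, and it treats the page as ready once at least one img exists and every img has a non-empty src.

```python
def all_images_have_src(driver):
    """Wait condition: return the <img> elements once each has a non-empty src.

    Returns False while the page has no images yet or any src is still empty,
    which makes WebDriverWait.until() keep polling until the timeout.
    """
    # "xpath" is the string value of Selenium's By.XPATH constant.
    images = driver.find_elements("xpath", "//img")
    if images and all(img.get_attribute("src") for img in images):
        return images
    return False
```

It can be passed in place of the expected condition, e.g. images = WebDriverWait(driver, delay).until(all_images_have_src), and it still raises TimeoutException if the condition never becomes true. A page that legitimately contains an img without a src would always time out, so adjust the check to your markup.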

Also read the following post, which discusses pages loaded dynamically with JavaScript:
https://stackoverflow.com/a/11460633/6626530

Shijo