I want to scrape the images (they are in .svg format) from this website: https://endoca.com/. The files do download, but they show up as invalid images when I open them.

Can someone help me with this problem?

import requests
from bs4 import BeautifulSoup
from urllib import request
import xlwt


url="https://www.endoca.com/en-in"

headers = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Referer': 'http://www.wikipedia.org/',
    'Connection': 'keep-alive',
}

r=requests.get(url=url,headers=headers)
soup=BeautifulSoup(r.text,'html.parser')
i=1
for img in soup.findAll('img'):
    i=i+1
    img_temp=img.get('src')
    if img_temp[:1]=='/':
        image_path='https://www.endoca.com/en-in'+img_temp
    else:
        image_path=img_temp

    print(image_path)

    if '.svg' in image_path:
        with open("{}.png".format(i),'wb') as f:
            f.write(requests.get(url=image_path).content)
            print(image_path)
    else:
        pass

1 Answer

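To download images, stream the response and copy it to a file with shutil. Since the files are SVG, save them with a .svg extension: writing SVG data to a file named .png does not convert it, which is the likely reason the downloads show as invalid images.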

import shutil

Then:

r = requests.get(image_path, stream = True)
if r.status_code == 200:
    r.raw.decode_content = True
    with open(f'image {i}.svg','wb') as f:
        shutil.copyfileobj(r.raw, f)
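
If a saved file still will not open, it can help to check what the server actually returned. The following is a minimal sketch, not part of the original answer: `sniff_image_type` is a hypothetical helper that looks at the first bytes of a download. It relies on the fixed 8-byte PNG signature and on SVG being XML text that contains an `<svg` tag near the start.

```python
# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image_type(data: bytes) -> str:
    """Guess the image type from the first bytes of a download.

    Hypothetical helper for illustration; it only tells PNG and SVG apart.
    """
    if data.startswith(PNG_SIGNATURE):
        return "png"
    # SVG is XML text, so look for an <svg tag in the first kilobyte.
    head = data[:1024].lstrip().lower()
    if b"<svg" in head:
        return "svg"
    return "unknown"
```

For example, passing `requests.get(image_path).content` to this helper should report "svg" for the files in question, which would confirm that only the extension was wrong.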

In your case:

if '.svg' in image_path:
    r = requests.get(image_path, stream = True)
    if r.status_code == 200:
        r.raw.decode_content = True
        with open(f'image {i}.svg','wb') as f:
            shutil.copyfileobj(r.raw, f)
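
Two related details may also be worth checking. The question builds `image_path` by concatenating the page URL with a root-relative `src`, which can yield a URL that does not exist, and it hard-codes the extension. Below is a rough sketch, assuming standard URL resolution rules apply to this site; `resolve_image_url` and `filename_for` are hypothetical helper names, and the example paths are made up.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_url(page_url: str, src: str) -> str:
    """Resolve an <img> src against the page URL the way a browser would."""
    # urljoin handles root-relative, relative and already-absolute URLs.
    return urljoin(page_url, src)


def filename_for(index: int, image_url: str, default_ext: str = ".bin") -> str:
    """Build a file name that keeps the extension found in the URL path."""
    # Use only the path component so a query string like ?v=2 is ignored.
    ext = os.path.splitext(urlparse(image_url).path)[1].lower()
    return f"image_{index}{ext or default_ext}"
```

With these, the `open(...)` call could use `filename_for(i, image_path)` so that an SVG is always written to a .svg file.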

Credits: Chaitanya Baweja. Read more ways here.

xaander1