I'm trying to scrape data, including images, from a website (IMDb) using the Scrapy package.
If there is an image URL in the poster div, I can crawl the data together with the movie poster. If there isn't, my code doesn't work properly: it skips the data for the movies that have no poster.
I want it to ignore the image when there is no image URL and still crawl the rest of the data.
How can I fix the except part?
    def parse(self, response):
        # some other lines
        try:
            poster_image_url = response.xpath('//div[@class="poster"]/a/img/@src').extract()[0]
            poster_image_url = [poster_image_url.split("_V1_")[0] + "_V1_.jpg"]
        except:
            poster_image_url = None
        item['image_urls'] = poster_image_url
This is the pipeline code:
    import scrapy

    class ImdbPipeline(object):
        def process_item(self, item, spider):
            return item

        def get_media_requests(self, item, info):
            for image_url in item['image_urls']:
                yield scrapy.Request(image_url)
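
One possible direction, as a minimal sketch rather than a confirmed fix: the loop in get_media_requests cannot iterate over None, so the spider could store an empty list when no poster is found, and the pipeline could tolerate a missing field. The helper names build_image_urls and iter_image_urls are hypothetical (they are not part of Scrapy), and the sketch assumes the item behaves like a dict with a get method.

```python
def build_image_urls(extracted_srcs):
    """Return a one-element list with the full-size poster URL, or [] if none.

    `extracted_srcs` stands for the list returned by
    response.xpath('//div[@class="poster"]/a/img/@src').extract().
    (Hypothetical helper, not a Scrapy API.)
    """
    if not extracted_srcs:
        # No poster on the page: an empty list keeps the field iterable.
        return []
    src = extracted_srcs[0]
    # Same URL rewrite as in the original parse method.
    return [src.split("_V1_")[0] + "_V1_.jpg"]


def iter_image_urls(item):
    """Yield image URLs from an item, treating a missing or None field as empty.

    (Hypothetical helper; assumes `item` supports dict-style .get().)
    """
    for image_url in item.get("image_urls") or []:
        yield image_url
```

With this, parse could assign item['image_urls'] = build_image_urls(...) without any try/except, and get_media_requests could loop over iter_image_urls(item), so items without a poster would pass through with no image request.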