
I am trying to download an image with Scrapy, for testing purposes, and I have been struggling to figure out this error all day.

Before explaining further, here is my code for reference:

[Main scraper file, 'sample.py']:

import scrapy
from ..items import MyspiderItem

url = 'http://example.webscraping.com/places/static/images/flags/af.png'

class SampleSpider(scrapy.Spider):
    name = 'sample'
    start_urls = [url]

    def parse(self, response):
        item = MyspiderItem()
        item['image_urls'] = [url]  # ImagesPipeline expects a list of URLs
        return item
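As far as I know, Scrapy's ImagesPipeline builds one download request per entry in the item's image_urls field, so that field needs a list such as [url]. Assigning the bare string url makes the pipeline iterate over its characters instead. The sketch below illustrates this with a hypothetical helper, requested_urls, that only mimics the pipeline's loop; it is not Scrapy's actual API.

```python
# Hypothetical helper: mimics how ImagesPipeline iterates over 'image_urls'
# to create one request per entry. Not part of Scrapy itself.
def requested_urls(image_urls):
    return [u for u in image_urls]


url = 'http://example.webscraping.com/places/static/images/flags/af.png'

# A bare string is iterated character by character: 'h', 't', 't', ...
print(requested_urls(url)[:3])  # ['h', 't', 't']

# Wrapping it in a list yields the single URL the pipeline should request.
print(requested_urls([url]))
```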

[settings.py]:

BOT_NAME = 'mySpider'
SPIDER_MODULES = ['mySpider.spiders']
NEWSPIDER_MODULE = 'mySpider.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = 'E:/Chris/img'

[items.py]:

import scrapy

class MyspiderItem(scrapy.Item):
    images = scrapy.Field()
    image_urls = scrapy.Field()

[all other files left as default]

The first time I ran the spider, I got this error:

(...)
  File "c:\users\chris\appdata\local\programs\python\python38-32\lib\site-packages\PIL\Image.py", line 93, in <module>
    from . import _imaging as core
ImportError: cannot import name '_imaging' from 'PIL' (c:\users\chris\appdata\local\programs\python\python38-32\lib\site-packages\PIL\__init__.py)
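One way to narrow this down is to ask the interpreter that runs Scrapy which PIL package it finds and whether Pillow's compiled C extension, PIL._imaging, can be imported. The sketch below does that using only the standard library; the function name pillow_report is hypothetical, and the script should be run with the same Python that launches the spider.

```python
import importlib.util
import sys


def pillow_report():
    """Collect facts about the PIL package this interpreter would import."""
    report = {"python": sys.executable, "pil_path": None,
              "pil_version": None, "core_ok": False, "error": None}
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec("PIL")
    if spec is None:
        report["error"] = "PIL is not installed for this interpreter"
        return report
    report["pil_path"] = spec.origin
    try:
        import PIL
        report["pil_version"] = getattr(PIL, "__version__", None)
        # _imaging is Pillow's compiled C extension; this is the import
        # that fails in the traceback above.
        from PIL import _imaging  # noqa: F401
        report["core_ok"] = True
    except ImportError as exc:
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in pillow_report().items():
        print(f"{key}: {value}")
```

If pil_path points somewhere unexpected, or core_ok is False, the Pillow installation seen by that interpreter is likely broken or mismatched with the Python version.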

I read a few online posts about this issue and tried what they suggested.

  1. I uninstalled Pillow and PIL, then reinstalled Pillow. Same error.
  2. I read through https://pillow.readthedocs.io/en/5.1.x/installation.html#warnings and figured I could edit PIL\Image.py, changing 'from . import _imaging as core' to 'from PIL.Image import core as _imaging' (and renaming every 'core' to '_imaging'). This time there was still an error, but a new one:
(...)
  File "c:\users\chris\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\pipelines\images.py", line 10, in <module>
    from PIL import Image
  File "c:\users\chris\appdata\local\programs\python\python38-32\lib\site-packages\PIL\Image.py", line 93, in <module>
    from PIL.Image import core as _imaging
ImportError: cannot import name 'core' from partially initialized module 'PIL.Image' (most likely due to a circular import) (c:\users\chris\appdata\local\programs\python\python38-32\lib\site-packages\PIL\Image.py)

At this point I cannot fix this error, even after going through all the similar Stack Overflow and GitHub questions. Any help would be appreciated. I suspect it is either related to the Python path VS Code is using, or that Scrapy does not work with the most up-to-date Pillow/PIL library. I am not using a virtual environment, by the way. Perhaps I should? It's worth noting that I have recently been using Scrapy for various other tasks (not image downloading) without any issues.
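To check whether VS Code is running the interpreter you expect, and whether that interpreter is inside a virtual environment, something like the sketch below can be run from the same terminal or launch configuration used for scrapy crawl. It assumes environments created with the standard-library venv module, where sys.prefix differs from sys.base_prefix; interpreter_info is a hypothetical name.

```python
import sys


def interpreter_info():
    """Report which Python is running and whether it is inside a venv."""
    # In a venv created by the stdlib 'venv' module, sys.prefix points at the
    # environment while sys.base_prefix still points at the base install.
    in_venv = sys.prefix != sys.base_prefix
    return {"executable": sys.executable, "prefix": sys.prefix,
            "in_venv": in_venv}


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```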

Chris4542


SOLUTION:

I started a virtual environment and installed Scrapy and Pillow inside of that environment, and it runs perfectly!

# after creating and activating the virtual environment (venv)...

(venv) $ pip3 install Scrapy
(venv) $ pip3 install Pillow
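The same clean-slate setup can also be scripted with Python's standard-library venv module. The sketch below is one possible way to do it, not the exact steps used above: make_env is a hypothetical helper, "venv" is an assumed directory name, and with_pip=True requires ensurepip to be available in the base Python.

```python
import os
import subprocess
import venv
from pathlib import Path


def make_env(env_dir, packages=(), with_pip=True):
    """Create a fresh virtual environment and optionally pip-install packages."""
    env_dir = Path(env_dir)
    # clear=True wipes any existing environment at that path: a clean slate.
    venv.create(env_dir, clear=True, with_pip=with_pip)
    # Windows puts the interpreter in Scripts\, other platforms in bin/.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    python = env_dir / bin_dir / exe
    if packages:
        # Install with the environment's own interpreter, not the global one.
        subprocess.run([str(python), "-m", "pip", "install", *packages],
                       check=True)
    return python


if __name__ == "__main__":
    py = make_env("venv", packages=["Scrapy", "Pillow"])
    print("Interpreter for this project:", py)
```

Pointing VS Code's selected interpreter at the printed path keeps the editor and the terminal on the same environment.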

My spider needs major work, but the error is fixed! Hope this helps someone. Virtual environments are definitely the way to go with all Python projects.

To other newbies like me: virtual environments solve a lot of weird bugs. Often you have packages installed long ago that are now outdated and incompatible with newer, related packages, and the error messages will not always tell you that. So setting up a virtual environment and starting from a clean slate should be one of the first things you try when debugging. Better yet, always start a project in a virtual environment. :)
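To spot stale packages before (or after) switching environments, the installed versions can be listed with importlib.metadata, which is in the standard library from Python 3.8 (the version in the traceback paths). The sketch below is an illustration; installed_versions is a hypothetical name, and it reports None for anything not installed in the current interpreter.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version (None if missing)."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Compare these with what the project expects before debugging further.
    for name, ver in installed_versions(["Scrapy", "Pillow"]).items():
        print(f"{name}: {ver or 'not installed'}")
```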

Chris4542