
I am trying to convert a Scrapy script to an EXE file. The main.py file looks like this:

from scrapy.crawler import CrawlerProcess
from amazon.spiders.amazon_scraper import Spider

spider = Spider()
process = CrawlerProcess({
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'data.csv',
    'DOWNLOAD_DELAY': 3,
    'RANDOMIZE_DOWNLOAD_DELAY': True,
    'ROTATING_PROXY_LIST_PATH': 'proxies.txt',
    'USER_AGENT_LIST': 'useragents.txt',
    'DOWNLOADER_MIDDLEWARES': {
        'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
        'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'random_useragent.RandomUserAgentMiddleware': 400
    }
})

process.crawl(spider)
process.start() # The script will block here until the crawling is finished

The Scrapy script looks like any other. I am using pyinstaller.exe --onefile main.py to convert it to an EXE file. When I try to run the main.exe file inside the dist folder, it starts outputting errors:

FileNotFoundError: [Errno 2] No such file or directory: '...\\scrapy\\VERSION'

I can fix it by creating a scrapy folder inside the dist folder and copying the VERSION file from lib/site-packages/scrapy into it. After that, many other errors occur, but I can fix them by copying over more Scrapy files the same way.

In the end, it starts outputting an error:

ModuleNotFoundError: No module named 'email.mime'

I don’t even know what it means. I have never seen it.

I am using:

  • Python 3.6.5
  • Scrapy 1.5.0
  • PyInstaller 3.3.1

1 Answer


I had the same situation.
Instead of trying to make PyInstaller bundle this file (all my attempts to do so failed), I decided to check and change the parts of the Scrapy code that read it, in order to avoid this error.

I noticed that there is only one place where the \scrapy\VERSION file is used: \scrapy\__init__.py.
I decided to hardcode that value from \scrapy\VERSION by changing \scrapy\__init__.py:

#import pkgutil
__version__ = "1.5.0" #pkgutil.get_data(__package__, 'VERSION').decode('ascii').strip()
version_info = tuple(int(v) if v.isdigit() else v
                     for v in __version__.split('.'))
#del pkgutil
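
As a standalone sanity check (a hypothetical snippet, not part of Scrapy), the parsing above turns the hardcoded string into the expected tuple:

```python
# Hypothetical standalone check of the hardcoded-version parsing used above.
__version__ = "1.5.0"
version_info = tuple(int(v) if v.isdigit() else v
                     for v in __version__.split('.'))
print(version_info)  # (1, 5, 0)
```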

After this change, there is no need to store the version in an external file. Since there is no longer any reference to the \scrapy\VERSION file, that error will not occur.

After that, I had the same FileNotFoundError: [Errno 2] with the \scrapy\mime.types file.
It is the same situation: \scrapy\mime.types is used in only one place, \scrapy\responsetypes.py.

...
#from pkgutil import get_data
...
    def __init__(self):
        self.classes = {}
        self.mimetypes = MimeTypes()
        #mimedata = get_data('scrapy', 'mime.types').decode('utf8')
        mimedata = """
        Copy-paste all 750 lines of \scrapy\mime.types here
        """
        self.mimetypes.readfp(StringIO(mimedata))
        for mimetype, cls in six.iteritems(self.CLASSES):
            self.classes[mimetype] = load_object(cls)

This change resolved the FileNotFoundError: [Errno 2] for the \scrapy\mime.types file. I agree that hardcoding 750 lines of text into Python code is not the best decision.
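To see that the readfp(StringIO(...)) trick itself works, here is a minimal self-contained sketch with two stand-in entries instead of the 750 real lines (the data below is illustrative, not Scrapy's actual mime.types):

```python
# Minimal stand-in for the patched snippet: feed mime data to MimeTypes
# through a file-like object instead of reading scrapy/mime.types from disk.
from io import StringIO
from mimetypes import MimeTypes

mimedata = """\
text/html           html htm
application/x-demo  demo
"""

mimetypes = MimeTypes()
mimetypes.readfp(StringIO(mimedata))  # same call the patched Scrapy code makes
print(mimetypes.guess_type("file.demo")[0])  # application/x-demo
```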

After that, I started to receive ModuleNotFoundError: No module named 'scrapy.spiderloader'. I added "scrapy.spiderloader" to PyInstaller's hidden imports parameter.

The next issue was ModuleNotFoundError: No module named 'scrapy.statscollectors', which I fixed the same way.

The final version of the PyInstaller command for my Scrapy script consists of 46 hidden imports. After that, I received a working EXE file.
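
For reference, the command ended up having roughly the shape below (a sketch only: just the two modules mentioned above are shown, while the real command listed all 46):

```shell
# Sketch of the final PyInstaller invocation; only 2 of the 46
# --hidden-import flags are shown here.
pyinstaller --onefile \
    --hidden-import scrapy.spiderloader \
    --hidden-import scrapy.statscollectors \
    main.py
```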

Georgiy