2

The following takes place in a Python 3 virtual environment.

I just authored a little package that requires numpy. So, in setup.py, I wrote install_requires=['numpy']. I ran python3 setup.py install, and it took something like two minutes -- I got the full screen dump of logs, warnings, and configurations that normally comes with a numpy installation.

Then, I created a new virtual environment, and this time simply wrote pip3 install numpy -- which took only a few seconds -- and then ran python3 setup.py install, and I was done almost immediately.

What's the difference between the two, and why was pip3 install numpy so much faster? Should I thus include a requirements.txt just so people can pip-install the requirements rather than using setuptools?


Note that when I wrote pip3 install numpy, I got the following:

Collecting numpy
  Using cached numpy-1.12.0-cp36-cp36m-manylinux1_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.12.0

Is it possible that this was so much faster because the numpy wheel was already cached?

Newb
  • 2,810
  • 3
  • 21
  • 35
  • Using **cached** numpy? – McGrady Feb 14 '17 at 11:17
  • @McGrady Yeah, I'm assuming that this has the wheel cached, not the actual installation itself. When I install numpy using setuptools, only a few seconds are spent on the download -- the rest of the time is spent on the installation. – Newb Feb 14 '17 at 11:20
  • If you try to install your package in a new venv, is it now faster than it was the first time? – Eric Feb 14 '17 at 12:06

1 Answers1

3

pip install uses wheel packages, which were designed partly with the purpose of speeding up the installation process.

The Rationale section of PEP 427, which introduced the wheel format, states:

Python needs a package format that is easier to install than sdist. Python's sdist packages are defined by and require the distutils and setuptools build systems, running arbitrary code to build-and-install, and re-compile, code just so it can be installed into a new virtualenv. This system of conflating build-install is slow, hard to maintain, and hinders innovation in both build systems and installers.

Wheel attempts to remedy these problems by providing a simpler interface between the build system and the installer. The wheel binary package format frees installers from having to know about the build system, saves time by amortizing compile time over many installations, and removes the need to install a build system in the target environment.

Installing from a wheel is faster since it is a Built Distribution format:

Built Distribution

A Distribution format containing files and metadata that only need to be moved to the correct location on the target system, to be installed. Wheel is such a format, whereas distutil’s Source Distribution is not, in that it requires a build step before it can be installed. This format does not imply that python files have to be precompiled (Wheel intentionally does not include compiled python files).

Since numpy's source distribution contains significant amount of C code, compiling it takes noticeable time, which you observed when you installed it via bare setuptools. pip avoided the compilation of C code since the wheel came with binary code (already compiled for your system).

Leon
  • 31,443
  • 4
  • 72
  • 97