
I use virtualenv through pew (which I think is a fantastic tool), but I noticed something strange.

I have scipy installed system-wide:

    7,7 MiB [##########] /sparse
    5,1 MiB [######    ] /special
    5,1 MiB [######    ] /stats
    5,0 MiB [######    ] /linalg
    3,5 MiB [####      ] /spatial
    3,0 MiB [###       ] /optimize
    2,5 MiB [###       ] /signal
    2,3 MiB [###       ] /interpolate
    2,3 MiB [##        ] /misc
    2,2 MiB [##        ] /io
    1,5 MiB [##        ] /integrate
    1,3 MiB [#         ] /ndimage
    1,0 MiB [#         ] /fftpack
  744,0 KiB [          ] /cluster
  512,0 KiB [          ] /odr
  464,0 KiB [          ] /constants
  252,0 KiB [          ] /_lib
   44,0 KiB [          ] /_build_utils
   36,0 KiB [          ] /__pycache__
   24,0 KiB [          ]  HACKING.rst.txt
   12,0 KiB [          ]  THANKS.txt
    8,0 KiB [          ]  INSTALL.rst.txt
    4,0 KiB [          ]  __init__.py
    4,0 KiB [          ]  __config__.py
    4,0 KiB [          ]  LICENSE.txt
    4,0 KiB [          ]  setup.py
    4,0 KiB [          ]  BENTO_BUILD.txt
    4,0 KiB [          ]  version.py
    4,0 KiB [          ]  linalg.pxd

And this is scipy installed in the virtualenv (same scipy version):

   51,0 MiB [##########] /sparse
   37,6 MiB [#######   ] /.libs
   12,9 MiB [##        ] /linalg
   10,6 MiB [##        ] /spatial
    9,7 MiB [#         ] /special
    6,0 MiB [#         ] /interpolate
    5,9 MiB [#         ] /stats
    5,1 MiB [#         ] /optimize
    4,2 MiB [          ] /signal
    3,2 MiB [          ] /io
    3,0 MiB [          ] /integrate
    3,0 MiB [          ] /ndimage
    2,3 MiB [          ] /misc
    2,1 MiB [          ] /cluster
    1,7 MiB [          ] /fftpack
  884,0 KiB [          ] /odr
  328,0 KiB [          ] /constants
  204,0 KiB [          ] /_lib
   32,0 KiB [          ] /_build_utils
   24,0 KiB [          ]  HACKING.rst.txt
   20,0 KiB [          ] /__pycache__
   12,0 KiB [          ]  THANKS.txt
    8,0 KiB [          ]  INSTALL.rst.txt
    4,0 KiB [          ]  __init__.py
    4,0 KiB [          ]  LICENSE.txt
    4,0 KiB [          ]  setup.py
    4,0 KiB [          ]  __config__.py
    4,0 KiB [          ]  BENTO_BUILD.txt
    4,0 KiB [          ]  version.py
    4,0 KiB [          ]  pip-delete-this-directory.txt
    4,0 KiB [          ]  linalg.pxd

Needless to say, there is a huge size difference. Normally it would not bother me much, but I'm trying to bundle an executable with PyInstaller, and the resulting executable is unreasonably large.

Can someone explain this difference? It is not specific to scipy: I also see it with numpy, and possibly with other libraries.

EDIT:

Files with the same name have different sizes in the two installs, for example:

System-wide:

3,1 MiB [##########]  _sparsetools.cpython-35m-x86_64-linux-gnu.so

Virtualenv-wide:

38,5 MiB [##########]  _sparsetools.cpython-35m-x86_64-linux-gnu.so
JPFrancoia
    How did you install the package system-wide? Are there files in the system-wide install that are missing from your local one? – Blender Aug 28 '16 at 22:00
  • Oh, I get it. System-side, scipy is installed from my package manager (I'm on Manjaro Linux). But why is there such a difference anyway? – JPFrancoia Aug 28 '16 at 22:05
  • Where are you looking for the virtualenv-wide package? – Tomasz Jakub Rup Aug 28 '16 at 22:07
  • In `~/.local/share/virtualenvs/cb/lib/python3.5/site-packages/scipy`. – JPFrancoia Aug 28 '16 at 22:08
  • I downloaded the Scipy Python package and it was 43 MB, which is a little smaller than your local package. The difference probably lies in the `pyc` files and such: you don't have write access to the system-wide `site-packages`, so the Python bytecode files can't be created there. – Blender Aug 28 '16 at 22:08
  • I think it'd just be easier if you compared the contents of the directories yourself, I'm sure there are files in one that aren't in the other. – Blender Aug 28 '16 at 22:10
  • Nope, some of the same files have different sizes; see my edit. – JPFrancoia Aug 28 '16 at 22:13
  • Probably the system-wide package is compiled with shared libraries (Cython modules) and the virtualenv-wide package with static libraries. – Tomasz Jakub Rup Aug 28 '16 at 22:25
  • Try installing the system-wide package with pip (`sudo pip install scipy`) and look at the file sizes. They should be similar to the virtualenv-wide version. – Tomasz Jakub Rup Aug 28 '16 at 22:28
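Blender's suggestion to compare the two directories can be automated instead of reading `ncdu` output by hand. The following is a minimal Python sketch (it should run on the Python 3.5 used here); `file_sizes` and `compare_trees` are hypothetical helper names, not part of any library. It walks both trees, matches files by their relative path, and lists the ones whose sizes differ, largest difference first.

```python
import sys
from pathlib import Path


def file_sizes(root):
    """Map every regular file under `root` (relative POSIX path) to its size in bytes."""
    root = Path(root).expanduser()
    sizes = {}
    for path in root.rglob("*"):
        # Skip directories and symlinks; only real files contribute to the size.
        if path.is_file() and not path.is_symlink():
            sizes[path.relative_to(root).as_posix()] = path.stat().st_size
    return sizes


def compare_trees(a, b, top=20):
    """Return up to `top` tuples (relative_path, size_in_a, size_in_b) for files
    whose size differs between the two trees, largest absolute difference first.
    A file missing from one tree gets None as its size there."""
    sizes_a, sizes_b = file_sizes(a), file_sizes(b)
    rows = []
    for rel in sizes_a.keys() | sizes_b.keys():
        size_a, size_b = sizes_a.get(rel), sizes_b.get(rel)
        if size_a != size_b:
            rows.append((rel, size_a, size_b))
    rows.sort(key=lambda r: abs((r[1] or 0) - (r[2] or 0)), reverse=True)
    return rows[:top]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python <script> <system scipy dir> <virtualenv scipy dir>
    for rel, size_a, size_b in compare_trees(sys.argv[1], sys.argv[2]):
        print("{}: {} -> {}".format(rel, size_a, size_b))
```

Files that exist on only one side, such as the contents of the `.libs` directory that appears only in the virtualenv listing, show up with `None` for the other tree.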

1 Answer


The shared library files in the Python wheels distributed for Scipy aren't stripped (they still carry their symbol tables), so they are bigger than the ones your package manager installs:

$ file _sparsetools.cpython-35m-x86_64-linux-gnu.so
_sparsetools.cpython-35m-x86_64-linux-gnu.so: ELF 64-bit LSB  shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=ed7b8e00c558f234620423315fa9b53274393e72, not stripped
$ du -h _sparsetools.cpython-35m-x86_64-linux-gnu.so
39M     _sparsetools.cpython-35m-x86_64-linux-gnu.so

If you strip it, the file size shrinks:

$ strip _sparsetools.cpython-35m-x86_64-linux-gnu.so
$ file _sparsetools.cpython-35m-x86_64-linux-gnu.so
_sparsetools.cpython-35m-x86_64-linux-gnu.so: ELF 64-bit LSB  shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=ed7b8e00c558f234620423315fa9b53274393e72, stripped
$ du -h _sparsetools.cpython-35m-x86_64-linux-gnu.so
3.7M    _sparsetools.cpython-35m-x86_64-linux-gnu.so
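To find every affected library in an environment without running `file` on each one, the check can be scripted. This is a sketch under an assumption: it treats an ELF file as "not stripped" when it contains a section of type `SHT_SYMTAB` (the `.symtab` symbol table that `strip` removes), which is close to, but not guaranteed identical to, the heuristic `file` uses. `has_symtab` and `unstripped_libraries` are hypothetical helper names; the code only understands Linux ELF files, reads each file fully into memory, and does not handle ELF extended section numbering.

```python
import struct
from pathlib import Path

SHT_SYMTAB = 2  # ELF section type of the full symbol table (.symtab)


def has_symtab(path):
    """Return True if the ELF file at `path` still has a SHT_SYMTAB section."""
    data = Path(path).read_bytes()
    if data[:4] != b"\x7fELF":
        raise ValueError("{} is not an ELF file".format(path))
    ei_class, ei_data = data[4], data[5]
    order = "<" if ei_data == 1 else ">"  # 1 = little-endian (LSB), 2 = big-endian
    if ei_class == 2:  # 64-bit: e_shoff at 0x28, e_shentsize/e_shnum at 0x3A
        (shoff,) = struct.unpack_from(order + "Q", data, 0x28)
        shentsize, shnum = struct.unpack_from(order + "HH", data, 0x3A)
    elif ei_class == 1:  # 32-bit: e_shoff at 0x20, e_shentsize/e_shnum at 0x2E
        (shoff,) = struct.unpack_from(order + "I", data, 0x20)
        shentsize, shnum = struct.unpack_from(order + "HH", data, 0x2E)
    else:
        raise ValueError("{}: unknown ELF class {}".format(path, ei_class))
    # sh_type is the second 32-bit word of every section header in both classes.
    for i in range(shnum):
        (sh_type,) = struct.unpack_from(order + "I", data, shoff + i * shentsize + 4)
        if sh_type == SHT_SYMTAB:
            return True
    return False


def unstripped_libraries(root):
    """List the .so files under `root` that still carry a symbol table, largest first."""
    hits = [p for p in Path(root).rglob("*.so") if p.is_file() and has_symtab(p)]
    return sorted(hits, key=lambda p: p.stat().st_size, reverse=True)
```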

PyInstaller can strip the bundled libraries for you with its `--strip` option.
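If you would rather strip the libraries yourself before bundling, for example to see how much each one shrinks, the manual `strip` step above can be scripted. This is a minimal sketch, not what PyInstaller does internally: `strip_shared_objects` is a hypothetical helper, it assumes a `strip` binary (such as the one from GNU binutils) is on `PATH`, and it modifies files in place, so run it on a copy of the environment.

```python
import shutil
import subprocess
from pathlib import Path


def strip_shared_objects(root, strip_cmd="strip", dry_run=False):
    """Run `strip_cmd` in place on every .so file under `root`.

    Returns a list of (path, size_before, size_after). With dry_run=True
    nothing is executed, so size_after equals size_before.
    """
    exe = shutil.which(strip_cmd)
    if exe is None and not dry_run:
        raise FileNotFoundError("{!r} not found on PATH".format(strip_cmd))
    results = []
    for path in sorted(Path(root).rglob("*.so")):
        # Skip directories named *.so and symlinks (their targets are handled separately).
        if not path.is_file() or path.is_symlink():
            continue
        before = path.stat().st_size
        if not dry_run:
            # Equivalent to running `strip <file>` by hand; fails loudly on error.
            subprocess.run([exe, str(path)], check=True)
        results.append((path, before, path.stat().st_size))
    return results
```

Comparing `size_before` and `size_after` shows which libraries, if any, did not shrink after stripping.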

Blender
  • However, it seems it doesn't always work. It works for *some* of scipy's libraries, but not all, and it doesn't work for PIL or numpy. The libraries installed system-wide are much smaller. – JPFrancoia Aug 29 '16 at 11:53
  • @Rififi: The virtualenv libraries might be statically linked, in which case the dynamically linked system libraries will have to pull their dependencies into the final package anyway. Try comparing the sizes of the final executables created from a virtualenv and from the system-wide install (both with `--strip`). – Blender Aug 29 '16 at 15:00