How can I download a distribution, possibly an sdist, without executing a setup.py file (which may contain malicious code)?

I don't want to recursively get the dependencies, only download one file for the specified distribution. An attempt that doesn't work:

pip download --no-deps mydist

Here is a reproducible example demonstrating that setup.py is still executed in the above case:

$ docker run --rm -it python:3.8-alpine sh
/ # pip --version
pip 20.0.2 from /usr/local/lib/python3.8/site-packages/pip (python 3.8)
/ # pip download --no-deps suds
Collecting suds
  Downloading suds-0.4.tar.gz (104 kB)
     |████████████████████████████████| 104 kB 13.4 MB/s 
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-download-yqfdz35d/suds/setup.py'"'"'; __file__='"'"'/tmp/pip-download-yqfdz35d/suds/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-download-yqfdz35d/suds/pip-egg-info
         cwd: /tmp/pip-download-yqfdz35d/suds/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-download-yqfdz35d/suds/setup.py", line 20, in <module>
        import suds
      File "/tmp/pip-download-yqfdz35d/suds/suds/__init__.py", line 154, in <module>
        import client
    ModuleNotFoundError: No module named 'client'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I cannot use the --only-binary option, because I don't want to exclude source distributions. I just want to avoid executing their source code.

wim
  • No obvious solution comes to mind. But one could maybe get close enough, depending what the final goal is. For example I wonder how far one could get by doing things like making sure _setuptools_ is not installed, monkey-patching/overriding _distutils_ so that it's harmless, and so on. Otherwise I would experiment with using `pip download --no-deps --require-hashes --requirement <(echo 'suds==0.4 --hash=sha256:')` and providing purposefully wrong hashes, so that it all stops after the downloads; but it is not easy to find where the distribution ends up (maybe use the `--cache-dir` option?). – sinoroc Feb 20 '20 at 12:04

2 Answers

I've been digging into pip, and sadly the code there is pretty convoluted. It seems that currently there is no way to do that, and according to the link provided by @doctaphred there are no plans to make progress in that direction.

The next step depends on your situation. If, for example, you need this "package downloader" for production, I'd suggest you write your own PyPI client. It would be very simple to write, and you could make it much faster and simpler than pip by optimizing it for your needs. You could try to reuse some of the existing code in pip, but I think that will probably be pretty hard (after seeing that code).
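
As a rough idea of what such a stripped-down client might look like, here is a minimal sketch. It assumes PyPI's JSON API (the `https://pypi.org/pypi/<name>/json` endpoint and its `urls` list); the function names `pick_file` and `download_one` are made up for illustration. It never unpacks or runs anything from the archive, and it ignores dependencies entirely, so no environment markers or version specifiers need parsing.

```python
import hashlib
import json
import os
import urllib.request

# PyPI's JSON API endpoint for the latest release of a project.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def pick_file(metadata, prefer="sdist"):
    """Choose one file entry from a PyPI JSON API response.

    Prefers the given package type ("sdist" or "bdist_wheel") and
    falls back to the first listed file if that type is not available.
    """
    files = metadata["urls"]
    if not files:
        raise LookupError("no files published for this release")
    for entry in files:
        if entry["packagetype"] == prefer:
            return entry
    return files[0]


def download_one(name, dest_dir=".", prefer="sdist"):
    """Download a single distribution file for `name` (hypothetical helper).

    The archive is only written to disk; it is never unpacked or executed.
    """
    with urllib.request.urlopen(PYPI_JSON_URL.format(name=name)) as resp:
        metadata = json.load(resp)
    entry = pick_file(metadata, prefer)

    with urllib.request.urlopen(entry["url"]) as resp:
        data = resp.read()

    # Check the payload against the sha256 digest listed in the metadata.
    if hashlib.sha256(data).hexdigest() != entry["digests"]["sha256"]:
        raise ValueError("sha256 mismatch for " + entry["filename"])

    # basename() guards against a filename containing path separators.
    path = os.path.join(dest_dir, os.path.basename(entry["filename"]))
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Calling `download_one("suds")` should then leave a single archive in the current directory. Pinning a version, mirrors, and retries are all left out here.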

Otherwise, I'd consider quicker, hackier methods to get the job done. The first solution that comes to mind is just stopping pip whenever it tries to run the egg_info command. To do that you can patch pip's code at runtime using various methods. My favorite is using a usercustomize file.

For example, create a patch file with the following content and place it in a directory of your choosing:

/pypatches/pip_pure_download/usercustomize.py:

from pip._internal.req.req_install import InstallRequirement

print('Applying pure download patch!')

def override_run_egg_info(*args, **kwargs):
    raise KeyboardInterrupt # Joke's on you, evil hackers! :P

InstallRequirement.run_egg_info = override_run_egg_info

To apply the patch to a Python run, just add the patch's directory to PYTHONPATH, for example:

PYTHONPATH=/pypatches/pip_pure_download:$PYTHONPATH pip download --no-deps suds
kmaork
  • Well, this is a disgusting monkeypatch, and since `pip._internal.req.req_install` is a private implementation detail there's no guarantee it will work across pip versions. But nobody came up with anything better, so I suppose you win by default! I disagree that creating your own PyPI client "would be very simple to write": if you handle [the grammar for parsing environment markers and version specifications](https://www.python.org/dev/peps/pep-0508/#complete-grammar) correctly, that looks pretty involved to me. – wim Feb 26 '20 at 02:27
  • Haha, you are right, but I guess horrible code is like food at a party - you have to make some yourself if you wanna enjoy everybody else's :) Regarding a new PyPI client, I imagined you might not need the full capabilities of pip, and that maybe most of the complicated things will be enclosed in some code you could copy. But I haven't checked, you might be right... – kmaork Feb 27 '20 at 07:07
  • This is horrible, and I love it. Thanks for doing the necessary digging! – doctaphred Feb 28 '20 at 20:13
  • I have to unaccept the answer, because this has stopped working in more recent versions of pip. – wim Aug 06 '20 at 23:13

This doesn't seem to be possible as of pip 19.3.1 :(

See https://github.com/pypa/pip/issues/1884

doctaphred