
The Need:

  • Support several hundred Python developers and/or prod servers running Python code in a highly restrictive environment.
  • Be able to provide any compatible module found on PyPI.org that a developer needs.

Environment:

  • No external access.
  • Internal network available.
  • Support multiple platforms (Windows, Linux, macOS).
  • A good chunk of the developers and/or prod servers do not have access to compilation tools.
  • At minimum, support the latest Python 2.7 and Python 3.x releases.

The Ask:

  • How does one support the distribution and installation of Python modules?
  • How does one deal with Python modules that require compilation? Remember, many boxes will not have compilation tools available.

Definitely appreciate solutions based on similar real-world experiences.

Assumptions:

  • Assume a magical process exists which authorizes modules to be pulled into the internal network for distribution.
  • Not that Anaconda can't be part of the answer; just be sure to address how you would work around PyPI.org packages not found there.

Clarifications:

  • Docker containers are allowed.
Ifrit
  • I don't have the rep here to vote, but be aware that this question sounds too broad to me. – EBGreen Jun 12 '18 at 19:11
  • I tried my best to prevent that. Very willing to adjust as needed. Distribution and compilation problems may warrant separate posts? – Ifrit Jun 12 '18 at 19:13
  • Can you use containers? – jrtapsell Jun 12 '18 at 19:15
  • I don't know that separating the question would matter. I feel that there are a multitude of ways to accomplish your goals. That is just me though, and the beauty of community policing is that even if I could vote, just my opinion would not be enough to close the question. – EBGreen Jun 12 '18 at 19:16
  • Sure. Containers are allowed. However they would only be able to talk with the internal network. – Ifrit Jun 12 '18 at 19:18
  • They would be able to be built where dependencies can be downloaded, then pushed to where they don't have external access? – jrtapsell Jun 12 '18 at 21:23
  • I'm not concerned about how to bring them in. If part of your solution is building externally and only bringing the built wheels in-network, sure, that'll work. – Ifrit Jun 13 '18 at 02:21
  • What does _no external access_ mean exactly? Should this be a PyPI mirror that holds a local copy of each package available from pypi.org? Although we don't have hundreds of developers, we are pretty happy with [devpi](https://www.devpi.net/), which holds local copies of all critical packages (like `setuptools`, `pip` or `wheel`) in multiple versions, plus all of the packages that are required by in-house projects. The rest of the packages are pulled from pypi.org; `devpi` acts as a proxy in that case. – hoefling Jul 06 '18 at 19:26
  • To maintain the list of dependencies, we have a script that scans the requirements of all projects once a day, checks what's not available locally, then `pip download`s and `devpi upload`s them (see the sketch after these comments). Although we plan to replace it with a plugin for `devpi` in the foreseeable future. – hoefling Jul 06 '18 at 19:27
  • As for the wheels with C extensions, you have to set up a build environment for each platform you want to support and organize the building yourself; I'm not aware of any automated tools that can do that in batch. We have decent support for the `manylinux1_x86_64` tag thanks to easy builds with the [official docker image](https://quay.io/repository/pypa/manylinux1_x86_64), and some support for Windows using a dedicated VM for building wheels, but it's by no means automated, although we maintain a set of working examples for building packages in gist form. – hoefling Jul 06 '18 at 19:36
  • No external access means not being able to connect to the internet; internal network only. Some sort of mirroring would be required in order to host the Python modules. Your comments seem more like an answer; I recommend relocating them as such. – Ifrit Jul 07 '18 at 01:28
  • Maybe I misunderstood you, but it's unclear to me whether the local repo should have access to pypi.org. If not, this means your local repo won't be able to proxy install requests to pypi.org when a package is not available locally. So it's either 1. the local repo is just an `extra-index`, and the devs still install all missing packages from pypi.org without you knowing, or 2. the local repo contains (in the worst case) a full copy of the packages found on pypi.org (more than a million files) to satisfy the need _Be able to provide any compatible module found on PyPI.org that a developer needs._ – hoefling Jul 07 '18 at 10:24
  • If possible, I'd rather go with _the local mirror can send only a restricted set of request types to pypi.org only, and all the devs use the local mirror as the primary index, with pypi.org being unavailable._ You would then have the full control over what can or can't be installed, blacklisting insecure/outdated/typosquatting package versions etc. – hoefling Jul 07 '18 at 10:25
  • I'm not concerned so much about how your solution pulls in from pypi.org, whether via proxy or not; this sorta falls under the whole assumption of a "magical process" to pull in packages. Saying you can use technology X to host a local repo on your own internal server? Cool. So long as this local repo can theoretically support any package found on PyPI.org, it's a valid solution. – Ifrit Jul 09 '18 at 14:59
  • I'll split the answer in two: the first covering the PyPI repo topic, the other one building custom wheels; a single answer gets too long otherwise. – hoefling Jul 10 '18 at 12:12
  • Are you familiar with the idea of a forward proxy for http/https? – kubanczyk Jul 10 '18 at 21:59
  • Yes I have a basic understanding of them. – Ifrit Jul 11 '18 at 03:10
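
A minimal sketch of the daily sync job hoefling describes in the comments above (the paths, index URL, and credential variable are all illustrative; this simplified version also re-downloads everything instead of first diffing against what the index already holds):

#!/usr/bin/env bash
# Collect everything the in-house projects require, then push it to the
# internal devpi index.
set -euo pipefail

DOWNLOAD_DIR=$(mktemp -d)

# Download sdists/wheels for every project's pinned requirements.
for req in /srv/reqs/*.txt; do
    pip download -r "$req" --dest "$DOWNLOAD_DIR"
done

# Upload the downloaded files to the internal index.
devpi use https://my.pypi.org/company/base
devpi login admin --password "$DEVPI_PASSWORD"
devpi upload --from-dir "$DOWNLOAD_DIR"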

2 Answers


Preface

Nowadays, there are lots of viable options if you want to host your own PyPI repository: many packages implement a PyPI repo server, the most notable in our experience being devpi, covered in detail below.

There are also some other, more or less exotic packages like PyPICloud, which uploads package files directly to an Amazon S3 bucket. JFrog's Artifactory also supports serving Python packages, although not in the free edition AFAIK, so it only makes sense if you're already paying for a license. You can even create a local PyPI repo using nothing but the Python stdlib; see my answer on SO.

Also, this topic has been discussed several times on SO, the most popular questions being How to roll my own pypi? and How to create local own pypi repository index without mirror? Beware that the first question is rather old and contains mostly outdated answers; the second one is more up to date.

devpi

At my work, we evaluated the available solutions two years ago and have stuck with devpi ever since. Developed by the same people behind the popular testing framework pytest and the CI task automation tool tox, devpi is a versatile tool that:

  • can host multiple repositories (called indexes), allowing you to group package access;
  • acts as a PyPI mirror by default; this can be turned off on demand;
  • provides role-based access control for uploading packages;
  • offers an optional web UI that can be customized via page templating;
  • offers master server replication - all replicas will automatically synchronize the package base from master on changes;
  • can host package documentation (Sphinx);
  • can trigger a test run on package upload and display the test results if connected to a CI server like Jenkins;
  • has a plugin API for extending both the server and the CLI client (based on the pluggy library, the same one used for extending tox or pytest, if you're familiar with them); you can customize a lot of stuff by writing your own plugins, from authentication to storage backends. There are also several in-house plugins available on the GitHub page.

The most powerful feature, IMO, is indexes. An index defines a set of packages that can be installed from the index URL. For example, imagine a single devpi instance with two indexes configured: index foo offers package A and index bar offers package B. Now you have two repository URLs:

$ pip install A --index-url=https://my.pypi.org/foo

will succeed, but

$ pip install A --index-url=https://my.pypi.org/bar

will fail. Indexes can inherit from each other in the sense of extending their own package base, so if bar inherits foo, you will be able to install both A and B from the bar index.
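
A minimal sketch of that two-index setup using the devpi client (the server URL and package names are the illustrative ones from above; exact option spellings may differ between devpi versions):

$ devpi use https://my.pypi.org/
$ devpi login admin --password=...
$ devpi index -c foo                # create index "foo"
$ devpi index -c bar bases=foo      # "bar" inherits the package base of "foo"
$ devpi use foo
$ devpi upload                      # build and upload package A from its project directory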

This enables us to easily configure a package restriction policy: say, we have two main groups of users (devs and QA), each group needing its own set of packages; we also develop packages offered to customers and tools for internal use. No problem grouping them with indexes:

root/pypi
├── company/base    <- contains common packages like pip or setuptools
│   └── company/internal    <- in-house tools
│       ├── company/dev    <- packages necessary for development
│       │   ├── developer/sandbox    <- private index for single developer
│       │   └── developer2/sandbox
│       └── company/qa    <- packages for QA (test automation etc)
└── customer/release    <- customer packages

Now, for example, a dev sets up the index URL https://my.pypi.org/developer/sandbox once and has access to all new packages uploaded to e.g. company/base, while a customer sets up the index URL https://my.pypi.org/customer/release and is not able to access any packages from company/internal.
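
On the client side, that one-time setup boils down to a single pip configuration entry; devpi serves the pip-compatible endpoint under the /+simple/ suffix (the host name is the illustrative one from above):

# ~/.pip/pip.conf on Linux/macOS, %APPDATA%\pip\pip.ini on Windows
[global]
index-url = https://my.pypi.org/developer/sandbox/+simple/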

root/pypi is a special meta index: it is always present, and if an index inherits from it, all requests for packages not contained in the index are proxied to pypi.org. To turn off the pypi.org mirroring, simply don't inherit from root/pypi.

The upload restriction policy is also easy to set up on a per-index basis: all devs can upload to their own private sandboxes and to company/dev; all QAs can upload to company/qa; only the admin can upload to company/base, and uploads to company/internal and the customer indexes are made from the CI server on successful nightly builds.
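
A rough sketch of creating that hierarchy with upload ACLs (user and index names follow the tree above; acl_upload is a real devpi index option, but verify the exact semantics against the docs of your devpi version):

$ devpi index -c company/base bases=root/pypi acl_upload=admin
$ devpi index -c company/internal bases=company/base acl_upload=ci
$ devpi index -c company/dev bases=company/internal acl_upload=admin,dev1,dev2
$ devpi index -c company/qa bases=company/internal acl_upload=admin,qa1,qa2
$ devpi index -c customer/release bases=root/pypi acl_upload=ci
$ devpi index -c developer/sandbox bases=company/dev acl_upload=developer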

Refer to devpi docs for the whole setup and configuration process; the docs are pretty extensive and cover most of the questions that will arise.

hoefling
  • Thank you. A lot of good info here. The ability to restrict based on groupings appears to be a very powerful and useful feature. – Ifrit Jul 11 '18 at 12:28

Custom wheels with compiled C extensions

This is anything but trivial and can bring a lot of headaches when done wrong. I've never seen a fully or even partially automated solution for building wheels from source dists, due to the different set of dependencies each package has. The difficulty of setting up a build environment also varies by platform: setting up docker on a Linux/macOS machine and running a more or less predefined script is way easier than setting up the Visual C++ compiler and build tools on Windows.

Although we support Windows, we need far fewer precompiled packages for Windows than for Linux; most of the precompiled wheels are installed in jessie-slim containers that simply don't have gcc and friends available. This way, we don't bloat the containers (the main reason for the whole fuss!). All the building is done manually, following existing examples. On a successful build, the developer is encouraged to copy the terminal log to a gist, growing the examples collection.
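
For illustration, consuming those prebuilt wheels in a slim container could look like the following hypothetical Dockerfile (the image tag, index URL, and package name are assumptions; --only-binary :all: makes pip refuse source builds, so the install fails loudly if no wheel exists on the index):

# No compiler toolchain in this image, so pip must find a prebuilt wheel.
FROM python:3.6-slim
RUN pip install --index-url https://my.pypi.org/company/base/+simple/ \
        --only-binary :all: mysqlclient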

manylinux1_x86_64/manylinux1_i686

We maintain a list of recipes for statically linked wheels that someone has built successfully; the basic approach is always the same:

  1. Run a clean quay.io/pypa/manylinux1_x86_64 container
  2. Clone the source code
  3. Install the necessary dependencies
  4. Run bdist_wheel
  5. Repair wheels with auditwheel
  6. Install the wheel
  7. Run the tests; exit the container
  8. If the build and tests are successful, run devpi upload on the repaired wheels

Example recipe for mysqlclient that builds wheels for Python 3.5 and Python 3.6:

$ mkdir io
$ docker pull quay.io/pypa/manylinux1_x86_64
$ docker run --rm -w /root -v $(pwd)/io:/io -it quay.io/pypa/manylinux1_x86_64 /bin/bash
# yum install -y mysql-devel  # mysqlclient needs mysql-devel libs
# git clone https://github.com/PyMySQL/mysqlclient-python.git
# cd mysqlclient-python
# /opt/python/cp35-cp35m/bin/python setup.py bdist_wheel
# /opt/python/cp36-cp36m/bin/python setup.py bdist_wheel
# find dist/ -type f -name "*.whl" | xargs -I {} auditwheel repair {} -w /io
# # start the server for tests
# yum install mysql-server
# chkconfig mysqld on
# service mysqld start
# # mysql for CentOS 5 is too old, use utf8 instead of utf8mb4 and hope for the best
# mysql -e 'create database mysqldb_test charset utf8;'
# sed -i 's/utf8mb4/utf8/' tests/travis.cnf
# # run the tests with built wheels
# find /io -name "*cp35*" | xargs -I {} /opt/python/cp35-cp35m/bin/python -m pip install {} pytest mock
# TESTDB=travis.cnf /opt/python/cp35-cp35m/bin/python -m pytest
# # same for py3.6
# find /io -name "*cp36*" | xargs -I {} /opt/python/cp36-cp36m/bin/python -m pip install {} pytest mock
# TESTDB=travis.cnf /opt/python/cp36-cp36m/bin/python -m pytest
# exit
$ # check the terminal log for any errors!
$ devpi login admin
$ devpi use https://my.pypi.org/company/base
$ devpi upload --from-dir=io/

Windows

We have set up a Windows VM configured for building wheels, with the Visual C++ build tools etc. As on Linux, we have example gists that describe, step by step, what needs to be done to build. However, the VM is rarely used for building wheels and mostly serves as a Jenkins slave.
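
Purely as an illustration (the package, Python version, and index URL are carried over from the Linux recipe; mysqlclient on Windows additionally needs the MySQL C client libraries installed), a manual session on that VM could look like:

C:\> git clone https://github.com/PyMySQL/mysqlclient-python.git
C:\> cd mysqlclient-python
C:\mysqlclient-python> py -3.6 -m pip wheel . -w dist
C:\mysqlclient-python> rem check dist\ and the terminal log for errors
C:\mysqlclient-python> devpi use https://my.pypi.org/company/base
C:\mysqlclient-python> devpi login admin
C:\mysqlclient-python> devpi upload --from-dir=dist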

hoefling
  • I was afraid this would be more of a manual solution. Thank you for your very in-depth answer(s). – Ifrit Jul 11 '18 at 12:30
  • Glad to hear that! Another thing worth mentioning is that many open source projects run their CI on Travis, so if you're stuck on a build error, sometimes looking into the `.travis.yml` file helps. Or `appveyor.yml` if you're stuck on a Windows build and the project uses AppVeyor for Windows builds. – hoefling Jul 11 '18 at 21:37