
Now that OpenShift Online V2 has announced its end of service, I am looking to migrate my Python application to OpenShift Online V3, aka OpenShift NextGen. Pandas is a requirement (and listed in requirements.txt).

It was already non-trivial to get pandas installed in V2, but V3 does not allow manual interaction in the build process (or does it?).

When I try to build my app, the build process stops after an hour. pip has downloaded and installed the contents of requirements.txt and is running setup.py for selected packages. The end of the log file is:

Running setup.py install for numpy
Running setup.py install for Bottleneck
Running setup.py install for numexpr
Running setup.py install for pandas

Then the process stops without any error message.

Does anyone have a clue how to build Python applications that require pandas on OpenShift V3?

Fabian
  • How many different Python packages are you trying to install from the requirements.txt file? Even when recreating the same list of packages that the Jupyter datascience-notebook has, I have never seen it take that long to build an image. Which Online environment are you in? – Graham Dumpleton Sep 04 '17 at 05:44
  • `requirements.txt` contains 69 lines. `pip` runs quickly but then hangs while running `setup.py` for pandas, before the build fails after 1h or so. I am using OpenShift Online 3 Starter to get a feeling for how big the porting effort will be. – Fabian Sep 04 '17 at 06:54
  • Which particular Starter tier instance are you on, us-east-1, us-west-1 or other? Some have been slow at times. – Graham Dumpleton Sep 04 '17 at 10:35
  • It's US West (Oregon). From the URLs I believe it's `starter-us-west-2`. – Fabian Sep 04 '17 at 10:38
  • Is your repo public? I am not on us-west-2, but if I can find time I can perhaps try it on us-east-1 and see if can spot why. Am travelling over the next week, so can't guarantee anything though. – Graham Dumpleton Sep 04 '17 at 10:40
  • Sorry, it's not. But I created a repository with just the requirements.txt. It cannot run but it should build. Thanks very much! Here's the link: https://github.com/fsbraun/osv3test.git – Fabian Sep 04 '17 at 10:52
  • Drop ``wsgi/application`` from the ``app.sh`` and it will still be able to run up server with default splash page. :-) – Graham Dumpleton Sep 04 '17 at 11:26
  • And actually list ``mod_wsgi`` in the ``requirements.txt``. – Graham Dumpleton Sep 04 '17 at 11:43
  • Confirmed that the same issue happens on us-east-1, and no useful logging or events are generated. I don't have the issue on my own computer using an ``oc cluster up`` environment. – Graham Dumpleton Sep 04 '17 at 11:59

1 Answer


It is going to be one of two things.

Either compiling pandas is a huge memory hog, possibly because the compiler hits some pathological case, or the generated image at that point exceeds an internal size limit and the build runs out of allocated disk space.

If it was memory, you would need to increase the memory allocated to the build pod. By default in Online this is 512Mi.

To increase the limit you will need to edit the YAML/JSON for the build configuration, either from the web console or from the command line using `oc edit`.

For YAML, you need to add the following:

  resources:
    limits:
      memory: 1Gi
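For orientation, that block sits directly under `spec` in the build configuration. A minimal sketch of the relevant part of the resource (the name is illustrative, borrowed from the test repository; the rest of the `spec` is omitted):

```yaml
apiVersion: v1
kind: BuildConfig
metadata:
  name: osv3test          # illustrative; use your own build config's name
spec:
  resources:
    limits:
      memory: 1Gi         # memory limit for the build pod (Online maximum)
```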

This is setting the field:

$ oc explain bc.spec.resources.limits
FIELD: limits <object>

DESCRIPTION:
     Limits describes the maximum amount of compute resources allowed. More
     info: http://kubernetes.io/docs/user-guide/compute-resources/

The maximum is 1Gi. It appears an increase to this value does allow the build to complete, whereas increasing it to 768Mi wasn't enough.

Do be aware that this memory comes out of the quota for compute-resources-timebound while the build runs, and since the build uses all of it, other things you try to do at the same time could be held up.

FWIW, a local build, not in Online, only produced an image of:

172.30.1.1:5000/mysite/osv3test              latest               f323d9b036f6        About an hour ago   910MB

Thus, unless intermediate disk space used before things were cleaned up was a problem, image size isn't the issue.

So increasing memory used for the build appears to be the answer.

Graham Dumpleton
  • Awesome! Thanks very much, Graham, for going all the way! It is the memory limit (and not the image size). The solution for V2 was manually removing the optimization level for gcc. I guess the compiler needs huge memory resources to optimize the large computer-generated C files that come with pandas. – Fabian Sep 04 '17 at 13:09
  • 1
    If there is a binary wheel for the package, you might also try adding a file ``.s2i/environment`` and in it add ``UPGRADE_PIP_TO_LATEST=1``. By default the latest ``pip`` version isn't used and the old version has some problems with binary wheels. So is possible that if there is a wheel, the newer version of ``pip`` may use it and so avoid needing to compile it from source code. – Graham Dumpleton Sep 09 '17 at 00:47
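To make the suggestion in that last comment concrete: the S2I environment file is a plain KEY=VALUE file committed to the repository root. A sketch (the variable name comes from the comment above; the file path is the S2I convention):

```
# File: .s2i/environment
# Upgrade pip during the build so binary wheels (e.g. for pandas) can be used,
# avoiding the memory-hungry compile from source.
UPGRADE_PIP_TO_LATEST=1
```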