I am trying to use a Python package that is not listed on PyPI with Google Cloud ML Engine. This package has dependencies of its own that, although listed on PyPI, are not installed by default in the ML Engine environment, namely Cython.

The documentation does not make it clear how to proceed in this case. I tried packaging this package as a .tar.gz file and passing it via the --packages argument, but I got the following error:

File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '/tmp/pip-jnm3Ml-build/setup.py'
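
The error above suggests that pip unpacked the archive and found no setup.py where it expected one. A minimal sketch of a local check that could be run before submitting a job, assuming pip looks for setup.py either at the archive root or inside a single top-level directory (the function name `setup_py_is_findable` is hypothetical, not part of any library):

```python
import posixpath
import tarfile


def setup_py_is_findable(archive_path):
    """Return True if setup.py sits at the archive root or directly inside
    the single top-level directory shared by every file in the archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        # Normalize names so "./pkg/setup.py" and "pkg/setup.py" compare equal.
        names = [posixpath.normpath(m.name) for m in tar.getmembers() if m.isfile()]

    if "setup.py" in names:
        return True

    # Collect the first path component of every nested file.
    top_dirs = {name.split("/", 1)[0] for name in names if "/" in name}
    all_nested = all("/" in name for name in names)
    if names and all_nested and len(top_dirs) == 1:
        return (top_dirs.pop() + "/setup.py") in names
    return False
```

An archive produced by `python setup.py sdist` would normally pass this check, whereas one created by compressing only the inner package directory would not.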

After that, I tried writing a setup.py file and packaging my code, but Google Cloud ML Engine is not able to find the package listed in dependency_links.

Here is my current setup.py:

from setuptools import setup

# Dependencies that are available on PyPI.
required_packages = ['cython', 'numpy', 'tensorflow', 'scipy']
# pydensecrf is not on PyPI, so it is referenced by its Git URL.
dependency_links = ['git+https://github.com/lucasb-eyer/pydensecrf.git']

setup(name='trainer',
      version='0.1',
      packages=['trainer'],
      install_requires=required_packages,
      dependency_links=dependency_links,
      include_package_data=True,
      description='description')

I would like to avoid doing this by trial and error since sending jobs to the cloud costs money even if they fail immediately.

Thanks in advance.

Miguel Monteiro

1 Answer

To do this, you will need to add Cython to the list of required packages in your setup.py. Instructions can be found here.

Here is a sample setup.py, which would reside in the parent directory of the directory you pass as --package-path to gcloud.

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['Cython>=0.26']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My trainer application package.'
)
rhaertel80
  • Sorry, I wasn't clear enough. The issue is not Cython, which is on PyPI; the issue is the package that is not on PyPI. I will update the question with my current setup.py. – Miguel Monteiro Jul 21 '17 at 14:48
  • Also I suppose your answer is correct if I package both the custom dependency and my own package, but I am trying not to have to package the dependency just because it seems more correct to do... – Miguel Monteiro Jul 21 '17 at 14:57
  • Never mind, I will do it your way, this dependency link thing seems to be deprecated – Miguel Monteiro Jul 21 '17 at 15:09
  • What do you mean by "dependency link thing"? If you have a specific need, I'm happy to help you meet it. – rhaertel80 Jul 21 '17 at 16:54
  • From what I gathered from searching, using dependency_links as an argument to setup was going to be deprecated (in 2014) and is not advisable; instead you should use PyPI. Since pydensecrf is not on PyPI and has some C++ dependencies, I had to package both my package and pydensecrf as .tar.gz archives and pray that ML Engine runs my package's setup first and then pydensecrf's (because pydensecrf imports Cython in its setup.py), and then pray a bit more that ML Engine has gcc installed. In the end it worked, although I am not totally satisfied with the solution... – Miguel Monteiro Jul 21 '17 at 17:09
  • Glad to hear it works. If you need to guarantee that things happen in a certain order, you can do whatever you need to in setup.py: it's just a Python file, so you can use it to do just about anything, including downloading files, installing them, etc. – rhaertel80 Jul 22 '17 at 20:16
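
As a rough illustration of the ordering idea in the last comment, a helper like the following could be called near the top of setup.py, before `setup()`. This is a sketch under assumptions, not a documented ML Engine pattern: the names `ORDERED_REQUIREMENTS`, `build_pip_commands`, and `install_in_order` are hypothetical, and it assumes pip, git, and a C++ compiler are available in the environment that runs setup.py.

```python
import subprocess
import sys

# Order matters: pydensecrf's setup.py imports Cython, so Cython goes first.
ORDERED_REQUIREMENTS = [
    "Cython>=0.26",
    "git+https://github.com/lucasb-eyer/pydensecrf.git",
]


def build_pip_commands(requirements):
    """Return one `pip install` command (as an argument list) per requirement."""
    return [[sys.executable, "-m", "pip", "install", req] for req in requirements]


def install_in_order(requirements, runner=subprocess.check_call):
    """Run the pip commands one at a time, in the order given.

    With the default runner, a failing install raises CalledProcessError,
    so later requirements are not attempted.
    """
    for command in build_pip_commands(requirements):
        runner(command)
```

Passing a different `runner` (for example, `print`) shows the commands without executing them, which may help when checking the order locally before paying for a cloud job.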