
I want to be able to use both the PySpark and AutoGluon libraries in a notebook backed by an EMR cluster (emr-5.30.1). I have tried to install AutoGluon in the cluster's bootstrap script with `sudo python3 -m pip install autogluon`, but it fails with:

    Running setup.py install for ConfigSpace: finished with status 'error'
    Complete output from command /bin/python3 -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-1yey/ConfigSpace/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-n_888-record/install-record.txt --single-version-externally-managed --compile:
    ...
    ...
     ConfigSpace/hyperparameters.c:4:10: fatal error: Python.h: No such file or directory
     #include "Python.h"
              ^~~~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1
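
The `Python.h: No such file or directory` error means the CPython development headers are not present on the cluster nodes, so pip cannot compile ConfigSpace's C extensions. A minimal bootstrap-script sketch that installs the toolchain and headers first might look like this (assumptions: the nodes run Amazon Linux with `yum`, and the relevant packages are named `gcc` and `python3-devel`; package names may differ on other images):

    #!/bin/bash
    # Hypothetical bootstrap sketch: install the compiler toolchain and
    # CPython headers before pip compiles C extensions such as ConfigSpace.
    set -euo pipefail

    # gcc provides the C compiler; python3-devel provides Python.h
    sudo yum install -y gcc python3-devel

    # With the headers in place, the C-extension builds in AutoGluon's
    # dependency tree have a chance to succeed.
    sudo python3 -m pip install autogluon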

mxnet, also installed via the bootstrap script, is stuck at version 1.6.0 (it cannot be upgraded: `No matching distribution found for mxnet==1.7.0`).

Is there any way I can get AutoGluon to work on an EMR cluster?

Thal

1 Answer


You can try installing it from within the notebook while the cluster is running, using `install_pypi_package` on the Spark context.

For example:

    sc.install_pypi_package("autogluon")
A.B
  • Already tried that; it fails with `AttributeError: type object 'StopIteration' has no attribute 'co_names'` followed by `Command "python setup.py egg_info" failed with error code 1 in /mnt/tmp/pip-build-cpkxvh1n/fastparquet/` – Thal Sep 15 '20 at 21:58