
I would like to set up a Python function I've written on AWS Lambda, a function that depends on a bunch of Python libraries I have already collected in a conda environment.

To deploy this to Lambda, I'm supposed to zip the environment up, but the Lambda docs only give instructions for doing this with pip/virtualenv. Does anyone have experience with this?

RoyalTS

4 Answers


You should use the serverless framework in combination with the serverless-python-requirements plugin. You just need a requirements.txt, and the plugin automatically packages your code and its dependencies into a zip file, uploads everything to S3, and deploys your function. Bonus: since it can do this in Docker, it can also help you with packages that need binary dependencies.

Have a look here (https://serverless.com/blog/serverless-python-packaging/) for a how-to.
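For concreteness, here is a minimal sketch of what the plugin setup might look like (the service name, runtime, and handler are placeholders, not from the question; `dockerizePip` is the plugin option that builds dependencies inside Docker for binary compatibility):

```yaml
# serverless.yml -- minimal sketch; service/handler names are placeholders
service: my-python-service

provider:
  name: aws
  runtime: python3.9

functions:
  hello:
    handler: handler.hello   # handler.py must define hello(event, context)

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: true   # build deps in Docker for binary compatibility
```

With a requirements.txt next to this file, `serverless deploy` handles packaging and upload.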

From experience, I strongly recommend you look into that. Every bit of manual labour spent on deployment is time taken away from developing your logic.

Edit 2017-12-17:

Your comment makes sense @eelco-hoogendoorn.

However, in my mind a conda environment is just an encapsulated place where a bunch of python packages live. So if you put all these dependencies (from your conda env) into a requirements.txt (and use serverless + the plugin), that would solve your problem, no?
IMHO it would essentially be the same as zipping all the packages you installed in your env into your deployment package. That said, here is a snippet that does essentially this:

conda env export --name Name_of_your_Conda_env | yq -r '.dependencies[] | .. | select(type == "string")' | sed -E "s/(^[^=]*)(=+)([0-9.]+)(=.*|$)/\1==\3/" > requirements.txt

Unfortunately, conda env export only exports the environment in YAML format. The --json flag doesn't work right now, but it is supposed to be fixed in the next release. That is why I had to use yq instead of jq. You can install yq with `pip install yq`; it is just a wrapper around jq that also works with YAML files.
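If you'd rather not depend on yq, the same conversion can be sketched in pure Python. This is a hypothetical helper (the function name and simple line-based parsing are mine, not from any library); it assumes the default `conda env export` layout of `name=version=build` entries and ignores the nested pip: section:

```python
import re

def conda_deps_to_requirements(env_yaml: str) -> list[str]:
    """Convert the dependencies section of a `conda env export` YAML dump
    into pip-style requirement strings (name==version).

    Assumption: each conda dependency appears as "  - name=version=build".
    """
    reqs = []
    in_deps = False
    for line in env_yaml.splitlines():
        if line.startswith("dependencies:"):
            in_deps = True
            continue
        if in_deps:
            m = re.match(r"\s+-\s+([A-Za-z0-9_.-]+)=([0-9][^=]*)", line)
            if m:
                reqs.append(f"{m.group(1)}=={m.group(2)}")
            elif not line.startswith(" "):
                in_deps = False  # left the dependencies block
    return reqs

# Toy export for illustration:
example = """\
name: myenv
dependencies:
  - numpy=1.21.2=py39h20f2e39_0
  - requests=2.26.0=pyhd3eb1b0_0
prefix: /home/user/miniconda3/envs/myenv
"""
print("\n".join(conda_deps_to_requirements(example)))
# prints:
# numpy==1.21.2
# requests==2.26.0
```

Write the returned lines to requirements.txt and the serverless plugin can take it from there.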

KEEP IN MIND

A zipped Lambda deployment package can be at most 50 MB for direct upload, so your environment shouldn't be too big.

I have not tried deploying a Lambda with serverless + serverless-python-requirements and a requirements.txt created like that, so I don't know whether it will work.
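As a quick sanity check before uploading, you can compare your zip against the 50 MB direct-upload limit; a minimal sketch (the helper name is made up):

```python
import os

# Lambda's direct-upload limit for a zipped deployment package is 50 MB.
DIRECT_UPLOAD_LIMIT = 50 * 1024 * 1024  # bytes

def package_fits(zip_path: str) -> bool:
    """Return True if the zipped package is small enough for direct upload."""
    size = os.path.getsize(zip_path)
    print(f"{zip_path}: {size / (1024 * 1024):.1f} MB")
    return size <= DIRECT_UPLOAD_LIMIT
```

Larger packages have to go through S3 instead of direct upload.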

DrEigelb
  • Thank you very much! It's good to know that so many boring things are already automated. – newtover Dec 15 '17 at 22:16
  • I like serverless, but it does not address my questions about conda. – Eelco Hoogendoorn Dec 17 '17 at 15:20
  • To make a requirements.txt file from a conda environment with conda- (and/or pip-) installed packages, you can just use `pip freeze > requirements.txt`. If you have private conda packages that aren't on PyPI, then of course this file won't be sufficient to install those. – Avi Dec 20 '18 at 17:28
  • What's the way, then, to use non-PyPI packages (such as plotly-orca) in AWS Lambda? Before using plotly, I simply used the `serverless-python-requirements` plugin with the serverless framework to package my numpy/pandas dependencies for AWS Lambda. But now I am using plotly, and the plotly-orca package is not available on PyPI, i.e. not accessible through pip. How can I proceed? NOTE: using a conda environment and packaging its dependencies is not possible, since conda's python is different from the python in AWS Lambda, which is also why I used serverless. @Avi @DrEigelb – 7bStan Apr 07 '20 at 16:25
  • I just tried the `--json` flag and it appears to work now. – bjd2385 Aug 07 '20 at 21:34
  • --json now works, so: `conda env export --name heeq --json | jq '.dependencies[]' | cut -d '=' -f 1,2 | sed 's/=/==/' | tr -d '"' > requirements.txt` – milan Apr 25 '21 at 10:06

The main reason I use conda is the option not to compile various binary packages myself (numpy, matplotlib, pyqt, etc.), or to compile them less frequently. When you do need to compile something yourself for a specific version of python (like uwsgi), you should compile the binaries with the same gcc version that the python in your conda environment was compiled with. Most probably it is not the same gcc your OS is using, since conda now ships recent gcc versions, which can be installed with `conda install gxx_linux-64`.

This leads us to two situations:

  1. All your dependencies are pure python, and you can simply save a list of them using `pip freeze` and bundle them as described in the virtualenv instructions.

  2. You have some binary extensions. In that case, the binaries from your conda environment will not work with the python used by AWS Lambda. Unfortunately, you will need to visit the page describing the execution environment (AMI: amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2), set up that environment, build the binaries for the specific version of the built-in python in a separate directory (along with the pure-python packages), and then bundle them into a zip archive.

This is a general answer to your question, but the main idea is that you cannot reuse your binary packages, only the list of them.
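To tell which of the two situations you are in, one heuristic sketch (an assumption on my part: compiled extensions ship as `.so`/`.pyd` files) is to scan the environment's site-packages directory:

```python
import pathlib

def has_binary_extensions(site_packages: str) -> bool:
    """Heuristic: an environment with any .so/.pyd files under
    site-packages falls into situation 2 (binaries that must be rebuilt
    for the Lambda runtime); otherwise it is likely pure python and safe
    to re-bundle from a pip freeze listing."""
    root = pathlib.Path(site_packages)
    return any(root.rglob("*.so")) or any(root.rglob("*.pyd"))
```

Packages like numpy will trip this check; a requests-only environment typically will not.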

newtover
  • For my use case, I want to be able to run both conda packages from public channels such as conda-forge and anaconda, as well as privately built conda packages that contain binaries built using the same gcc 4.8 conda package as used by conda-forge; all python 3.6. I can't find the gcc version used to build the python used by AWS. But are you saying that even if I matched my gcc version to the AWS gcc version for my prebuilt conda packages, it still wouldn't work? – Eelco Hoogendoorn Dec 14 '17 at 12:24
  • @EelcoHoogendoorn, they will work only if the built-in python on the AWS Lambda instance is compiled with the same version of gcc optimizer as your conda python. If not, your binaries will not work with the built-in AWS Lambda python. – newtover Dec 14 '17 at 15:26
  • Is it not true that what really matters is the c standard libs used by gcc and not the version of gcc itself? – ThisGuyCantEven Feb 05 '19 at 18:05

I can't think of a good reason why zipping up your conda environment wouldn't work.

I think you can go into your anaconda2/envs/ or anaconda3/envs/ directory and copy/zip the env directory you want to upload. Conda is just a souped-up virtualenv, plus a different and somewhat optional package manager. The big reason I think it's OK is that conda environments encapsulate all their dependencies within their particular .../anaconda[2|3]/envs/$VIRTUAL_ENV_DIR/ directories by default.

Using a plain virtualenv gives you a bit more freedom, in sort of the same way that cavemen had more freedom than modern people. Personally I prefer cars. With virtualenv you basically get a semi-empty $PYTHONPATH that you can fill with whatever you want, rather than the more robust, pre-populated env that conda spits out. The following is a good table for reference: https://conda.io/docs/commands.html#conda-vs-pip-vs-virtualenv-commands

Conda turns the command `~$ /path/to/$VIRTUAL_ENV_ROOT_DIR/bin/activate` into `~$ source activate $VIRTUAL_ENV_NAME`.

Say you want to make a virtualenv the old-fashioned way. You'd choose a directory (let's call it $VIRTUAL_ENV_ROOT_DIR) and a name (which we'll call $VIRTUAL_ENV_NAME). At this point you would type:

~$ cd $VIRTUAL_ENV_ROOT_DIR && virtualenv $VIRTUAL_ENV_NAME

python then creates a copy of its own interpreter (plus pip and setuptools, I think) and places an executable called activate in this clone's bin/ directory. The $VIRTUAL_ENV_ROOT_DIR/bin/activate script works by prepending the environment's bin/ directory to your $PATH, which determines what python interpreter gets called when you type ~$ python into the shell, and hence which set of installed modules the interpreter will see when it is told to import something. This is the primary reason you'll see #!/usr/bin/env python in people's code instead of #!/usr/bin/python.
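For Lambda specifically, the detail that matters is the zip layout rather than which tool made the env: your handler and the *contents* of site-packages both need to sit at the archive root so the runtime finds them on the interpreter's path. A hypothetical packaging sketch (directory names are placeholders):

```python
import pathlib
import zipfile

def build_lambda_zip(src_dir: str, site_packages: str, out_zip: str) -> None:
    """Write the files under src_dir and under site-packages to the root
    of out_zip -- the flat layout Lambda expects for a python package."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for base in (src_dir, site_packages):
            base_path = pathlib.Path(base)
            for f in sorted(base_path.rglob("*")):
                if f.is_file():
                    # Arcname is relative to the base dir, so packages
                    # land at the archive root, not under site-packages/.
                    zf.write(f, f.relative_to(base_path))
```

Whether you point `site_packages` at a virtualenv or a conda env, the resulting archive has the same shape; the binary-compatibility caveats from the other answers still apply.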

Rob Truxal
  • The information here is informative and relevant, but it doesn't touch on the AWS aspect that the user is asking about. – Steve Buzonas Nov 03 '17 at 17:18
  • How do you think I should modify it? (I'd check the fix myself but I've stopped paying for AWS & started using Azure :L) – Rob Truxal Nov 03 '17 at 22:43
  • I got here because I have the same question regarding Lambda, but this doesn't quite get there. The main detail needed in the case of working on Lambda is: how does the file structure differ between `conda` and `virtualenv`? Is the whole `conda` env directory required? Why? Packaging `virtualenv` for Lambda deployment is a zip of your source code with `site-packages` and `dist-packages` contents in the source root so they are discovered on the python path. – Steve Buzonas Nov 04 '17 at 13:42
  • @SteveBuzonas, is that all a lambda-conformant `virtualenv` requires? If so, `conda` doesn't change the structure of either `site-packages` or `dist-packages` when it creates a new environment. What it does do is register a command in your primary `$PATH` to activate the virtual-env...it's just an alias, but that could cause issues maybe... Unfortunately @ this point I'm just speculating about what Lambda wants & not in a position to test. I'd welcome any edits you want to make to my answer though! – Rob Truxal Nov 11 '17 at 21:42
  • Zipping the whole env might work, but it's not a good idea, as it contains the full python installation, for example. – Jan Zyka Jan 24 '18 at 09:54
  • One reason this might not work is that packages that compile C code as part of installation may compile a different binary on your machine than is needed for Lambda. The lambdas run on an EC2 instance with aws-linux. Unsure about Mac and other versions of Linux, but if OP is developing on Windows and is using (e.g.) `numpy`, zipping the conda env will still not work. – Adam Hoelscher Sep 05 '18 at 22:28
  • From what I can tell, this does not work for two reasons. First, If you simply zip up the virtual env folder, it's far too large to be uploaded as a Lambda function with zip packaging. Zip files must be 50MB in size or less. Second, any Python dependencies that require native libraries must be deployed as a Docker container. Documentation is here: https://docs.aws.amazon.com/lambda/latest/dg/python-package.html. The AWS docs recommend using SAM, but that also poses a problem in that it seems to only use PIP for managing packages. – Nick A. Watts Jul 12 '21 at 13:50

In https://github.com/dazza-codes/aws-lambda-layer-packing, the pip wheels seem to work for many packages (pure-pip installs). It is difficult to bundle a lot of packages into a compact AWS Lambda layer, since pip wheels do not use shared libraries and tend to get a bit bloated, but they work. Based on some discussions on GitHub, the conda vs. pip challenges are not trivial.
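The usual layer layout is a directory named `python/` that pip installs into and that then gets zipped. As a sketch (the helper name is made up; it returns the command rather than running it, so you can inspect or reuse it):

```python
import sys

def layer_pip_command(requirements: str, target: str = "python") -> list[str]:
    """Build the pip invocation that installs requirements into a
    `python/` directory -- the path AWS Lambda layers expect on sys.path.
    Zip the target directory afterwards to produce the layer archive."""
    return [sys.executable, "-m", "pip", "install",
            "-r", requirements, "-t", target]
```

Run the returned command (e.g. via subprocess), then `zip -r layer.zip python/`, and publish the zip as a layer.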

Darren Weber