
I am trying to deploy a Python application on AWS Lambda. It has several large Python dependencies, the largest being scipy and numpy, and as a result my deployment package is significantly larger than the allowed 250 MB.

While trying to find a way to reduce the size, I came across the approach detailed here:

https://github.com/szelenka/shrink-linalg

In essence, when installing with pip, flags can be passed to the C compiler during the scipy and numpy Cython compilation that leave the debugging information out of the compiled C binaries. The result is that scipy and numpy are reduced to about 50% of their original size. I was able to run this locally (Ubuntu 16.04) and create the binaries without issue. The command used was:

CFLAGS="-g0 -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib" pip install numpy scipy --compile --no-cache-dir --global-option=build_ext --global-option="-j 4"
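
For reference, the savings can be checked by comparing the installed package sizes, for example (assuming the packages were installed into a virtualenv at ./venv; adjust the path to your setup):

du -sh venv/lib/python3.6/site-packages/numpy venv/lib/python3.6/site-packages/scipy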

The problem is that in order to run on AWS Lambda, the binaries must be compiled in an environment similar to the one Lambda runs on. An image of the environment can be found here:

https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html

After loading the image on an EC2 instance, I tried to run the same pip installation after installing a few dependencies:

sudo yum install python36 python3-devel blas-devel atlas atlas-devel lapack-devel atlas-sse3-devel gcc gcc-64 gcc-gfortran gcc64-gfortran libgfortran gcc-c++ openblas-devel python36-virtualenv

numpy compiles fine, but scipy does not. The Cython step is not causing any problems, but the Fortran compilation is. I am getting the following error:

error: Command "/usr/bin/gfortran -Wall -g -Wall -g -shared build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/_test_odeint_bandedmodule.o build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/fortranobject.o build/temp.linux-x86_64-3.6/scipy/integrate/tests/banded5x5.o build/temp.linux-x86_64-3.6/build/src.linux-x86_64-3.6/scipy/integrate/_test_odeint_banded-f2pywrappers.o -L/usr/lib64/atlas -L/usr/lib/gcc/x86_64-amazon-linux/6.4.1 -L/usr/lib/gcc/x86_64-amazon-linux/6.4.1 -L/usr/lib64 -Lbuild/temp.linux-x86_64-3.6 -llsoda -lmach -llapack -lptf77blas -lptcblas -latlas -lptf77blas -lptcblas -lpython3.6m -lgfortran -o build/lib.linux-x86_64-3.6/scipy/integrate/_test_odeint_banded.cpython-36m-x86_64-linux-gnu.so -Wl,--version-script=build/temp.linux-x86_64-3.6/link-version-scipy.integrate._test_odeint_banded.map" failed with exit status 1

I have tried re-installing gfortran as well as the whole gcc collection, but without any luck. Unfortunately, I have very limited experience with Fortran compilers. If anyone has any ideas, or has a compiled version of the binaries, I'd be quite grateful.

– joek575
  • This is the last (and least informative) line of the error message... The stuff it tells you first is probably more useful. – DavidW Nov 13 '18 at 07:10
  • Unfortunately, no. Pip's traceback is not very helpful. Before this it just shows all the stuff that compiled successfully. – joek575 Nov 13 '18 at 07:17
  • You might be able to run pip in verbose mode. There almost certainly is more information (even if it's being hidden from you). – DavidW Nov 13 '18 at 08:26
  • Or if there is no verbose mode available, there should be some log file somewhere. – Vladimir F Героям слава Nov 13 '18 at 10:06
  • I do similar things in Docker; there is a Docker image for amazon-linux, and with a bit of mounting back and forth you can create a decent build process. – Jan Zyka Apr 17 '20 at 12:53

4 Answers


Using the serverless-python-requirements plugin for the Serverless Framework helped me streamline this whole process and reduce the package size as well. I would definitely recommend checking it out.

This is the guide that I followed: the Serverless python-requirements plugin.

Make sure to set the strip flag to false to avoid stripping binaries, which leads to the error "ELF load command address/offset not properly aligned".

This is what my final serverless.yml came out to be; it gave me the result I wanted, packaging sklearn + cv2 as a layer:

custom:
  pythonRequirements:
    dockerizePip: true
    useDownloadCache: true
    useStaticCache: false
    slim: true
    strip: false
    layer:
      name: ${self:provider.stage}-cv2-sklearn
      description: Python requirements lambda layer
      compatibleRuntimes:
        - python3.8
      allowedAccounts:
        - '*'
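
Functions can then attach the layer through the CloudFormation reference that the plugin generates; going by the plugin's docs, something like this (the function and handler names are placeholders):

functions:
  inference:
    handler: handler.main
    layers:
      - Ref: PythonRequirementsLambdaLayer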
– Amir Mousavi

OK, so I solved this, albeit in a pretty hacky way.

The flags I was passing to pip were meant to reduce the size of the C dependencies, not the Fortran dependencies. So there was really no problem with using the precompiled Fortran dependencies that are normally downloaded via pip.

So first I created a reference version of the unaltered scipy package in a folder sp:

pip install scipy -t sp

I then created a Go program to act as a wrapper around the gfortran compiler (or technically around the link to the gfortran compiler in /usr/bin):

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "os/exec"
    "strings"
)

// checkErr aborts on any unrecoverable error.
func checkErr(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

// exists reports whether the given path exists on disk.
func exists(path string) (bool, error) {
    _, err := os.Stat(path)
    if err == nil {
        return true, nil
    }
    if os.IsNotExist(err) {
        return false, nil
    }
    return true, err
}

// copyr copies the file at src to dst.
func copyr(src string, dst string) {
    data, err := ioutil.ReadFile(src)
    checkErr(err)
    err = ioutil.WriteFile(dst, data, 0644)
    checkErr(err)
}

func main() {
    // Reference copy of the unaltered scipy package ("pip install scipy -t sp").
    searchFolder := "/home/ec2-user/sp/scipy"
    // The real gfortran, moved aside earlier.
    wrappedCompiler := "/usr/bin/inner_gfortran"

    argsWithProg := os.Args
    noProg := os.Args[1:]

    // primed == 0: pass the call through to the real compiler.
    // primed == 1: the previous argument was "-o", so this one is the output path.
    // primed == 2: the output was satisfied from the reference copy; skip compiling.
    primed := 0
    check := "-o"
    var (
        cmdOut []byte
        err    error
    )

    for _, el := range argsWithProg {
        if primed == 1 {
            primed = 0
            // Only intercept outputs that live under a scipy build path.
            s := strings.Split(el, "scipy")
            if len(s) != 2 {
                continue
            }
            // Locate the corresponding prebuilt file in the reference tree.
            src := searchFolder + s[1]
            srcExists, _ := exists(src)
            if !srcExists {
                continue
            }
            primed = 2
            // Make sure the output directory exists, then copy the
            // reference binary into place instead of compiling.
            dirParts := strings.Split(el, "/")
            dirParts = dirParts[:len(dirParts)-1]
            dir := strings.Join(dirParts, "/")
            dirExists, _ := exists(dir)
            if !dirExists {
                os.MkdirAll(dir, os.ModePerm)
            }
            copyr(src, el)
        }
        if el == check {
            primed = 1
        }
    }

    // No scipy output was intercepted: delegate to the real gfortran.
    if primed == 0 {
        if cmdOut, err = exec.Command(wrappedCompiler, noProg...).Output(); err != nil {
            fmt.Fprintln(os.Stderr, "There was an error running fortran compiler: ", err)
            os.Exit(1)
        }
        os.Stdout.Write(cmdOut)
    }
}

I moved the actual compiler to inner_gfortran:

sudo mv /usr/bin/gfortran /usr/bin/inner_gfortran

And put the Go wrapper in its place.
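
Building and installing the wrapper looks roughly like this (assuming the Go toolchain is available and the source above is saved as wrapper.go):

go build -o gfortran wrapper.go
sudo cp gfortran /usr/bin/gfortran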

The wrapper passes most instructions on to the compiler, but if the instruction is to compile a Fortran program into scipy, and the compiled binary already exists in my reference version of scipy, the wrapper simply copies the reference version into the new version being compiled.

And that did it. The reduced-size versions of scipy and numpy now work on AWS Lambda for Python 3.6.

– joek575

I know it has been a while since you asked this question, but maybe this will help someone else.

You can use a Lambda-like Docker container to compile the resources and then copy the libraries back to your dev environment. Use these compiled files as the Lambda resources.
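
A minimal sketch of that flow, assuming the community lambci/lambda build image and an example output directory:

docker run --rm -v "$PWD":/var/task lambci/lambda:build-python3.6 \
    pip install numpy scipy -t build/

The compiled packages land in ./build on the host and can then be zipped into the deployment package.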

This article helped me a lot: https://medium.com/@mohd.lutfalla/how-to-compile-resources-for-aws-lambda-f46fadc03290

– segalle

I know that this question is quite old, but AWS Lambda now allows you to run functions from a Docker image stored in ECR. The Docker image can be up to 10 GB.

https://aws.amazon.com/fr/blogs/aws/new-for-aws-lambda-container-image-support/
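
A minimal Dockerfile sketch based on the AWS Python base image (app.py and app.handler are placeholder names):

FROM public.ecr.aws/lambda/python:3.8
# Install the heavy dependencies inside the image instead of a zip package
COPY requirements.txt ./
RUN pip install -r requirements.txt
# Copy the function code into the task root
COPY app.py ${LAMBDA_TASK_ROOT}
# Point Lambda at the handler
CMD ["app.handler"]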

– Sergio Lema