I am writing a Python application that depends on the Scrapy
module. It works fine locally but fails when I run it from the AWS Lambda test console. My Python project has a requirements.txt
file with the following dependency:
scrapy==1.6.0
I packaged all dependencies by following this guide: https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html. I also put my source code (*.py) at the root level of the zip file. My package script can be found at https://github.com/zhaoyi0113/quote-datalake/blob/master/bin/deploy.sh.
It basically does two things: first, it runs pip install -r requirements.txt -t dist
to download all dependencies into the dist directory; second, it copies the application's Python source files into the dist directory, as sketched below.
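
For reference, the script is roughly equivalent to the following Python sketch (the file names and the dist path are assumptions based on my repo layout):

import shutil
import subprocess
from pathlib import Path

# Step 1: install all dependencies into dist so they end up inside the zip.
subprocess.run(
    ["pip", "install", "-r", "requirements.txt", "-t", "dist"],
    check=True,
)

# Step 2: copy the application source files to the root of dist,
# next to the installed packages.
for src in Path(".").glob("*.py"):
    shutil.copy(src, Path("dist") / src.name)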
The deployment is done via Terraform, and below is the configuration file.
provider "aws" {
profile = "default"
region = "ap-southeast-2"
}
variable "runtime" {
default = "python3.6"
}
data "archive_file" "zipit" {
type = "zip"
source_dir = "crawler/dist"
output_path = "crawler/dist/deploy.zip"
}
resource "aws_lambda_function" "test_lambda" {
filename = "crawler/dist/deploy.zip"
function_name = "quote-crawler"
role = "arn:aws:iam::773592622512:role/LambdaRole"
handler = "handler.handler"
source_code_hash = "${data.archive_file.zipit.output_base64sha256}"
runtime = "${var.runtime}"
}
It zips the directory and uploads the file to Lambda.
In Lambda I get the runtime error Unable to import module 'handler': cannot import name 'etree'
whenever the code contains the statement import scrapy.
I don't use etree
anywhere in my own code, so I believe it is something used by Scrapy itself (presumably lxml, one of Scrapy's dependencies, which provides etree as a compiled C extension).
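
For context, here is a minimal sketch of what handler.py looks like (the actual file is in the repo; this trimmed version is enough to reproduce the failure):

import scrapy  # in Lambda this line already fails with "cannot import name 'etree'"

def handler(event, context):
    # Never reached in Lambda because the import above fails first.
    return {"status": "ok"}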
My source code can be found at https://github.com/zhaoyi0113/quote-datalake/tree/master/crawler. There are only two simple Python files. They work fine when I run them locally; the error only appears in Lambda. Is there a different way to package Scrapy
for Lambda?