I have a Google App Engine app that triggers a Cloud Dataflow pipeline. The pipeline is supposed to write its final PCollection to Google BigQuery, but I can't find a way to install the right apache_beam.io dependency.
I'm running Apache Beam version 2.2.0 locally.
The project structure follows the code from this blog post.
This is the relevant piece of code (output here stands for the PCollection produced by the preceding pipeline steps):

import apache_beam as beam

output | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
    "%s:%s.%s" % (PROJECT, DATASET, TABLE),  # table spec, e.g. "project:dataset.table"
    schema=TABLE_SCHEMA,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
)
When I run this code locally, beam.io.WriteToBigQuery() resolves correctly: it is loaded from apache_beam/io/gcp/bigquery.py in my virtual environment.
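For reference, this is how I checked which copy of the module the local run picks up (a small diagnostic snippet, not part of the app itself):

from apache_beam.io.gcp import bigquery

print(bigquery.__file__)                     # path inside my virtualenv
print(hasattr(bigquery, "WriteToBigQuery"))  # prints True locally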
But I can't install this dependency into the lib folder that is shipped with the app on deploy. My requirements file lists apache-beam[gcp]==2.2.0, yet when I run pip install -r requirements.txt -t lib, the apache_beam/io/gcp/bigquery.py that ends up in my lib folder does not contain the class WriteToBigQuery, and I then get the error 'module' object has no attribute 'WriteToBigQuery' when running the app on Google App Engine. The snippet below is how I confirmed this against the vendored copy.
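Run from the app root (where the lib folder lives), this sketch imports the vendored copy ahead of the virtualenv one, mimicking the precedence that vendor.add('lib') in appengine_config.py gives it on App Engine:

import sys

# Make the vendored packages in lib/ take precedence over the virtualenv.
sys.path.insert(0, "lib")

from apache_beam.io.gcp import bigquery

print(bigquery.__file__)                     # now points into lib/
print(hasattr(bigquery, "WriteToBigQuery"))  # prints False for the broken copy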
Does anyone have any idea how I can get the right bigquery.py into my lib folder?