Issue with TensorFlow Extended (TFX) BigQuery as ExampleGen

Question

I'm learning to build a machine learning pipeline with TensorFlow Extended (TFX). I followed the tutorial and would now like to build my own pipeline, but I'm getting an error when I ingest the data directly from BigQuery. Please advise, and thanks in advance!

CODE:

from tfx.components.example_gen.big_query_example_gen.component import BigQueryExampleGen

query = """
    SELECT * FROM `<project_id>.<database>.<table_name>`
"""
example_gen = BigQueryExampleGen(query=query)

ERROR:

RuntimeError: Missing executing project information. Please use the --project command line option to specify it.

Did you check [this page](https://github.com/tensorflow/tfx/issues/994)? — grnc, May 06 '20 at 02:49
Thanks for pointing me to the page. I'm a novice with TFX and Apache Beam. Assuming that the code in a Jupyter Notebook on GCP will be used for Google AI Pipelines / Kubeflow, how do I add --project and other arguments in the notebook? — LLTeng, May 06 '20 at 03:12
I'm not sure, but you should add this information to the question. — grnc, May 06 '20 at 03:23

score 1 · Answer 1 · answered May 06 '20 at 03:51

Because the parameters for initializing the BigQuery client are not supported, even after I added Google Application Credentials, I worked around the problem by using CsvExampleGen instead.
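As a rough illustration of this workaround, the sketch below writes already-fetched rows to a CSV file in a directory that CsvExampleGen could then read. The helper name `write_rows_to_csv_dir`, the column names, and the sample rows are all hypothetical; how the rows are pulled out of BigQuery is left out.

```python
import csv
import os
import tempfile


def write_rows_to_csv_dir(header, rows, directory, filename="data.csv"):
    """Write a header plus rows to <directory>/<filename> and return the path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Hypothetical sample data standing in for the exported query results.
data_dir = tempfile.mkdtemp()
csv_path = write_rows_to_csv_dir(["id", "label"], [(1, "a"), (2, "b")], data_dir)
print(csv_path)
```

CsvExampleGen would then be pointed at `data_dir`. The argument used to pass that directory differs between TFX versions, so check the documentation for the version you have installed.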

LLTeng

score 1 · Accepted Answer · answered May 22 '20 at 18:15

I'm not sure if you have solved it already, but to use BigQuery as input you must set the --project flag, like so:

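# Assumes `from tfx import components` and that `context` is the notebook's InteractiveContext.
example_gen = components.BigQueryExampleGen(query='SELECT * except(day) FROM `gofind-datalake.data.temp_dist` where rand() < 2800/30713393 limit 3000')
# Beam options such as --project are passed for this run through beam_pipeline_args.
context.run(example_gen, beam_pipeline_args=["--project=gofind-datalake"])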
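On the follow-up question about adding `--project` and other arguments in a notebook, here is a minimal sketch that builds the flag list in one place. The helper name `build_beam_args` and the example project and bucket values are hypothetical. `--project` and `--temp_location` are standard Beam Google Cloud options, but whether you need a temp location depends on your setup.

```python
from typing import List, Optional


def build_beam_args(project: str, temp_location: Optional[str] = None) -> List[str]:
    """Return Beam command-line flags as a list of strings."""
    if not project:
        raise ValueError("project must be a non-empty GCP project ID")
    args = [f"--project={project}"]
    # Only add the temp location when one is given (assumed optional here).
    if temp_location:
        args.append(f"--temp_location={temp_location}")
    return args


# Hypothetical values; replace with your own project ID and bucket.
beam_args = build_beam_args("my-gcp-project", "gs://my-bucket/tmp")
print(beam_args)

# In the notebook, the list would then be passed as in the answer above:
# context.run(example_gen, beam_pipeline_args=beam_args)
```

Keeping the flags in a plain list also means the same values can be reused when the pipeline is later defined outside the notebook.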
