I'm learning to build a machine learning pipeline with TensorFlow Extended (TFX). I followed the tutorial and would now like to build my own pipeline, but I get an error when I ingest data directly from BigQuery. Please advise, and thanks in advance!

CODE:

from tfx.components.example_gen.big_query_example_gen.component import BigQueryExampleGen

query = """
    SELECT * FROM `<project_id>.<database>.<table_name>`
"""
example_gen = BigQueryExampleGen(query=query)

ERROR:

RuntimeError: Missing executing project information. Please use the --project command line option to specify it.
LLTeng
  • Did you check [this page](https://github.com/tensorflow/tfx/issues/994)? – grnc May 06 '20 at 02:49
  • Thanks for pointing me to the page. I'm a novice with TFX and Apache Beam. Assuming the code in a Jupyter notebook on GCP will be used for Google AI Pipelines / Kubeflow, how do I add the --project and other arguments in the notebook? – LLTeng May 06 '20 at 03:12
  • I'm not sure, but you should add this information to the question. – grnc May 06 '20 at 03:23

2 Answers

Because passing parameters to the BigQuery client initialization was not supported, even after I had set the Google Application Credentials, I worked around the problem by using CsvExampleGen instead.

LLTeng

I'm not sure whether you have solved this already, but to use BigQuery as input you must pass the --project flag through beam_pipeline_args, like so:

example_gen = components.BigQueryExampleGen(query='SELECT * except(day) FROM `gofind-datalake.data.temp_dist` where rand() < 2800/30713393 limit 3000')
context.run(example_gen, beam_pipeline_args=["--project=gofind-datalake"])
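
For reuse across notebook cells, the flag list could be assembled by a small helper. This is only a sketch under assumptions: `build_beam_pipeline_args` is a hypothetical name (not part of TFX), `my-project` and `gs://my-bucket/tmp` are placeholder values, and falling back to the `GOOGLE_CLOUD_PROJECT` environment variable is a convenience choice, not something the error message requires.

```python
import os


def build_beam_pipeline_args(project_id=None, temp_location=None):
    """Assemble Beam flags for a BigQuery-backed run (hypothetical helper)."""
    # Fall back to the GOOGLE_CLOUD_PROJECT environment variable when no
    # project ID is passed explicitly (an assumption of this sketch).
    project_id = project_id or os.environ.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        raise ValueError("A GCP project ID is required to run BigQuery queries.")

    args = [f"--project={project_id}"]
    # Beam's BigQuery source may also need a GCS location for temporary files.
    if temp_location:
        args.append(f"--temp_location={temp_location}")
    return args


# Placeholder values; substitute your own project and bucket.
beam_args = build_beam_pipeline_args("my-project", "gs://my-bucket/tmp")
print(beam_args)

# In the notebook, the list would then be passed as in the answer above:
# context.run(example_gen, beam_pipeline_args=beam_args)
```

Keeping the flags in one list means the same values can be handed to the interactive context now and to a full pipeline definition later.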
Claudio Davi