
In Apache Beam, once you have a PCollection input, you can apply a transform to it:

input.apply(ParDo.of(new MyDoFn()))

However, BigQueryIO.read() can be applied only to the Pipeline instance. How can I make BigQueryIO.read() wait until some other DoFn finishes, or at least produces one output? Does BigQueryIO have to go in a separate pipeline, or can this be done within the same one?

Mikhail Berlyant

1 Answer


I don't think it's possible to make BigQueryIO.read() wait for some input, because it creates a PTransform<PBegin, PCollection<T>>. The PBegin input type means the transform is meant to be applied at the very beginning of your pipeline, not to the output of another step.
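To make the type constraint concrete, here is a minimal sketch. The class name RootTransformSketch, the helper buildRead, and the table spec are hypothetical placeholders, and I use BigQueryIO.readTableRows() rather than read() only to avoid writing a parse function. The pipeline is only constructed here, never run.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;

public class RootTransformSketch {

  /** Builds the read; the declared return type shows that its input is PBegin. */
  static PTransform<PBegin, PCollection<TableRow>> buildRead(String tableSpec) {
    return BigQueryIO.readTableRows().from(tableSpec);
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    PCollection<String> upstream = pipeline.apply(Create.of("a", "b"));

    // Hypothetical table spec, for illustration only.
    PTransform<PBegin, PCollection<TableRow>> read =
        buildRead("my-project:my_dataset.my_table");

    // Compiles: Pipeline.apply accepts transforms whose input type is PBegin.
    PCollection<TableRow> rows = pipeline.apply(read);

    // Does not compile: a PCollection<String> can only be passed to a transform
    // that accepts PCollection<String>, and this one accepts PBegin.
    // upstream.apply(read);
  }
}
```

Because the read takes no PCollection as input, nothing in the pipeline graph connects it to upstream, so there is no edge along which it could wait.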

I also don't see any other "read" PTransforms implemented in the BigQueryIO connector that accept an input PCollection.

So it will very likely be easier to run it as a separate pipeline and use something like Apache Airflow to orchestrate the two.
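If an external orchestrator is more than you need, a simpler variant of the same idea is to launch both pipelines from one driver program and block on the first before starting the second. This is only a sketch under assumptions: SequentialPipelines, PrepareFn, runAndWait and the table spec are hypothetical names, and the BigQuery read additionally needs GCP credentials and a temp location in the pipeline options.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class SequentialPipelines {

  /** Hypothetical stand-in for the DoFn that has to finish first. */
  static class PrepareFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String element, OutputReceiver<String> out) {
      out.output(element.toUpperCase());
    }
  }

  /** Runs the pipeline and blocks until it reaches a terminal state. */
  static PipelineResult.State runAndWait(Pipeline pipeline) {
    return pipeline.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().create();

    // First pipeline: the work that must complete before the read starts.
    Pipeline first = Pipeline.create(options);
    first.apply(Create.of("a", "b")).apply(ParDo.of(new PrepareFn()));
    PipelineResult.State state = runAndWait(first);
    if (state != PipelineResult.State.DONE) {
      throw new IllegalStateException("First pipeline ended in state " + state);
    }

    // Second pipeline: only constructed and run after the first one is DONE.
    Pipeline second = Pipeline.create(options);
    second.apply(BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"));
    // Further transforms on the rows would be applied here.
    runAndWait(second);
  }
}
```

The trade-off is that the driver process has to stay alive for the whole of the first job, and there are no retries or scheduling, which is what a tool like Airflow would add.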

Alexey Romanenko