I'm new to Apache Beam and am trying to build a simple pipeline that writes a PCollection to a partitioned BigQuery table using the bigqueryio package.
I find it pretty difficult to get started with the Go SDK for Beam, as most of the docs and examples are written for Java/Python.
I'm trying to use this package, github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio:
col := createCol(s, line)
outTable := "project:dataset:table$20220808"
bigqueryio.Write(s, outTable, col,
bigqueryio.CreateDisposition(bigqueryio.CreateIfNeeded),
bigqueryio.WriteExpansionAddr(""))
When running this code I get the following error:
tried cross-language for beam:transform:org.apache.beam:schemaio_bigquery_write:v1 against autojava::sdks:java:io:google-cloud-platform:expansion-service:runExpansionService and failed expanding external transform error in starting expansion service, StartService(): context deadline exceeded
A couple of questions:
- When running a local pipeline, is the auto-started expansion service not expected to work? It only works for me when I start an expansion service myself and pass its address.
- There is another package, github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio, but it doesn't support passing options to the Write function, and the examples mostly use the first one. Which one should I use?
- CreateDisposition only accepts two options: CreateNever and CreateIfNeeded. What about WRITE_TRUNCATE? I used to use that option to overwrite an existing table partition, which is what I need to do here. I can see this option in the docs (https://beam.apache.org/releases/pydoc/2.2.0/apache_beam.io.gcp.bigquery.html#apache_beam.io.gcp.bigquery.BigQueryDisposition). I think another critical option, WriteDisposition, is missing. Is there a workaround?