I'm new to Apache Beam and am trying to build a simple pipeline that writes a PCollection to a BigQuery partitioned table using the bigqueryio package. I find it pretty difficult to get started with the Go Beam SDK, as most of the docs and examples are written for Java/Python.

I'm trying to use this package, github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio:

col := createCol(s, line)

outTable := "project:dataset:table$20220808"

bigqueryio.Write(s, outTable, col,
    bigqueryio.CreateDisposition(bigqueryio.CreateIfNeeded),
    bigqueryio.WriteExpansionAddr(""))

When running this code I get the following error:

tried cross-language for beam:transform:org.apache.beam:schemaio_bigquery_write:v1 against autojava::sdks:java:io:google-cloud-platform:expansion-service:runExpansionService and failed expanding external transform error in starting expansion service, StartService(): context deadline exceeded
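Since the error comes from starting the expansion service, one way to narrow it down is to check whether anything is listening at the address that would be passed to WriteExpansionAddr. The following is a minimal sketch using only the Go standard library. The helper name checkExpansionAddr and the address localhost:8097 are assumptions for illustration, not part of the Beam API, and a successful dial only shows that a TCP listener exists, not that it is an expansion service.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialTimeout bounds how long the check waits for a TCP connection.
const dialTimeout = 2 * time.Second

// checkExpansionAddr (hypothetical helper) reports whether something accepts
// TCP connections at addr. It does not verify the expansion protocol itself.
func checkExpansionAddr(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return fmt.Errorf("expansion service not reachable at %q: %w", addr, err)
	}
	return conn.Close()
}

func main() {
	// Assumption: a manually started expansion service listens on this address.
	addr := "localhost:8097"
	if err := checkExpansionAddr(addr); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("listener found at", addr)
}
```

If the check passes, the same address string is what would go into bigqueryio.WriteExpansionAddr in the snippet above.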

A couple of questions:

  • When running a local pipeline, does the auto-created expansion service not work? It only works when I create an expansion service directly and pass its address.
  • There is another package called github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio, but it doesn't support passing options to the Write function, and the examples mostly use the first one. Which one should I use?
  • CreateDisposition only accepts two options, CreateNever and CreateIfNeeded. What about WRITE_TRUNCATE? I used to use that option to overwrite an existing table partition, which is what I need to do here. I can see it in the docs here: https://beam.apache.org/releases/pydoc/2.2.0/apache_beam.io.gcp.bigquery.html#apache_beam.io.gcp.bigquery.BigQueryDisposition. I think another critical option, WriteDisposition, is missing. Is there a workaround?
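For the partition target itself, the `$20220808` suffix in outTable is a daily partition decorator, and it can be derived from a date rather than hard-coded. A minimal sketch, assuming the day arrives as a YYYY-MM-DD string; partitionedTable is a hypothetical helper, not a bigqueryio function, and it passes the base table string through unchanged:

```go
package main

import (
	"fmt"
	"time"
)

// partitionedTable (hypothetical helper) appends a daily partition decorator
// of the form "$YYYYMMDD" to table. day must be in YYYY-MM-DD form.
func partitionedTable(table, day string) (string, error) {
	d, err := time.Parse("2006-01-02", day)
	if err != nil {
		return "", fmt.Errorf("invalid partition day %q: %w", day, err)
	}
	return table + "$" + d.Format("20060102"), nil
}

func main() {
	// The base table string is the one used in the question.
	out, err := partitionedTable("project:dataset:table", "2022-08-08")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // prints "project:dataset:table$20220808"
}
```

This only builds the table string; whether writing to that partition overwrites or appends still depends on the write disposition asked about above.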
Avishay28
  • You are using the cross-language BQ IO [(xlang/)](https://github.com/apache/beam/blob/67cb87ecc2d01b88f8620ed6821bcf71376d9849/sdks/go/pkg/beam/io/xlang/bigqueryio/bigquery.go#L78), which allows the SDK to use IOs from other languages. The other one is a normal [BigQuery connector](https://github.com/apache/beam/blob/a9775d757ff766505702a2d769283d6a3030adf5/sdks/go/pkg/beam/io/bigqueryio/bigquery.go#L16) for Go. Looking at the GitHub repos, it seems that at this time there is no way to provide disposition options. – Sayan Bhattacharya Aug 16 '22 at 10:28
  • Also, in this [thread](https://stackoverflow.com/questions/56047579/is-there-a-apache-beam-cloud-bigtable-connector-in-golang) it has been mentioned that *At this time, there's been spent little to no time working on/testing the Go SDK IOs, and make no guarantees about the reliability or fit or finish of them*. I think it would be best to open an issue with the Apache Beam team and get a response from them. – Sayan Bhattacharya Aug 16 '22 at 10:29

0 Answers