
Tried with Scio 0.12.5 (both with beamVersion = "2.45.0" and "2.46.0") and Scio 0.12.8 (Beam "2.46.0"), reading from BigQuery with type-safe annotations (https://spotify.github.io/scio//io/BigQuery.html#type-annotations). Java version 11.0.17. The job works fine with the DirectRunner locally.

import com.spotify.scio.bigquery._

object MyApp {

  @BigQueryType.fromTable("dataset.myTable")
  class MyTable

  // ... ScioContext `sc` is created from the pipeline args ...

  // reading the table
  val table = sc.typedBigQuery[MyTable]()
  table.take(5).debug()
}

In SBT I set -Dbigquery.project=myProject, and in the job args --tempLocation=gs://myTempBucket.

But when running on Dataflow a java.lang.ClassCastException is thrown:

Error message from worker: java.lang.ClassCastException: class com.MyApp$MyTable  cannot be cast to class org.apache.avro.generic.IndexedRecord (com.MyApp$MyTable and org.apache.avro.generic.IndexedRecord are in unnamed module of loader 'app')
        org.apache.avro.generic.GenericData.getField(GenericData.java:697)
        org.apache.avro.generic.GenericData.getField(GenericData.java:712)
        org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:164)
        org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
        org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
        org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
        org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
        org.apache.beam.sdk.extensions.avro.coders.AvroCoder.encode(AvroCoder.java:378)
        org.apache.beam.sdk.values.TimestampedValue$TimestampedValueCoder.encode(TimestampedValue.java:110)
        org.apache.beam.sdk.values.TimestampedValue$TimestampedValueCoder.encode(TimestampedValue.java:88)
        org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:113)
        org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:59)
        org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
        org.apache.beam.sdk.coders.NullableCoder.encode(NullableCoder.java:73)
        org.apache.beam.sdk.coders.NullableCoder.encode(NullableCoder.java:63)
        org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$CheckpointCoder.encode(UnboundedReadFromBoundedSource.java:236)
        org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$CheckpointCoder.encode(UnboundedReadFromBoundedSource.java:215)
        org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
        org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:435)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1406)

Other considerations:

  1. The app (streaming) has been tested end to end reading from Pub/Sub and Kafka and sinking to GCS, BigQuery and Pub/Sub, but this problem arises only when I add a new feature that reads a type-safe BigQuery table (intended as a side input later on).
  2. The app is launched from a flex template.
  3. The image is pulled from Docker, built and deployed with sbt -v -Dsbt.insecureprotocol=true -Dsbt.override.build.repos=true -Dsbt.color=false -Dbigquery.project=${PROJECT_ID} clean pack and docker build -t endpoint/app_name:version --build-arg , etc.
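For completeness, the launch looks roughly like this (a sketch: the template path, region, job name, and parameter names are placeholders, not the actual values used):

```shell
# Hypothetical flex-template launch; all names/paths are illustrative
gcloud dataflow flex-template run "my-app-job" \
  --template-file-gcs-location "gs://my-bucket/templates/app_name.json" \
  --region "europe-west1" \
  --parameters tempLocation="gs://myTempBucket/temp"
```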

I do not know if it is an issue with resolving the schema via macros (they are resolved at compile time — am I missing some setup when launching the app?). Besides, MyApp.MyTable.class (I also tried MyTable.MyTable.class) is properly generated in the sbt clean pack step, and the values of MyTable are read correctly, as we can see here:

"java.lang.ClassCastException: value MyTable(E2F3C44E4858A3210A60FC65,999,1,2009-06-08T13:06:44.188829000000,2009-06-08,9999-12-31,ES,19,30,30,4001020147,0,0,0,30,2,0,N,N,N,N,N,0,0,0,0,0,0,0,0,0,268,0,EUR,N,N,N,0,0,0,900,0,1,0,0,0,00100000VISC,1,0,0,0001-01-01,0,BANK, ,2020-04-28T18:14:33.395Z,PPAE938 ,6009    ,ES,19,0,2023-05-09T09:13:15.960Z,RR,Krtable) (a com.MyApp$MyTable) **cannot be cast to expected type root**
    at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:79)
    at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:30)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:84)
    at org.apache.beam.sdk.coders.AvroCoder.encode(AvroCoder.java:373)
    at org.apache.beam.sdk.values.TimestampedValue$TimestampedValueCoder.encode(TimestampedValue.java:110)
    at org.apache.beam.sdk.values.TimestampedValue$TimestampedValueCoder.encode(TimestampedValue.java:88)
    at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:113)
    at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:59)
    at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
    at org.apache.beam.sdk.coders.NullableCoder.encode(NullableCoder.java:73)
    at org.apache.beam.sdk.coders.NullableCoder.encode(NullableCoder.java:63)
    at org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$CheckpointCoder.encode(UnboundedReadFromBoundedSource.java:236)
    at org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$CheckpointCoder.encode(UnboundedReadFromBoundedSource.java:215)
    at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
    at org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:435)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1516)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$800(StreamingDataflowWorker.java:167)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$4.run(StreamingDataflowWorker.java:1139)
    at org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor.lambda$executeLockHeld$0(BoundedQueueExecutor.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
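If I read the trace right, the worker-side AvroCoder treats each element as an Avro IndexedRecord, while the macro-generated class is a plain case class that does not implement that interface, so the cast fails at encode time. A plain-Scala illustration of that failure mode (no Beam/Scio deps; IndexedRecordLike stands in for org.apache.avro.generic.IndexedRecord):

```scala
// Stand-in for org.apache.avro.generic.IndexedRecord
trait IndexedRecordLike
// Stand-in for the macro-generated class: a plain case class
// that does NOT implement the record interface
case class MyTable(id: Long)

val row: Any = MyTable(1L)
val castFailed =
  try { row.asInstanceOf[IndexedRecordLike]; false }
  catch { case _: ClassCastException => true }
// Same shape of failure as on the Dataflow worker
println(s"cast to IndexedRecordLike failed: $castFailed")
```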

Thanks in advance!

P.S.1: The interim temp "myTable.avro" file (which Scio assembles in GCS when using @BigQueryType.fromTable or similar) looks correct:

INFO 2023-05-17T10:31:42.133Z Matched 1 files for pattern gs://my-bucket-staging/temp/BigQueryExtractTemp/4653ca8e8ff9464683ba07414439d81a/000000000000.avro

with exactly the same schema generated by both the local DirectRunner and the DataflowRunner:

{
  "type" : "record",
  "name" : "Root",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "bank",
    "type" : [ "null", "long" ],
    "doc" : "Entity code",
    "default" : null
  }, {

....

P.S.2: Versioning workarounds tried:

  • The Scio and Beam versions were bumped, with the same result.
  • Excluded the Avro version coming from other jars (the build uses Avro 1.8.2, which is the one linked to this Scio version).
  • sbt.version=1.8.2
  • Also tried packing with Java 8 (JAVA_VERSION: 1.8.0_361 / JAVA_VERSION_SEM: 8.0.361) instead of Java 11.
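The Avro pin from the second bullet looks roughly like this in build.sbt (a sketch: module names and version values mirror the versions mentioned above and may need adjusting):

```scala
// build.sbt sketch -- versions are the ones mentioned above
val scioVersion = "0.12.8"
libraryDependencies ++= Seq(
  "com.spotify" %% "scio-core" % scioVersion,
  "com.spotify" %% "scio-google-cloud-platform" % scioVersion
)
// force the Avro version this Scio release is built against
dependencyOverrides += "org.apache.avro" % "avro" % "1.8.2"
```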

P.S.3: The same error occurs with the following (which again works just fine locally):

sc.typedBigQuery[MyTableRow](Table.Spec(s"myTable"))

  @BigQueryType.toTable
  case class MyTableRow(
    id: String,
    bank: Long,
    // ...
  )

(Minimal reproduction code: MinimalPubSubBQLookUp)
