
I have a bunch of protobuf files in GCS that I would like to process with Dataflow (Java SDK), but I am not sure how to do that.

Apache Beam provides AvroIO to read Avro files:

 Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
 PCollection<GenericRecord> records =
     p.apply(AvroIO.readGenericRecords(schema)
                .from("gs://my_bucket/path/to/records-*.avro"));

Is there anything similar for reading protobuf files?

Thanks in advance

Kolban
Pari
  • Might this be what you are looking for? https://beam.apache.org/releases/javadoc/2.5.0/org/apache/beam/sdk/extensions/protobuf/ProtoCoder.html It seems to have a sample code fragment contained within. – Kolban Oct 17 '18 at 15:11
  • Yes, I need ProtoCoder, but it does not work with TextIO, and I couldn't get it working with any other available IO. – Pari Oct 17 '18 at 21:20
  • Can you update the question with what you tried relative to the ProtoCoder class described in the above link? When you say it didn't work, what was the error? – Kolban Oct 18 '18 at 00:22

0 Answers