I need to write GenericRecords into several Avro files with different Schemas.

What are the possible approaches besides AvroIO.writeCustomTypeToGenericRecords() with a custom class that extends DynamicAvroDestinations? I followed more or less the procedure from the "Writing data to multiple destinations" section, but the documentation says that the .to(DynamicAvroDestinations<UserT,NewDestinationT,OutputT> dynamicDestinations) method is deprecated and suggests using FileIO.write() or FileIO.writeDynamic() instead, without any further documentation or code examples.

DiFalco
  • Let me clarify - you have an input PCollection and you want to output its records into different Avro files with different schemas? If yes, I expect that you know how to determine which schema to use for the current record, right? – Alexey Romanenko Dec 13 '22 at 17:48
  • Yes, technically you can extract it from the GenericRecord itself by calling the getSchema() method. However, all the non-Avro dynamic destination classes expect the schema to be derived from the destination string, which could be used, for example, as a key into a sideInput Map or something similar. So far I haven't found any good code examples that use FileIO.writeDynamic() or similar, but fortunately DynamicAvroDestinations still works, even though it is deprecated. – DiFalco Dec 15 '22 at 12:40
  • Did you consider branching your input PCollection by the schema of the current element and writing each "branched" PCollection with its own AvroIO.write() instance? – Alexey Romanenko Dec 15 '22 at 15:04
  • It could be a possibility. However, even though I might be able to branch my PCollection according to the schema, I would still need to pass it somehow as an argument to the write method. How would that be possible? – DiFalco Dec 19 '22 at 08:13
  • If you already know the schemas, then you can pass them to the AvroIO.writeGenericRecords(Schema) instance created in each branch, can't you? – Alexey Romanenko Dec 19 '22 at 12:01
  • Not exactly, because I can only get the schema either from the generic record itself or through a sideInput map. Both exist only inside the pipeline, which prevents me from passing the schema explicitly as an argument. – DiFalco Dec 20 '22 at 10:26
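
The routing idea discussed in the comments above (derive a destination key from each record, then look up the schema for that key) can be modelled in plain Java. This is only a minimal sketch of the logic, not Beam or Avro code: `Rec`, `destinationOf`, `schemaFor`, `fileNameFor`, and the example schema names are all hypothetical stand-ins, and the lookup map merely plays the role that a sideInput map would play inside a pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaRouting {

    // Hypothetical stand-in for a GenericRecord: it carries only the full
    // name of its schema and an opaque payload.
    public static final class Rec {
        public final String schemaName;
        public final String payload;

        public Rec(String schemaName, String payload) {
            this.schemaName = schemaName;
            this.payload = payload;
        }
    }

    // Destination function: derive a string key from the record itself,
    // analogous to reading the schema off a GenericRecord.
    public static String destinationOf(Rec r) {
        return r.schemaName;
    }

    // Resolve the schema (here just a JSON string) for a destination key.
    // The map stands in for a side-input style lookup.
    public static String schemaFor(String destination, Map<String, String> schemasByName) {
        String schema = schemasByName.get(destination);
        if (schema == null) {
            throw new IllegalArgumentException("No schema registered for " + destination);
        }
        return schema;
    }

    // Hypothetical naming rule: one output file per destination key.
    public static String fileNameFor(String destination) {
        return destination.replace('.', '_') + ".avro";
    }

    // Group records by destination key, keeping first-seen key order.
    public static Map<String, List<Rec>> route(List<Rec> records) {
        Map<String, List<Rec>> byDestination = new LinkedHashMap<>();
        for (Rec r : records) {
            byDestination.computeIfAbsent(destinationOf(r), k -> new ArrayList<>()).add(r);
        }
        return byDestination;
    }

    public static void main(String[] args) {
        Map<String, String> schemas = new LinkedHashMap<>();
        schemas.put("com.example.User", "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}");
        schemas.put("com.example.Order", "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[]}");

        List<Rec> input = Arrays.asList(
                new Rec("com.example.User", "u1"),
                new Rec("com.example.Order", "o1"),
                new Rec("com.example.User", "u2"));

        // For each destination: pick the schema, pick the file, list the records.
        for (Map.Entry<String, List<Rec>> e : route(input).entrySet()) {
            String schema = schemaFor(e.getKey(), schemas);
            System.out.println(fileNameFor(e.getKey()) + " <- " + e.getValue().size()
                    + " record(s), schema length " + schema.length());
        }
    }
}
```

The point of the sketch is that the write step never needs the schema as a compile-time argument: it only needs a key that can be computed per record and a way to resolve that key to a schema at write time.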