
I'm writing some data using Flink's AvroOutputFormat:

val source: DataSet[Row] = environment.createInput(inputBuilder.finish)
val tableEnv: BatchTableEnvironment = new BatchTableEnvironment(environment, TableConfig.DEFAULT)
val table: Table = source.toTable(tableEnv)
val avroOutputFormat = new AvroOutputFormat[Row](classOf[Row])
avroOutputFormat.setCodec(AvroOutputFormat.Codec.NULL)
source.write(avroOutputFormat, "/Users/x/Documents/test_1.avro").setParallelism(1)
environment.execute()

This writes data into a file called test_1.avro. When I try to read the file back:

val users = new AvroInputFormat[Row](new Path("/Users/x/Documents/test_1.avro"), classOf[Row])
val usersDS = environment.createInput(users)
usersDS.print()

this prints the rows as:

java.lang.Object@4462efe1,java.lang.Object@7c3e4b1a,java.lang.Object@2db4ad1,java.lang.Object@765d55d5,java.lang.Object@2513a118,java.lang.Object@2bfb583b,java.lang.Object@73ae0257,java.lang.Object@6fc1020a,java.lang.Object@5762658b

Is there a way to print the data values instead of the object addresses?

TobiSH
codebot

1 Answer


You are mixing the Table API and the DataSet API in a weird fashion. It would be best to stick to one API or use the proper conversion methods.

As is, you are basically not telling Flink the expected input/output schema; classOf[Row] is everything and nothing.

To write a table to an Avro file, please use the table connector. Basic sketch:

tableEnv.connect(new FileSystem("/path/to/file"))
    .withFormat(new Avro().avroSchema("...")) // <- Adjust
    .withSchema(schema)
    .createTemporaryTable("AvroSinkTable")
table.insertInto("AvroSinkTable")
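The `schema` value in the sketch above is not defined there; a minimal sketch of how it could be built with the table descriptor API (Flink 1.10-era), using placeholder column names and types that you would replace with your actual columns:

```scala
import org.apache.flink.table.api.DataTypes
import org.apache.flink.table.descriptors.Schema

// Placeholder columns: adjust names and types to match your Avro schema.
val schema = new Schema()
  .field("id", DataTypes.BIGINT())
  .field("name", DataTypes.STRING())
```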

Edit: as of now, the filesystem connector unfortunately does not support Avro.

So there is no option but to use the DataSet API. I recommend using avrohugger to generate an appropriate Scala class for your Avro schema.

// convert the table to a DataSet of your generated Scala class
val dsUser: DataSet[User] = tableEnv.toDataSet[User](table)
// write out with the specific class and schema, so Flink knows the types
val avroOutputFormat = new AvroOutputFormat[User](classOf[User])
avroOutputFormat.setCodec(AvroOutputFormat.Codec.SNAPPY)
avroOutputFormat.setSchema(User.SCHEMA$)
dsUser.write(avroOutputFormat, outputPath1)
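To answer the original question once the file is written this way: a minimal read-back sketch, assuming the same avrohugger-generated `User` class and the flink-avro dependency on the classpath (the path is a placeholder). Reading with the specific class instead of `classOf[Row]` gives Flink the Avro schema, so `print()` shows field values rather than `java.lang.Object@...` hashes:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.avro.AvroInputFormat

// Read the Avro file back with the generated specific class, not Row.
val environment = ExecutionEnvironment.getExecutionEnvironment
val input = new AvroInputFormat[User](
  new Path("/path/to/output.avro"), // wherever the job above wrote
  classOf[User])
val usersDS: DataSet[User] = environment.createInput(input)
usersDS.print() // the case-class toString shows the actual field values
```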
Arvid Heise
Comment: I'm done with the code. But now I'm getting `Exception in thread "main" org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.BatchTableSourceFactory' in the classpath. Reason: No context matches.` @Arvid – codebot Feb 24 '20 at 11:31