
Is there any way to write a DataSet object to an ORC file? I know a DataSet can be written as an Avro file using AvroOutputFormat, but it looks like there is no equivalent class for ORC.

If that cannot be achieved, is there any way to convert a DataSet to a Table or a DataStream?

The reason I am asking is that I have to use the DataSet API, since it supports reading from multiple file paths, like this:

AvroInputFormat<MyType> avroInputFormat = new AvroInputFormat<>(...);
avroInputFormat.setFilePaths(<file paths list>);
DataSet<MyType> dataset = env.createInput(avroInputFormat);

This works. However, if I use the DataStream API, it throws the following exception:

Caused by: java.lang.IllegalArgumentException: FileInputFormats with multiple paths are not supported yet.
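
For reference, the DataStream attempt that fails looks roughly like this (only a sketch; the environment setup and the two folder paths are placeholders I made up):

import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.AvroInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Same input format as above, but used in a streaming job with two paths set
AvroInputFormat<MyType> avroInputFormat =
        new AvroInputFormat<>(new Path("/data/first"), MyType.class);
avroInputFormat.setFilePaths(new Path("/data/first"), new Path("/data/second"));

// Fails here with the IllegalArgumentException quoted above,
// because the streaming file source only accepts a single path
DataStream<MyType> stream = env.createInput(avroInputFormat);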

Any suggestions will be greatly appreciated. Thanks!

tottistar

1 Answer


Flink's DataSet API is deprecated. You should use either the DataStream API in batch execution mode or the Table API in batch mode. If all of your files are in one folder, you can provide the path to that folder as the input and both APIs will read every file in it. If the files live under different paths, it's best to create a Jira ticket with a feature request for that.
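
If the Avro files are all under one folder, a minimal Table API sketch for converting them to ORC could look like the following. This assumes batch mode and that the flink-avro and flink-orc format dependencies are on the classpath; the table names, folder paths, and schema are placeholders, not anything from your job:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AvroToOrc {
    public static void main(String[] args) throws Exception {
        // Table API in batch mode
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

        // Source table: the filesystem connector picks up every file under 'path'
        tEnv.executeSql(
                "CREATE TABLE avro_source (id BIGINT, name STRING) WITH ("
                        + " 'connector' = 'filesystem',"
                        + " 'path' = '/data/input',"   // placeholder folder
                        + " 'format' = 'avro')");

        // Sink table: same connector, ORC format
        tEnv.executeSql(
                "CREATE TABLE orc_sink (id BIGINT, name STRING) WITH ("
                        + " 'connector' = 'filesystem',"
                        + " 'path' = '/data/output',"  // placeholder folder
                        + " 'format' = 'orc')");

        // Copy everything from the Avro source into the ORC sink
        tEnv.executeSql("INSERT INTO orc_sink SELECT * FROM avro_source").await();
    }
}

Because the filesystem connector scans the whole directory, there is no need to enumerate individual files.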

Martijn Visser
  • Thanks, I have files in different paths, but I can do a union of the DataStreams from the individual folders. – tottistar Oct 27 '22 at 14:03
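
For completeness, the union approach mentioned in the comment could look roughly like this (a sketch with made-up folder paths, run in batch execution mode):

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.AvroInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);

// One source per folder, each with a single path, then union the resulting streams
DataStream<MyType> first =
        env.createInput(new AvroInputFormat<>(new Path("/data/first"), MyType.class));
DataStream<MyType> second =
        env.createInput(new AvroInputFormat<>(new Path("/data/second"), MyType.class));

DataStream<MyType> all = first.union(second);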