I am trying to read data from an Avro file stored in HDFS. So far I am able to read the entire file using DataFileReader or DataFileStream.
Now I want to implement pagination. Is there a specific way to do it?
I have gone through the basic documentation, and as far as I understand this can be done using the synchronization markers. This is what I have tried:
SeekableInput seekableInput = new AvroFSInput(dataInputStream, 5);
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(seekableInput, datumReader);
fileReader.seek(startOffset); // set to the start-offset
while (fileReader.hasNext() && !fileReader.pastSync(endOffset)) {
    GenericRecord gr = fileReader.next();
    System.out.println(gr);
}
But this code throws the following exception:
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
at com.globalids.test.AvroTest.deserializeWithPageing(AvroTest.java:112)
at com.globalids.test.AvroTest.main(AvroTest.java:45)
Caused by: java.io.IOException: Invalid sync!
at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
... 2 more
I have also tried setting the sync interval while writing the data, and calling sync() after each record is appended to the file using DataFileWriter.
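To make the marker idea concrete, here is a toy model in plain Java (it does not use Avro) of a file whose blocks are separated by a sync marker. The names MARKER, buildFile, syncTo and readPage are made up for this illustration, and the 4-byte marker is an assumption for brevity; real Avro files use a random 16-byte marker. As far as the Avro Javadoc describes it, DataFileReader.seek(long) expects a position that is already a sync point, while DataFileReader.sync(long) moves to the next sync point after a position. The syncTo helper below mimics the latter, and the loop condition mimics pastSync.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SyncMarkerDemo {
    // Hypothetical 4-byte marker; non-ASCII so it cannot occur inside the text blocks.
    static final byte[] MARKER = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};

    /** Builds a toy file: one leading marker (standing in for the header), then each block followed by a marker. */
    static byte[] buildFile(String... blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MARKER, 0, MARKER.length);
        for (String block : blocks) {
            byte[] bytes = block.getBytes(StandardCharsets.US_ASCII);
            out.write(bytes, 0, bytes.length);
            out.write(MARKER, 0, MARKER.length);
        }
        return out.toByteArray();
    }

    /** Returns the offset just past the first marker that starts at or after position, or -1 if there is none. */
    static int syncTo(byte[] data, int position) {
        for (int i = Math.max(position, 0); i + MARKER.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + MARKER.length), MARKER)) {
                return i + MARKER.length;
            }
        }
        return -1;
    }

    /** Reads every block whose preceding marker starts inside the byte range [start, end). */
    static List<String> readPage(byte[] data, int start, int end) {
        List<String> blocks = new ArrayList<>();
        int blockStart = syncTo(data, start); // align to a block boundary first
        // Stop once the current block starts beyond the marker that follows 'end'.
        while (blockStart != -1 && blockStart < data.length && blockStart < end + MARKER.length) {
            int next = syncTo(data, blockStart); // offset just past the marker closing this block
            if (next == -1) {
                break; // truncated block without a closing marker
            }
            int length = next - MARKER.length - blockStart;
            blocks.add(new String(data, blockStart, length, StandardCharsets.US_ASCII));
            blockStart = next;
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] data = buildFile("rec1,rec2", "rec3,rec4", "rec5");
        int pageBytes = 15; // arbitrary byte range per page
        for (int start = 0; start < data.length; start += pageBytes) {
            int end = start + pageBytes;
            System.out.println("bytes [" + start + ", " + end + "): " + readPage(data, start, end));
        }
    }
}
```

In this model a page is a byte range, not a record count: each block lands in exactly one page, but pages can hold different numbers of records and a page can be empty. Starting from an offset that is not a block boundary only works because the reader first scans forward to a marker.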
Can anyone point out what I'm doing wrong?
Thank you in advance.