I am using the code below to read a large XML file (several GB) in a Hadoop RecordReader using XMLStreamReader:
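import java.io.IOException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.apache.hadoop.fs.FSDataInputStream;

public class RecordReader {
    private XMLStreamReader reader;
    private int progressCount = 0; // never updated; this is the value I need to compute

    public RecordReader() throws IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // fs and file are set up elsewhere (omitted here)
        FSDataInputStream fsDataInputStream = fs.open(file); // HDFS file
        try {
            reader = factory.createXMLStreamReader(fsDataInputStream);
        } catch (XMLStreamException exception) {
            throw new RuntimeException("XMLStreamException exception : ", exception);
        }
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return progressCount;
    }
}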
My question is how to get the reading progress of the file with XMLStreamReader, since it does not expose a start or end position from which to calculate a progress percentage. I have referred to "How do I keep track of parsing progress of large files in StAX?", but I cannot use a FilterReader. Please help me here.
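To make the goal concrete, here is a minimal sketch of one possible approach that works on bytes rather than characters: wrap the input stream in a byte-counting FilterInputStream, hand the wrapper to XMLInputFactory, and divide the bytes consumed by the total length. The names ProgressSketch, CountingInputStream, and progress are hypothetical and exist only for this illustration, and an in-memory byte array stands in for the HDFS file. With Hadoop, FSDataInputStream.getPos() together with the file length from the FileStatus might serve the same purpose without a wrapper, but that is an assumption I have not verified in this setup. Either way the value is approximate, because the parser reads the stream in buffered chunks ahead of the event it is currently reporting.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ProgressSketch {

    /** Hypothetical helper: counts the bytes pulled from the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        private long bytesRead = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                bytesRead++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesRead += n;
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            bytesRead += skipped;
            return skipped;
        }

        long getBytesRead() {
            return bytesRead;
        }
    }

    /** Fraction of the input consumed so far, clamped to the range [0, 1]. */
    static float progress(long bytesRead, long totalBytes) {
        if (totalBytes <= 0) {
            return 0f;
        }
        return Math.min(1.0f, (float) bytesRead / totalBytes);
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // Stand-in for the HDFS file: a small in-memory document.
        byte[] xml = "<root><item>a</item><item>b</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        CountingInputStream counting =
                new CountingInputStream(new ByteArrayInputStream(xml));
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(counting);

        while (reader.hasNext()) {
            reader.next();
            // Progress reflects bytes the parser has buffered, not the exact event offset.
            System.out.println(progress(counting.getBytesRead(), xml.length));
        }
        reader.close();
    }
}
```

In a RecordReader, getProgress() would return the result of progress(...) using the split or file length as totalBytes. For a multi-gigabyte file the parser's read-ahead buffer is tiny relative to the total, so the approximation should be close enough for a progress bar.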