
I am using the `BatchedColumnProcessor` of univocity-parsers to parse large CSV files. My parser settings are:

csvParserSettings.detectFormatAutomatically();
csvParserSettings.setHeaderExtractionEnabled(true);
csvParserSettings.setMaxCharsPerColumn(-1);
csvParserSettings.setColumnReorderingEnabled(true);
final RecLoCSVBatchedProcessor processor =
    new RecLoCSVBatchedProcessor(batchSize, csvAccountId);

csvParserSettings.setProcessor(processor);

The code snippet where I invoke the parser is:

    try (InputStream inputStream = new FileInputStream(csvLocalFilePath);
         BOMInputStream bomInputStream = new BOMInputStream(inputStream);
         Reader inputReader = new InputStreamReader(bomInputStream, "UTF-8")) {

        // Rows are processed in batches by RecLoCSVBatchedProcessor
        List<String[]> rows = csvProcessor.parseAll(inputReader);

    } catch (final IOException e) {
        // error handling omitted for brevity
    }

Sometimes I notice that the same batch gets processed twice. In the processor I have overridden the `batchProcessed` callback and do the required processing there:

    public class RecLoCSVBatchedProcessor extends BatchedColumnProcessor {

        public RecLoCSVBatchedProcessor(final int rowsPerBatch, final Long accountId) {
            super(rowsPerBatch);
            ....
        }

        @Override
        public void batchProcessed(final int rowsInThisBatch) { ... }
    }
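For what it's worth, a plain-Java fingerprint check along these lines could confirm whether repeated batches are actually identical (this is only a debugging sketch, independent of univocity; `BatchTracker` and `recordBatch` are placeholder names I made up, not library API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Debugging sketch: fingerprint each batch so an exact repeat shows up immediately.
class BatchTracker {
    private final Set<Integer> seenFingerprints = new HashSet<>();
    private int batchCount = 0;

    /** Records one batch; returns true if an identical batch was seen before. */
    public boolean recordBatch(final List<String[]> rows) {
        batchCount++;
        int fingerprint = 1;
        for (final String[] row : rows) {
            // Combine per-row hashes into one fingerprint for the whole batch.
            fingerprint = 31 * fingerprint + Arrays.hashCode(row);
        }
        // Set.add returns false when the fingerprint is already present.
        return !seenFingerprints.add(fingerprint);
    }

    public int getBatchCount() {
        return batchCount;
    }
}
```

Inside `batchProcessed` one could call `recordBatch(getRows())` (or whatever accessor exposes the current batch's rows) and log a warning when it returns true, which would at least distinguish "the same data arrives twice" from "my downstream processing runs twice".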

Is this something to do with the settings? As mentioned, it does not happen every time. It is a waste of resources and adds unnecessary processing time when the same batches get processed multiple times. Please let me know what could be wrong here.

Thanks
