
I need to process a large CSV file (~1 GB) in Java. It looks like this:

Trans1, 1, 2, 3, 4
Trans1, 2, 3, 4, 5
Trans1, 4, 5, 2, 1
Trans2, 1, 2, 3, 4
Trans2, 2, 3, 4, 5
Trans2, 4, 5, 2, 1
Trans2, 1, 2, 3, 4
Trans3, 2, 3, 4, 5
Trans3, 4, 5, 2, 1

The first 3 lines belong to one transaction (Trans1) and the next 4 to another (Trans2). I need to read transactions in batches, maybe 1000 at a time, and each read should stop at the last line of a transaction so that no transaction is split across two batches.

What is the best way to do this in Java with the best performance?

I don't want to load the entire file into memory, to avoid performance issues.
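A minimal sketch of the batching described above, assuming the transaction ID is the first comma-separated field and that all lines of one transaction are contiguous in the file. The class name `TransactionBatchReader` and the method `nextBatch` are hypothetical. The reader keeps one line of lookahead, so each batch ends exactly on a transaction boundary and only the current batch is held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns up to maxTransactions complete transactions per call.
// Assumes the ID is the first comma-separated field and transactions are contiguous.
class TransactionBatchReader implements AutoCloseable {
    private final BufferedReader reader;
    private String pending; // first line of the next transaction, already read ahead

    TransactionBatchReader(Reader in) throws IOException {
        this.reader = new BufferedReader(in);
        this.pending = nextNonEmptyLine();
    }

    /** Returns up to maxTransactions transactions, or an empty list at end of file. */
    public List<List<String>> nextBatch(int maxTransactions) throws IOException {
        List<List<String>> batch = new ArrayList<>();
        while (pending != null && batch.size() < maxTransactions) {
            String id = idOf(pending);
            List<String> transaction = new ArrayList<>();
            transaction.add(pending);
            // Keep reading until the ID changes or the file ends.
            String line;
            while ((line = nextNonEmptyLine()) != null && idOf(line).equals(id)) {
                transaction.add(line);
            }
            pending = line; // null at EOF, otherwise the first line of the next transaction
            batch.add(transaction);
        }
        return batch;
    }

    private String nextNonEmptyLine() throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                return line;
            }
        }
        return null;
    }

    private static String idOf(String line) {
        int comma = line.indexOf(',');
        return (comma < 0 ? line : line.substring(0, comma)).trim();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

To process the whole file, one would wrap `Files.newBufferedReader(Paths.get("transactions.csv"))` in this reader and call `nextBatch(1000)` in a loop until it returns an empty list. Memory use is then bounded by the size of one batch rather than the size of the file.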

Tim Biegeleisen
PraveenM
  • I think `Java-8` streams should help here. I haven't tried them for file reading myself, but I think they should solve your problem. – Mritunjay Apr 11 '19 at 04:45
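A sketch of how the streams suggestion could work, assuming Java 8 or later. `Files.lines` reads the file lazily, but streams have no built-in operation for grouping consecutive lines, so this version keeps the grouping state in a small helper object and consumes the lines with `forEachOrdered`. The class `StreamBatcher` and its `handler` callback are hypothetical names, and the transaction ID is again assumed to be the first comma-separated field.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical helper: pushes batches of up to batchSize complete transactions to a handler.
class StreamBatcher {
    private final int batchSize;
    private final Consumer<List<List<String>>> handler;
    private List<List<String>> batch = new ArrayList<>();
    private List<String> current;   // lines of the transaction being collected
    private String currentId;       // its ID (first field)

    StreamBatcher(int batchSize, Consumer<List<List<String>>> handler) {
        this.batchSize = batchSize;
        this.handler = handler;
    }

    void run(Path file) throws IOException {
        // Files.lines reads lazily; try-with-resources closes the underlying file.
        try (Stream<String> lines = Files.lines(file)) {
            lines.filter(l -> !l.trim().isEmpty())
                 .forEachOrdered(this::accept); // must stay sequential: the state is not thread-safe
        }
        finishTransaction();            // the last transaction has no successor to trigger it
        if (!batch.isEmpty()) {
            handler.accept(batch);      // final, possibly smaller, batch
        }
    }

    private void accept(String line) {
        int comma = line.indexOf(',');
        String id = (comma < 0 ? line : line.substring(0, comma)).trim();
        if (!id.equals(currentId)) {
            finishTransaction();        // ID changed: the previous transaction is complete
            current = new ArrayList<>();
            currentId = id;
        }
        current.add(line);
    }

    private void finishTransaction() {
        if (current == null) {
            return;
        }
        batch.add(current);
        current = null;
        if (batch.size() == batchSize) {
            handler.accept(batch);
            batch = new ArrayList<>();
        }
    }
}
```

The stream returned by `Files.lines` is sequential by default; calling `.parallel()` on it would break this approach, because the grouping depends on seeing lines in order on one thread.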

1 Answer


Assuming you want to hold each transaction in memory so that you can process it once it has been read completely, you could try something along these lines:

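import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

StringBuilder sb = new StringBuilder(); // text of the transaction being read
int trans = -1;                          // its ID; -1 means nothing read yet

try (BufferedReader br = Files.newBufferedReader(Paths.get("transactions.csv"))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (line.trim().isEmpty()) {
            continue; // skip blank lines, which would otherwise break parseInt
        }
        String[] parts = line.split(",\\s*");
        int transCurr = Integer.parseInt(parts[0].trim().replace("Trans", "")); // "Trans2" -> 2
        if (transCurr != trans && trans != -1) {
            // the ID changed, so sb holds one complete transaction: process it here
            sb = new StringBuilder();
        }
        trans = transCurr;
        sb.append(line).append("\n");
    }
    if (trans != -1) {
        // the last transaction is never followed by a new ID, so process it here too
    }
}
catch (IOException e) {
    System.err.format("IOException: %s%n", e);
}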

If you would rather process each line as it comes in, the code above is easy to modify: the fields of each line are already available in parts[].
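A sketch of that per-line variant, in which nothing but a running result is kept for the current transaction. Summing the numeric columns is only a placeholder for whatever processing is actually needed, and the names `StreamingTotals`, `forEachTransaction`, and `onTransaction` are hypothetical. Each transaction's result is handed to the callback as soon as the ID changes, so memory use stays constant regardless of file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.BiConsumer;

class StreamingTotals {
    /**
     * Reads lines one at a time and reports, per transaction, the sum of its numeric
     * columns (a placeholder computation). Assumes lines of one transaction are contiguous.
     */
    static void forEachTransaction(Reader in, BiConsumer<String, Long> onTransaction)
            throws IOException {
        String currentId = null;
        long total = 0;
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split(",\\s*");
                String id = parts[0].trim();
                if (!id.equals(currentId)) {
                    if (currentId != null) {
                        onTransaction.accept(currentId, total); // previous one is complete
                    }
                    currentId = id;
                    total = 0;
                }
                for (int i = 1; i < parts.length; i++) {
                    total += Long.parseLong(parts[i].trim());
                }
            }
        }
        if (currentId != null) {
            onTransaction.accept(currentId, total); // the last transaction
        }
    }
}
```

For the sample data in the question, the callback would receive Trans1 → 36, Trans2 → 46, and Trans3 → 26.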

Tim Biegeleisen
  • Thanks for your reply, Tim. I am looking to read transactions in bulk and process them in small batches. – PraveenM Apr 11 '19 at 04:54
  • Then my answer does this for you, more or less. You may have to play around with the code a bit. – Tim Biegeleisen Apr 11 '19 at 04:54
  • I know how to read line by line using BufferedReader, streams, or Scanner. I am mainly looking for input on alternative ways to process the file in batches; I can't find much on the internet either. – PraveenM Apr 11 '19 at 04:58
  • The only memory my answer uses is, at most, the text of a single transaction. We can reduce this even further if you can provide the logic for processing each line on the fly, as it is read in. – Tim Biegeleisen Apr 11 '19 at 04:59