2

I am reading a CSV file using the univocity parser - https://www.univocity.com/pages/parsers-tutorial. Below is what test.csv looks like:

Active;3189;Active on this date 2015-03-15-17.03.06.000000

Catalog;3189;This is for date 2015-04-21-11.04.11.000000

Master;3190;It happens on this date 2016-04-22-09.04.27.000000

InActive;3190;Inactive on this date 2016-04-23-09.04.46.000000

The code below does the parsing:

List<String[]> allRows = parser.parseAll(new FileReader("E:/test.csv"));

How can I compare the rows one by one after parsing and concatenate the third column of all rows that share the same value in the 2nd column?

Expected output:

for 3189 records - String x = Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000

for 3190 records - String x = It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000

Sks
  • I can think of some dirty approach (not a good design!): You can create two different lists for `Active` and `Inactive` values and compare them based on `id` (i.e. in this case 3189 or 3190). If the comparison matches then concatenate the String values. – Procrastinator Sep 27 '17 at 12:31
  • I appreciate your response. The first column is dynamic; it can be any string, not just Active or Inactive. The decision has to be based on the 2nd column rather than on the first column's value. Updating the question too – Sks Sep 27 '17 at 13:36

3 Answers

2

This is only an example; you need to be more careful with the exceptions that may occur. You can do something like this:

// Assumes each line arrives unsplit in row[0], i.e. the parser is not splitting on ';'
Map<String, Record> records = new LinkedHashMap<>();
// Case-insensitive so that "InActive" from the sample file matches as well
Pattern r = Pattern.compile("^(Active|Inactive);([^;]*);(.*)$", Pattern.CASE_INSENSITIVE);
for (String[] row : allRows) {
    Matcher m = r.matcher(row[0]);
    if (m.find()) {
        // Reuse the record already stored for this id, or start a new one
        Record record = records.get(m.group(2)) == null ? new Record() : records.get(m.group(2));
        record.setId(m.group(2));
        if (m.group(1).equalsIgnoreCase("Active")) {
            record.setActiveComment(m.group(3));
        } else if (m.group(1).equalsIgnoreCase("Inactive")) {
            record.setInactiveComment(m.group(3));
        }
        records.put(record.getId(), record);
    } else {
        System.out.println("NO MATCH");
    }
}

for (Entry<String, Record> rec : records.entrySet()) {
    System.out.println(rec.getValue().getActiveComment() + " and " + rec.getValue().getInactiveComment());
}

And the class Record:

public class Record {

    private String id;

    private String activeComment;

    private String inactiveComment;

    //add setters getters

    //hashcode equals and toString.

}

hashCode and equals compare id only.
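For reference, the elided parts of the class could be filled in roughly as follows. This is only a sketch: the accessor names are taken from the calls in the loop above, the class is kept package-private so it compiles from any file name, and the toString format is an arbitrary choice.

```java
import java.util.Objects;

// Possible completion of the Record class; equality is based on id only.
class Record {

    private String id;
    private String activeComment;
    private String inactiveComment;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getActiveComment() { return activeComment; }
    public void setActiveComment(String activeComment) { this.activeComment = activeComment; }

    public String getInactiveComment() { return inactiveComment; }
    public void setInactiveComment(String inactiveComment) { this.inactiveComment = inactiveComment; }

    // Two records are equal when their ids are equal; the comments are ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Record)) return false;
        return Objects.equals(id, ((Record) o).id);
    }

    // Must be consistent with equals, so it hashes the id only.
    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }

    // Illustrative format only.
    @Override
    public String toString() {
        return "Record{id=" + id + ", active=" + activeComment + ", inactive=" + inactiveComment + "}";
    }
}
```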

ddarellis
  • I appreciate your response. The first column is dynamic; it can be any string, not just Active or Inactive. The decision has to be based on the 2nd column rather than on the first column's value. – Sks Sep 27 '17 at 13:34
  • Updating question to remove any confusion. – Sks Sep 27 '17 at 13:36
  • No confusion! You can edit the posted code as you wish. – ddarellis Sep 27 '17 at 13:41
2

I hope I got your requirements right. Just use a map to store the "key" values and when you find a pre-existing value concatenate the string:

public static void main(String... args) {
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setDelimiter(';');

    //looks like you are not interested in the first column.
    //select the columns you actually need - faster and ensures all rows will come out with 2 columns
    settings.selectIndexes(1, 2);

    CsvParser parser = new CsvParser(settings);

    //linked hashmap to keep the original order if that's important
    Map<String, String[]> rows = new LinkedHashMap<String, String[]>();
    for (String[] row : parser.iterate(new File("E:/test.csv"))) {

        String key = row[0];
        String[] existing = rows.get(key);
        if (existing == null) {
            rows.put(key, row);
        } else {
            existing[1] += " and " + row[1];
        }
    }

    //print the result
    for(String[] row : rows.values()){
        System.out.println(row[0] + " - " + row[1]);
    }
}

This prints out:

3189 - Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000
3190 - It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000
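If the rows have already been loaded with parseAll, the same grouping can also be sketched with the Java 8 stream API. The example below is an illustration, not part of univocity: the class name GroupByColumn and the method concatByKey are made up here, and it assumes the file was parsed with ';' as the delimiter so that every row holds three fields.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByColumn {

    // Joins the third column of all rows that share the same second column.
    // A LinkedHashMap keeps the keys in the order they first appear.
    static Map<String, String> concatByKey(List<String[]> rows) {
        return rows.stream()
                // skip rows that are too short or have no key (groupingBy rejects null keys)
                .filter(row -> row != null && row.length >= 3 && row[1] != null)
                .collect(Collectors.groupingBy(
                        row -> row[1],
                        LinkedHashMap::new,
                        Collectors.mapping(row -> row[2], Collectors.joining(" and "))));
    }

    public static void main(String[] args) {
        // Sample rows from the question, as they would look after parsing with ';'
        List<String[]> rows = Arrays.asList(
                new String[]{"Active", "3189", "Active on this date 2015-03-15-17.03.06.000000"},
                new String[]{"Catalog", "3189", "This is for date 2015-04-21-11.04.11.000000"},
                new String[]{"Master", "3190", "It happens on this date 2016-04-22-09.04.27.000000"},
                new String[]{"InActive", "3190", "Inactive on this date 2016-04-23-09.04.46.000000"});

        concatByKey(rows).forEach((id, text) -> System.out.println(id + " - " + text));
    }
}
```

Unlike the loop above, this variant does not modify the parsed arrays in place; it builds a separate map from id to the joined text.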

Hope it helps

Jeronimo Backes
1

I tried a somewhat dirty approach that solves your problem. I am not sure whether it is a good design, though. You can try adding the following code to your approach:

for (int i = 0; i < allRows.size(); i++) {
    // Indexes 1 and 2 are read below, so a row needs at least 3 fields
    if (allRows.get(i).length < 3)
        continue;
    for (int j = i + 1; j < allRows.size(); j++) {
        if (allRows.get(j).length < 3)
            continue;
        if (allRows.get(i)[1].equals(allRows.get(j)[1])) // Comparing the second column with the other rows
        {
            System.out.println("for " + allRows.get(i)[1] + " records- String X=" + allRows.get(i)[2] + " and " + allRows.get(j)[2]);
            // If an id such as 3189 occurs more than twice, one line is printed for every matching pair of rows.
        }
    }
}

Output:

for 3189 records- String X=Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000
for 3190 records- String X=It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000
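When an id can occur more than twice, accumulating the text per id in a single pass avoids printing one line per pair. The sketch below uses Map.merge from the standard library; the class name MergeById and the method describe are hypothetical, and the rows are again assumed to have been parsed with ';' as the delimiter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeById {

    // Builds one output line per distinct value of the second column.
    static List<String> describe(List<String[]> allRows) {
        Map<String, String> byId = new LinkedHashMap<>();
        for (String[] row : allRows) {
            // Map.merge rejects null values, so incomplete rows are skipped
            if (row.length < 3 || row[1] == null || row[2] == null) {
                continue;
            }
            // First occurrence stores the text; later ones are appended with " and "
            byId.merge(row[1], row[2], (soFar, next) -> soFar + " and " + next);
        }
        List<String> lines = new ArrayList<>();
        byId.forEach((id, text) -> lines.add("for " + id + " records - String x = " + text));
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> allRows = Arrays.asList(
                new String[]{"Active", "3189", "Active on this date 2015-03-15-17.03.06.000000"},
                new String[]{"Catalog", "3189", "This is for date 2015-04-21-11.04.11.000000"},
                new String[]{"Master", "3190", "It happens on this date 2016-04-22-09.04.27.000000"},
                new String[]{"InActive", "3190", "Inactive on this date 2016-04-23-09.04.46.000000"});
        describe(allRows).forEach(System.out::println);
    }
}
```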
Procrastinator