Java OpenCSV - 2 List comparison and duplication

Question

I am going to build an application that compares two .csv lists using OpenCSV. It should work like this:

Open the two .csv files (each file has the columns: Name, Emails)
Store the parsed data (and here is the problem: I don't know whether it should be stored in a table or in something else)
Compare the values of the "Emails" column in List1 and List2
If an email from List1 appears in List2, delete it (from List1)
Export the results to a new .csv file

I don't know if this is a good algorithm. Please tell me which option for storing the results of reading a .csv file is best in this case.

Kind Regards

Jeronimo Backes · Accepted Answer · 2015-12-09T05:16:33.593

You can get around this more easily with univocity-parsers as it can read your data into columns:

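import com.univocity.parsers.common.processor.ColumnProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.File;
import java.io.FileReader;
import java.util.*;

CsvParserSettings parserSettings = new CsvParserSettings(); // parser config with many options, check the tutorial
parserSettings.setHeaderExtractionEnabled(true); // uses the first row as headers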

// To get the values of all columns, use a column processor
ColumnProcessor rowProcessor = new ColumnProcessor();
parserSettings.setRowProcessor(rowProcessor);

CsvParser parser = new CsvParser(parserSettings);

//This will parse everything and pass the data to the column processor
parser.parse(new FileReader(new File("/path/to/your/file.csv")));

//Finally, we can get the column values:
Map<String, List<String>> columnValues = rowProcessor.getColumnValuesAsMapOfNames();

Let's say you parsed the second CSV with that. Just grab the emails and create a set:

Set<String> emails = new HashSet<>(columnValues.get("Emails")); // "Emails" is the header name given in the question

Now just iterate over the first CSV and check if the emails are in the emails set.
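The iteration step can be sketched with plain Java collections, independent of which parser produced the rows. This is only an illustration: the class name `EmailFilterSketch`, the method `removeKnownEmails`, the sample data, and the assumption that the email sits at index 1 of each row are all hypothetical and not part of either library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EmailFilterSketch {

    // Assumption: each row is {Name, Emails}, so the email is at index 1.
    static final int EMAIL_COLUMN = 1;

    /** Returns the rows of the first list whose email is absent from the second list's emails. */
    static List<String[]> removeKnownEmails(List<String[]> firstListRows, Set<String> secondListEmails) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : firstListRows) {
            // HashSet.contains is a constant-time lookup on average.
            if (!secondListEmails.contains(row[EMAIL_COLUMN])) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the parsed files.
        List<String[]> first = Arrays.asList(
                new String[] {"Ann", "ann@example.com"},
                new String[] {"Bob", "bob@example.com"});
        Set<String> second = new HashSet<>(Arrays.asList("bob@example.com"));

        for (String[] row : removeKnownEmails(first, second)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The comparison here is an exact string match. If the two files may differ in letter case or surrounding whitespace, normalizing the emails (for example with `trim()` and `toLowerCase()`) before adding them to the set and before the lookup is probably worth considering.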

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

score 0 · Answer 2 · answered Dec 21 '15 at 01:56

If you have a hard requirement to use openCSV then here is what I believe is the easiest solution:

First off, I like Jeronimo's suggestion about the HashSet. Read the second csv file first using CsvToBean and store the email addresses in a HashSet.

Then create a filter class that implements the CsvToBeanFilter interface. Pass the set in through the constructor, and in the allowLine method look up the email address and return true if it is not in the set (the set gives you a quick lookup).

Then pass the filter to CsvToBean.parse when reading/parsing the first file, and all you will get are the records from the first file whose email addresses are not in the second file. The CsvToBeanFilter javadoc has a good example that shows how this works.
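A rough sketch of what such a filter might look like follows. To keep it compilable without openCSV on the classpath, the class below does not actually declare `implements CsvToBeanFilter`; in a real project it would, and the `allowLine(String[] line)` method would then be called by the library for each line. The class name `NotInSecondFileFilter` and the fixed column index 1 for the email are assumptions made for illustration; the javadoc example mentioned above shows how to resolve the column position from the mapping strategy instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// In a real project this class would declare "implements com.opencsv.bean.CsvToBeanFilter".
// That clause is left out so this sketch compiles on its own.
final class NotInSecondFileFilter {

    // Assumption: columns are Name,Emails, so the email is at index 1.
    private static final int EMAIL_COLUMN = 1;

    private final Set<String> secondFileEmails;

    NotInSecondFileFilter(Set<String> secondFileEmails) {
        this.secondFileEmails = secondFileEmails;
    }

    /** Returns true (keep the line) when its email is NOT in the second file's set. */
    public boolean allowLine(String[] line) {
        return !secondFileEmails.contains(line[EMAIL_COLUMN]);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the emails read from the second file.
        Set<String> emails = new HashSet<>(Arrays.asList("bob@example.com"));
        NotInSecondFileFilter filter = new NotInSecondFileFilter(emails);

        System.out.println(filter.allowLine(new String[] {"Ann", "ann@example.com"}));
        System.out.println(filter.allowLine(new String[] {"Bob", "bob@example.com"}));
    }
}
```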

Lastly, use BeanToCsv to create a file from the filtered list.

In the interest of fairness: I am the maintainer of the openCSV project, and it is also open source and free (Apache V2.0 license).
