I have two CSV files (district1.csv, district2.csv) in a directory, each containing a column schoolCode.
I read both CSV files with the Apache Commons CSV library, collect the distinct values of the schoolCode column for each file, and count them.
Here is my code:
public void getDistinctRecordCount() throws IOException {
    Set<String> uniqueSchools = new HashSet<>();
    int numOfSchools;
    String SchoolCode;
    // Filter to only read CSV files.
    File[] files = Directory.listFiles(new FileExtensionFilter());
    for (File f : files) {
        CSVParser csvParser;
        CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim();
        reader = Files.newBufferedReader(Paths.get(Directory + "\\" + f.getName()), StandardCharsets.ISO_8859_1);
        csvParser = CSVParser.parse(reader, csvFormat);
        for (CSVRecord column : csvParser) {
            SchoolCode = column.get("School Code");
            uniqueSchools.add(SchoolCode);
        }
        Logger.info("The list of Schools for " + f.getName() + " are: " + uniqueSchools);
        numOfSchools = uniqueSchools.size();
        Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
        Logger.info("-----------------------");
    }
}
Here is my output:
[INFO ] [Logger] - The list of Schools for district1.csv are: [01-0003-002, 01-0003-001]
[INFO ] [Logger] - The total count of Schools for district1.csv are: 2
[INFO ] [Logger] - The list of Schools for district2.csv are: [01-0003-002, 01-0003-001, 01-0018-004, 01-0018-005, 01-0018-002, 01-0018-003, 01-0018-008, 01-0018-006]
[INFO ] [Logger] - The total count of Schools for district2.csv are: 8
Problem: The two values read from district1.csv are carried over into the district2.csv result, so the count for district2.csv is off by 2 (it reports 8; the correct value is 6). Why are the district1.csv values being included?
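To narrow this down, here is a minimal sketch that leaves out the CSV library and assumes the school codes of each file are already in memory. The class name, the method names, and the repeated row in district1.csv are made up for illustration; the codes themselves are the ones from the output above. It contrasts reusing one set for every file, as my code does, with creating a new set per file:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctCountSketch {

    // Same shape as the code above: the set is created once, before the loop,
    // so values added for earlier files are still in it for later files.
    static Map<String, Integer> countWithSharedSet(Map<String, List<String>> codesByFile) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<String> uniqueSchools = new HashSet<>();
        for (Map.Entry<String, List<String>> file : codesByFile.entrySet()) {
            uniqueSchools.addAll(file.getValue());
            counts.put(file.getKey(), uniqueSchools.size());
        }
        return counts;
    }

    // Variant: a fresh set is created inside the loop, once per file.
    static Map<String, Integer> countWithPerFileSet(Map<String, List<String>> codesByFile) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> file : codesByFile.entrySet()) {
            Set<String> uniqueSchools = new HashSet<>(file.getValue());
            counts.put(file.getKey(), uniqueSchools.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the schoolCode column of each file (assumption: the
        // duplicate row in district1.csv is hypothetical).
        Map<String, List<String>> codesByFile = new LinkedHashMap<>();
        codesByFile.put("district1.csv",
                List.of("01-0003-001", "01-0003-002", "01-0003-001"));
        codesByFile.put("district2.csv",
                List.of("01-0018-002", "01-0018-003", "01-0018-004",
                        "01-0018-005", "01-0018-006", "01-0018-008"));

        System.out.println(countWithSharedSet(codesByFile));  // {district1.csv=2, district2.csv=8}
        System.out.println(countWithPerFileSet(codesByFile)); // {district1.csv=2, district2.csv=6}
    }
}
```

The shared-set variant reproduces the counts from my log (2 and 8), while the per-file variant gives the 2 and 6 I expected.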