I have two CSV files (district1.csv, district2.csv) in a directory, each containing a schoolCode column. I read both files with the Apache Commons CSV library, collect the distinct values of that column, and count them for each file. Here is my code:

public void getDistinctRecordCount() throws IOException {
    Set<String> uniqueSchools = new HashSet<>();
    int numOfSchools;
    String SchoolCode;

    //Filter to only read csv files.
    File[] files = Directory.listFiles(new FileExtensionFilter());

    for (File f : files) {
        CSVParser csvParser;
        CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim();
        reader = Files.newBufferedReader(Paths.get(Directory + "\\" + f.getName() ), StandardCharsets.ISO_8859_1);
        csvParser = CSVParser.parse(reader, csvFormat);
        for (CSVRecord column : csvParser) {
            SchoolCode = column.get("School Code");
            uniqueSchools.add(SchoolCode);
        }
        Logger.info("The list of Schools for " + f.getName() + " are: " + uniqueSchools);
        numOfSchools = uniqueSchools.size();
        Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
        Logger.info("-----------------------");
    }
}

Here is my output:

[INFO ] [Logger] - The list of Schools for district1.csv are: [01-0003-002, 01-0003-001]
[INFO ] [Logger] - The total count of Schools for district1.csv are: 2
[INFO ] [Logger] - The list of Schools for district2.csv are: [01-0003-002, 01-0003-001, 01-0018-004, 01-0018-005, 01-0018-002, 01-0018-003, 01-0018-008, 01-0018-006]
[INFO ] [Logger] - The total count of Schools for district2.csv are: 8

Problem: The two values read from district1.csv are carried over into the district2.csv result, which throws off my count for district2.csv by 2 (the correct value is 6). Why are they being appended?

Ori Marko
ktaylor
  • This happens because you never reset the HashSet; you need to reset it inside the for loop. Add the following line inside the loop: `uniqueSchools = new HashSet<>();` – dkb Oct 14 '18 at 06:14

1 Answer

If you don't need the set of all schools across files, you can either declare uniqueSchools inside the loop or clear it at the start of each iteration:

for (File f : files) {
    uniqueSchools.clear(); // start each file with an empty set
    // ... rest of the loop body unchanged
}

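To see why the count carries over, here is a minimal sketch that uses in-memory lists in place of the CSV files. The class ClearPerFileDemo, the method distinctCounts, and the sample codes are hypothetical and only illustrate the idea; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class: each inner list stands in for one parsed CSV file.
class ClearPerFileDemo {

    // Returns the distinct count that would be logged after each "file".
    // With resetPerFile == false the set keeps values from earlier files.
    static List<Integer> distinctCounts(List<List<String>> files, boolean resetPerFile) {
        Set<String> uniqueSchools = new HashSet<>();
        List<Integer> counts = new ArrayList<>();
        for (List<String> codes : files) {
            if (resetPerFile) {
                uniqueSchools.clear(); // forget the previous file's codes
            }
            uniqueSchools.addAll(codes);
            counts.add(uniqueSchools.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("01-0003-001", "01-0003-002", "01-0003-001"),
            Arrays.asList("01-0018-002", "01-0018-003", "01-0018-004",
                          "01-0018-005", "01-0018-006", "01-0018-008"));

        System.out.println(distinctCounts(files, false)); // prints [2, 8]
        System.out.println(distinctCounts(files, true));  // prints [2, 6]
    }
}
```

Without the reset, the second count includes the two codes left over from the first file, which matches the 8 seen in the question's log.
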
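If you do need all schools as well, you can store the schools per file in a map (for example a Map<String, Set<String>> keyed by file name), or create a separate set for each file, log its count, and then add it to uniqueSchools with addAll: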

// inside the for loop: a fresh set for the current file
Set<String> currentSchools = new HashSet<>();
..
currentSchools.add(SchoolCode);
Logger.info("The list of Schools for " + f.getName() + " are: " + currentSchools);
numOfSchools = currentSchools.size();
Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
// merge this file's schools into the overall set
uniqueSchools.addAll(currentSchools);
  • Consider starting variable names with a lowercase letter (camelCase), e.g. rename SchoolCode to schoolCode and Logger to logger
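
The map-based variant could look roughly like the sketch below. It again assumes in-memory lists instead of parsed CSV records, and the names SchoolsPerFile, groupByFile, and allSchools are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: keeps one set of school codes per file name.
class SchoolsPerFile {

    // Builds file name -> distinct school codes; LinkedHashMap keeps insertion order.
    static Map<String, Set<String>> groupByFile(Map<String, List<String>> codesByFile) {
        Map<String, Set<String>> schoolsPerFile = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : codesByFile.entrySet()) {
            schoolsPerFile.put(entry.getKey(), new HashSet<>(entry.getValue()));
        }
        return schoolsPerFile;
    }

    // Union of all per-file sets, i.e. the distinct schools across every file.
    static Set<String> allSchools(Map<String, Set<String>> schoolsPerFile) {
        Set<String> all = new HashSet<>();
        for (Set<String> schools : schoolsPerFile.values()) {
            all.addAll(schools);
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<String>> codesByFile = new LinkedHashMap<>();
        codesByFile.put("district1.csv", Arrays.asList("01-0003-001", "01-0003-002", "01-0003-001"));
        codesByFile.put("district2.csv", Arrays.asList("01-0018-002", "01-0018-003", "01-0018-002"));

        Map<String, Set<String>> schoolsPerFile = groupByFile(codesByFile);
        for (Map.Entry<String, Set<String>> entry : schoolsPerFile.entrySet()) {
            System.out.println(entry.getKey() + " has " + entry.getValue().size() + " schools");
        }
        System.out.println("All files: " + allSchools(schoolsPerFile).size() + " schools");
    }
}
```

Each file gets its own count, and the overall set is derived only when needed, so no file's result can leak into another's.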
Ori Marko
  • I had a feeling it was something simple. Your suggestion of clearing uniqueSchools inside the loop did the trick! Thanks so much! – ktaylor Oct 14 '18 at 06:39