Java reading csv file with multiple headers

Question

I am using Apache Commons CSV to read the contents of a CSV file downloaded from Google Trends (the download button at the bottom right of the related queries section). A small subset of the file:

Category: All categories
"bluetooth speakers: (1/1/04 - 8/15/16, Worldwide)"

TOP
speaker,100
bluetooth speaker,100

RISING
portable speakers bluetooth,Breakout
portable speakers,Breakout

My code to read from the file:

private void readCsv(String inputFilePath) {
    try {
        Reader in = new FileReader(inputFilePath);
        Iterable<CSVRecord> records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(in);
        for (CSVRecord record : records) {
            String topic = record.get(0);
            if (topic != null && !topic.isEmpty()) {
                System.out.println(topic);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

}

The output:

bluetooth speakers: (1/1/04 - 8/15/16, Worldwide)
TOP
speaker
bluetooth speaker
RISING
portable speakers bluetooth
portable speakers

Desired Output:

speaker
bluetooth speaker
portable speakers bluetooth
portable speakers

The data from Google has no column headers, only the two section headers TOP and RISING, and I am unable to extract the desired values. Is there any configuration or filter I can apply to get them?

What you have there is ***multiple*** different csv "files" in one physical file. You have to separate them before parsing them as CSV. — Jim Garrison, Aug 15 '16 at 06:43
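A minimal sketch of the separation step suggested in the comment above, assuming that the embedded "files" are delimited by blank lines, as in the sample. The class and method names (`CsvBlockSplitter`, `splitIntoBlocks`) are hypothetical. It uses only the standard library and would mis-split a quoted field that itself contains a blank line.

```java
import java.util.ArrayList;
import java.util.List;

class CsvBlockSplitter {

    /** Splits raw file text into blocks separated by one or more blank lines. */
    static List<List<String>> splitIntoBlocks(String text) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        // \R matches any single line break (\n, \r\n, ...)
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current block, if one is open
                if (!current.isEmpty()) {
                    blocks.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        // The last block is not followed by a blank line
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Category: All categories",
                "\"bluetooth speakers: (1/1/04 - 8/15/16, Worldwide)\"",
                "",
                "TOP",
                "speaker,100",
                "bluetooth speaker,100",
                "",
                "RISING",
                "portable speakers bluetooth,Breakout",
                "portable speakers,Breakout");
        // Print each block on its own line
        for (List<String> block : splitIntoBlocks(text)) {
            System.out.println(block);
        }
    }
}
```

Each block could then be joined back into a string and handed to the CSV parser separately, for example through a `StringReader`, after dropping the block's first line if it is a section header such as TOP or RISING.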

score 0 · Answer 1 · answered Aug 15 '16 at 06:40

Strictly speaking this is not a good solution, but in my case ignoring records that have a single element eliminated the headers. I am still looking for, and working on, a cleaner solution, such as a configuration option or extending some classes.

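// Requires: java.io.FileReader, java.io.IOException, java.io.Reader,
// org.apache.commons.csv.CSVFormat, org.apache.commons.csv.CSVRecord
private void readCsv(String inputFilePath) {
    // try-with-resources closes the reader even if parsing fails
    try (Reader in = new FileReader(inputFilePath)) {
        // No header handling: every line is parsed as an ordinary record
        Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
        for (CSVRecord record : records) {
            // Title lines and section headers (TOP, RISING) have a single field; skip them
            if (record.size() <= 1) {
                continue;
            }
            String topic = record.get(0);
            if (topic != null && !topic.isEmpty()) {
                System.out.println(topic);
            }
        }
    } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
    }
}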

This is not a good solution because it could misbehave on other CSV files, for example ones with legitimate single-column rows. Still, it could be useful to someone.

To me, it looks more like the file is split into sections, separated by a blank line. Anything before the first blank line is a file header. The first line after a blank line is a section header. The remaining lines up to the next blank line are the section content, which is what you're after. — Andreas, Aug 15 '16 at 07:11
@Andreas Is there any library that can filter the CSV file? I could use string manipulation, but I don't think that is a good solution. I am a noob at working with CSV and could not find a solution using the Apache library. — Illegal Argument, Aug 15 '16 at 07:41
I doubt that any library has that. It's up to you to interpret the semantics of the file after the CSV parser has parsed the syntactical text. — Andreas, Aug 15 '16 at 21:39
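The section layout described in these comments can be sketched as a small state machine that runs over records the CSV parser has already produced. This is only an illustration: the names `SectionFilter` and `extractTopics` are hypothetical, each record is modelled as a plain `List<String>`, and a blank line is assumed to arrive either as an empty record or as a record with a single empty field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SectionFilter {

    /** Returns the first field of every section content row. */
    static List<String> extractTopics(List<List<String>> records) {
        List<String> topics = new ArrayList<>();
        boolean seenBlank = false;     // false while still inside the file header
        boolean expectHeader = false;  // true right after a blank line
        for (List<String> record : records) {
            // Assumption: a blank line shows up as no fields or one empty field
            boolean blank = record.isEmpty()
                    || (record.size() == 1 && record.get(0).trim().isEmpty());
            if (blank) {
                seenBlank = true;
                expectHeader = true;
            } else if (!seenBlank) {
                // Still in the file header (before the first blank line): ignore
            } else if (expectHeader) {
                // First line after a blank line is a section header (TOP, RISING): skip
                expectHeader = false;
            } else {
                topics.add(record.get(0));
            }
        }
        return topics;
    }

    public static void main(String[] args) {
        // The sample from the question, as the parser would hand it over
        List<List<String>> records = Arrays.asList(
                Arrays.asList("Category: All categories"),
                Arrays.asList("bluetooth speakers: (1/1/04 - 8/15/16, Worldwide)"),
                Arrays.asList(""),
                Arrays.asList("TOP"),
                Arrays.asList("speaker", "100"),
                Arrays.asList("bluetooth speaker", "100"),
                Arrays.asList(""),
                Arrays.asList("RISING"),
                Arrays.asList("portable speakers bluetooth", "Breakout"),
                Arrays.asList("portable speakers", "Breakout"));
        extractTopics(records).forEach(System.out::println);
    }
}
```

With Apache Commons CSV, each `CSVRecord` can be copied into a `List<String>` by iterating over it, since `CSVRecord` implements `Iterable<String>`. The parser must not be configured to skip empty lines, otherwise the section boundaries are lost. Unlike the single-field check in the answer, this keeps single-column rows that appear inside a section.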
