I am using Apache Commons CSV to read a CSV file downloaded from Google Trends (the download button at the bottom right of the related queries section). A small subset of the file:

Category: All categories
"bluetooth speakers: (1/1/04 - 8/15/16, Worldwide)"

TOP
speaker,100
bluetooth speaker,100

RISING
portable speakers bluetooth,Breakout
portable speakers,Breakout

My code to read from the file:

private void readCsv(String inputFilePath) {
    try {
        Reader in = new FileReader(inputFilePath);
        Iterable<CSVRecord> records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(in);
        for (CSVRecord record : records) {
            String topic = record.get(0);
            if (topic != null && !topic.isEmpty()) {
                System.out.println(topic);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

}

The output:

bluetooth speakers: (1/1/04 - 8/15/16, Worldwide)
TOP
speaker
bluetooth speaker
RISING
portable speakers bluetooth
portable speakers

Desired Output:

speaker
bluetooth speaker
portable speakers bluetooth
portable speakers

The Google data has no real column headers, only the two section headers TOP and RISING, so I am unable to extract just the desired values. Is there any configuration or filter I can apply to get them?

Illegal Argument

1 Answer

This is strictly not a good solution, but in my case ignoring the records that have a single element eliminated the headers. I am still looking for, and working on, a cleaner solution such as a configuration option or extending some classes.

private void readCsv(String inputFilePath) {
    // try-with-resources closes the reader even if parsing fails
    try (Reader in = new FileReader(inputFilePath)) {
        // No withFirstRecordAsHeader(): the file has no real header row
        Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
        for (CSVRecord record : records) {
            // File headers, section headers and blank lines have at most one value
            if (record.size() <= 1) {
                continue;
            }
            String topic = record.get(0);
            if (topic != null && !topic.isEmpty()) {
                System.out.println(topic);
            }
        }
    } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
    }
}

This is not a good solution because it relies on the headers being the only single-value records, and other CSV files could break that assumption. Still, it could be useful for someone.

Illegal Argument
  • To me, it looks more like the file is split into sections, separated by a blank line. Anything before the first blank line is a file header. The first line after a blank line is a section header. The remaining lines up to the next blank line are the section content, which is what you're after. – Andreas Aug 15 '16 at 07:11
  • @Andreas Is there any library that can filter the CSV file? I could use string manipulation, but I don't think it is a good solution. I am a noob at working with CSV and could not find a solution using the Apache library. – Illegal Argument Aug 15 '16 at 07:41
  • I doubt that any library has that. It's up to you to interpret the semantics of the file after the CSV parser has parsed the text syntactically. – Andreas Aug 15 '16 at 21:39
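
The section structure described in the first comment could be sketched roughly as follows. This is only an illustration, not a Commons CSV feature: the class SectionFilter and its methods are hypothetical names, and each inner list stands in for the values of one already-parsed record. It assumes blank lines arrive as empty rows or as rows with a single empty value (which is what the isEmpty() check in the question suggests happens with CSVFormat.RFC4180), and it needs Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;

public class SectionFilter {

    // Assumption: a blank line shows up as an empty row or a single empty value.
    static boolean isBlank(List<String> row) {
        return row.isEmpty() || (row.size() == 1 && row.get(0).trim().isEmpty());
    }

    // Returns the first value of every section content row, skipping the
    // file header (everything before the first blank row) and each section
    // header (the first non-blank row after a blank row).
    static List<String> sectionTopics(List<List<String>> rows) {
        List<String> topics = new ArrayList<>();
        boolean afterBlank = false; // the next non-blank row is a section header
        boolean inSection = false;  // false while still inside the file header
        for (List<String> row : rows) {
            if (isBlank(row)) {
                afterBlank = true;
                inSection = false;
            } else if (afterBlank) {
                afterBlank = false; // section header such as TOP or RISING
                inSection = true;
            } else if (inSection) {
                topics.add(row.get(0));
            }
        }
        return topics;
    }

    public static void main(String[] args) {
        // Rows mirroring the sample file from the question
        List<List<String>> rows = List.of(
            List.of("Category: All categories"),
            List.of("bluetooth speakers: (1/1/04 - 8/15/16, Worldwide)"),
            List.of(""),
            List.of("TOP"),
            List.of("speaker", "100"),
            List.of("bluetooth speaker", "100"),
            List.of(""),
            List.of("RISING"),
            List.of("portable speakers bluetooth", "Breakout"),
            List.of("portable speakers", "Breakout"));
        sectionTopics(rows).forEach(System.out::println);
    }
}
```

For the sample rows this prints the four lines of the desired output. Unlike the size check in the answer, it would keep single-value content rows, but it depends entirely on the blank-line layout staying the same.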