
How do I configure Super CSV to skip blank or white-space only lines?

I'm using CsvListReader and sometimes I get a blank line in my data. When this happens, I get an exception to the effect of:

number of CellProcessors must match number of fields

I'd like to simply skip these lines.

James Bassett
dkantowitz

2 Answers


Update: Super CSV 2.1.0 (released April 2013) lets you supply a CommentMatcher via the preferences, which makes the reader skip any line that is considered a comment. There are two built-in matchers you can use, or you can supply your own. In this case you could use new CommentMatches("\\s+") to skip lines that contain only whitespace.
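To make it concrete which lines that regex would catch, here is a small stand-alone sketch using only java.util.regex. It assumes the matcher tests the whole line against the pattern; the class and method names (BlankLinePatternDemo, wouldBeSkipped) are made up for illustration and are not part of Super CSV.

```java
import java.util.regex.Pattern;

class BlankLinePatternDemo {

    // The same regex the answer suggests passing to CommentMatches.
    static final Pattern BLANK = Pattern.compile("\\s+");

    // Assumption: a line counts as a "comment" when the WHOLE line matches the pattern.
    static boolean wouldBeSkipped(String line) {
        return BLANK.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(wouldBeSkipped("   "));    // true: spaces only
        System.out.println(wouldBeSkipped("\t \t"));  // true: tabs and spaces
        System.out.println(wouldBeSkipped(""));       // false: \s+ needs at least one character
        System.out.println(wouldBeSkipped("a, b"));   // false: real data
    }
}
```

The zero-length case is not matched by this pattern, which should be fine because Super CSV already skips those lines on its own. In Super CSV itself the matcher would be registered on the preferences (as far as I know via CsvPreference.Builder and its skipComments method), which this sketch does not exercise.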


Super CSV only skips lines of zero length (just a line terminator).

A file with blank lines isn't valid CSV (see rule 4 of RFC 4180, which states that each line should contain the same number of fields throughout the file). The only time a blank line is valid is when it's part of a multi-line field surrounded by quotes, e.g.

column1,column2
"multi-line field

with a blank line",value2

That being said, it might be possible to make Super CSV a bit more lenient with blank lines (it could ignore them). If you could post a feature request on our SourceForge page, we can investigate this further and potentially add this functionality in a future release.

That doesn't help you right now though!

I haven't done extensive testing on this, but it should work :) You can write your own tokenizer that skips blank lines:

package org.supercsv.io;

import java.io.IOException;
import java.io.Reader;
import java.util.List;

import org.supercsv.prefs.CsvPreference;

public class SkipBlankLinesTokenizer extends Tokenizer {

    public SkipBlankLinesTokenizer(Reader reader, CsvPreference preferences) {
        super(reader, preferences);
    }

    @Override
    public boolean readColumns(List<String> columns) throws IOException {

        boolean moreInput = super.readColumns(columns);

        // keep reading lines if they're blank
        while (moreInput && (columns.size() == 0 ||
                             (columns.size() == 1 &&
                              columns.get(0).trim().isEmpty()))) {
            moreInput = super.readColumns(columns);
        }

        return moreInput;
    }

}

And just pass this into the constructor of your reader (you'll have to pass the preferences into both the reader and the tokenizer):

ICsvListReader listReader = null;
try {
    CsvPreference prefs = CsvPreference.STANDARD_PREFERENCE;
    listReader = new CsvListReader(
        new SkipBlankLinesTokenizer(new FileReader(CSV_FILENAME), prefs),
        prefs);
...
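If you want to check the blank-row test on its own, without a CSV file, the loop condition from the tokenizer can be pulled out into a plain helper. This is only a sketch: BlankRowCheck and isBlankRow are hypothetical names, and the null guard is an extra precaution that is not in the tokenizer code above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BlankRowCheck {

    // Mirrors the tokenizer's loop condition: a row is blank when it has
    // no columns, or exactly one column holding only whitespace.
    static boolean isBlankRow(List<String> columns) {
        if (columns.isEmpty()) {
            return true;
        }
        if (columns.size() != 1) {
            return false;
        }
        String only = columns.get(0);
        // Assumption: treat a null column as blank too (not in the original condition).
        return only == null || only.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBlankRow(Collections.<String>emptyList()));    // true
        System.out.println(isBlankRow(Collections.singletonList("  \t")));  // true
        System.out.println(isBlankRow(Collections.singletonList("value"))); // false
        System.out.println(isBlankRow(Arrays.asList(" ", " ")));            // false: two columns
    }
}
```

Note that a row consisting only of delimiters and spaces has more than one column, so this check (like the tokenizer) does not treat it as blank.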

Hope this helps

James Bassett
  • Great suggestion! I was messing around at the line reading level, but that wasn't really correct. I did have to modify your code a little. A single column of whitespace is parsed by readColumns() so columns.size() == 1. I updated your answer with the code I used. – dkantowitz Dec 10 '12 at 20:54
  • hmm...apparently my edit to your answer is waiting "peer review". Here's the new loop condition: (moreInput && columns.size() == 1 && columns.get(0).trim().isEmpty()) – dkantowitz Dec 10 '12 at 20:57
  • Ok, I updated your edit to cater for empty lines (`columns.size() == 0`) like my original answer (it's more useful to other people). I was considering using `getUntokenizedRow().trim().isEmpty()` instead, but that would break when reading tab-delimited files. – James Bassett Dec 10 '12 at 22:53

I didn't know this library (you should add a Java tag...), but looking at the examples, I see they have readers that support a variable number of columns per line. An empty line is a sub-case of this pattern.

Alternatively (maybe less efficient), you can just catch the exception and go on with your reading...

PhiLho