
I am validating a csv file with content like:

TEST;F;12345;0X4321 - 1234 DUMMYTEXT;0X4321 - 1234 TESTTEXT

Until now, the values were separated by ';' and the method worked like a charm:

private static final String COLUMN_SEPARATOR = ";";

public void validateFile(BufferedReader reader) {

    String line = reader.readLine();

    while (line != null && result == ValidationResult.VALID) {  

        //this is broken with tab-stop as COLUMN_SEPARATOR          
        int matches = StringUtils.countMatches(line, COLUMN_SEPARATOR);

        if (matches != getCSVColumnCount() - 1
            && StringUtils.isNotBlank(line)) {

            if (matches == 0) {
                //MISSING_CSV_COLUMN_SEPERATOR;
            } else {
                //UNEXPECTED_CSV_COLUMN_COUNT;
            }                   
        }
        line = reader.readLine();
    }       
}

The requirement has changed: I now have to handle tabs as the column separator, while the text itself can contain spaces:

TEST F 12345 0x4321 - 1234 DUMMYTEXT 0x4321 - 1234 TESTTEXT

I changed the following line:

private static final String COLUMN_SEPARATOR = "\\t";

Problem: StringUtils.countMatches(line, "\\t") cannot find any occurrences (returns 0). I don't want to do:

int matches = line.split("\\t").length;

as I suspect it would be a significant performance hit (the CSV files aren't small). Do you know a better way to go?

SME_Dev

1 Answer


You've escaped the backslash in the Java string literal, so the resulting string consists of two characters: a backslash and a 't'.

To represent a tab character in a Java string literal, use \t (note the single backslash).

The fix is:

private static final String COLUMN_SEPARATOR = "\t";

Then StringUtils.countMatches() will work as you expect.
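To make the difference concrete, here is a minimal sketch in plain Java. The class name TabSeparatorDemo and the helper countOccurrences are hypothetical, written only for this illustration; the helper is a simple literal substring counter standing in for StringUtils.countMatches, not the Commons Lang implementation. The sample line assumes the columns are separated by real tab characters.

```java
class TabSeparatorDemo {

    // Hypothetical helper: counts non-overlapping literal occurrences of sep in line.
    // No regex is involved and no array of substrings is allocated.
    static int countOccurrences(String line, String sep) {
        if (line == null || sep == null || sep.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = line.indexOf(sep);
        while (idx != -1) {
            count++;
            idx = line.indexOf(sep, idx + sep.length());
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample line with real tab characters between the five columns
        String line = "TEST\tF\t12345\t0x4321 - 1234 DUMMYTEXT\t0x4321 - 1234 TESTTEXT";

        System.out.println("\t".length());   // 1: a single tab character
        System.out.println("\\t".length());  // 2: a backslash followed by 't'

        System.out.println(countOccurrences(line, "\t"));   // 4: one per column boundary
        System.out.println(countOccurrences(line, "\\t"));  // 0: the line contains no backslash
    }
}
```

This also explains why line.split("\\t") appears to work with the doubled backslash: String.split treats its argument as a regular expression, and in a regex the two-character sequence \t matches a tab. A literal substring search has no such interpretation step, so it needs the actual tab character.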

Alex Shesterov