StringUtils.countMatches() isn't working for tab char

Question

I am validating a csv file with content like:

TEST;F;12345;0X4321 - 1234 DUMMYTEXT;0X4321 - 1234 TESTTEXT

Until now, the values were separated by ';' and the method worked like a charm:

private static final String COLUMN_SEPARATOR = ";";

// readLine() can throw, so the method declares IOException
public void validateFile(BufferedReader reader) throws IOException {

    String line = reader.readLine();

    while (line != null && result == ValidationResult.VALID) {

        // this is broken with tab as COLUMN_SEPARATOR
        int matches = StringUtils.countMatches(line, COLUMN_SEPARATOR);

        if (matches != getCSVColumnCount() - 1
            && StringUtils.isNotBlank(line)) {

            if (matches == 0) {
                //MISSING_CSV_COLUMN_SEPERATOR;
            } else {
                //UNEXPECTED_CSV_COLUMN_COUNT;
            }
        }
        line = reader.readLine();
    }
}

The requirement has changed: I now have to handle tabs as the column separator, while the text itself can contain spaces:

TEST F 12345 0x4321 - 1234 DUMMYTEXT 0x4321 - 1234 TESTTEXT

I changed the following line:

private static final String COLUMN_SEPARATOR = "\\t";

Problem: StringUtils.countMatches(line, "\\t") cannot find any occurrences (returns 0). I don't want to do:

int matches = line.split("\\t").length;

as I suspect it would be a significant performance hit (the CSV files aren't small). Do you know a better way to go?

Why do you use two backslashes? Tab is just `\t`. – Thomas Jungblut, Nov 19 '14 at 12:57
Ouch. Thank you. It works now. – SME_Dev, Nov 19 '14 at 13:00

score 6 · Accepted Answer · answered Nov 19 '14 at 12:58

You've escaped the backslash in the Java string literal, so the resulting string consists of two characters: a backslash and a 't'.

To represent the tab character in a Java string literal, use \t (note the single backslash).
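A quick way to see the difference is to compare the two literals directly. The following is only a small sketch; the class name TabLiteralDemo and its constants are made up for illustration:

```java
public class TabLiteralDemo {
    // Escaped backslash followed by 't': two characters, '\' and 't'
    static final String ESCAPED_BACKSLASH = "\\t";
    // Tab escape: a single character, U+0009
    static final String TAB = "\t";

    public static void main(String[] args) {
        String line = "TEST\tF\t12345";

        System.out.println(ESCAPED_BACKSLASH.length());       // 2
        System.out.println(TAB.length());                     // 1
        System.out.println(line.contains(ESCAPED_BACKSLASH)); // false
        System.out.println(line.contains(TAB));               // true
    }
}
```

This is also why line.split("\\t") does find the tabs: split() treats its argument as a regular expression, and the regex engine itself interprets backslash-t as a tab. A plain substring search such as countMatches() does no such translation.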

The fix is:

private static final String COLUMN_SEPARATOR = "\t";

Then StringUtils.countMatches() will work as you expect.
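If you would rather not depend on StringUtils for this check, or just want to see what the counting amounts to, a plain loop over the characters does the same job without a regex or an intermediate array. This is a minimal sketch under that assumption; SeparatorCount and countChar are hypothetical names, not part of Commons Lang:

```java
public class SeparatorCount {
    // Hypothetical helper: counts how often 'separator' occurs in 'line'.
    static int countChar(String line, char separator) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == separator) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "TEST\tF\t12345\t0X4321 - 1234 DUMMYTEXT\t0X4321 - 1234 TESTTEXT";
        System.out.println(countChar(line, '\t')); // 4 separators, i.e. 5 columns

        // split() drops trailing empty strings, so its length is not a
        // reliable column count when the last columns are empty.
        String trailingEmpty = "a\tb\t\t";
        System.out.println(countChar(trailingEmpty, '\t'));  // 3
        System.out.println(trailingEmpty.split("\t").length); // 2
    }
}
```

Apart from any performance concern, the second example is a reason to be careful with split(...).length for validation: with the default limit it silently ignores empty trailing columns.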
