
I am writing a program to parse a key-value based log like this:

dstcountry="United States" date=2018-12-13 time=23:47:32

I am using the Univocity parser to do that. Here is my code:

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setDelimiter(' ');
parserSettings.getFormat().setQuote('"');
parserSettings.getFormat().setQuoteEscape('"');
parserSettings.getFormat().setCharToEscapeQuoteEscaping('"');
CsvParser keyValueParser = new CsvParser(parserSettings);
String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
String[] resp = keyValueParser.parseLine(line);

But the parser gives me this output:

dstcountry="United, 
States", 
date=2018-12-13, 
time=23:47:32

whereas the expected output was:

dstcountry="United States", 
date=2018-12-13, 
time=23:47:32

Is there any problem with the code or is this a parser bug?

Regards,
Hari

Harikrishnan


Author of the lib here. This is not a parser bug. The problem you have here is that you are NOT parsing a CSV file.

When the parser sees dstcountry="United followed by a space (which is your delimiter), it treats that as a complete value.

The quote setting only applies to fields that start with a quote character. As your field is not written as "dstcountry=""United States""" (the whole field enclosed in quotes, with the inner quotes doubled), the parser can't process it the way you want. No CSV parser will do that for you.
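To make the quote rule concrete, here is a minimal, self-contained sketch of CSV-style splitting in plain Java. It is a simplified illustration, not Univocity's actual implementation: a quote only opens a quoted value when it is the first character of a field, and inside a quoted value a doubled quote stands for one literal quote. The names CsvQuoteDemo and splitCsvStyle are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvQuoteDemo {

    // Splits one line CSV-style. A quote is only special at the start of a
    // field; inside a quoted field, "" stands for one literal quote.
    static List<String> splitCsvStyle(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        field.append(quote); // doubled quote -> one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // delimiters inside quotes are kept as data
                }
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
                atFieldStart = true;
                continue;
            } else if (c == quote && atFieldStart) {
                inQuotes = true; // a quote opens a quoted field only at field start
            } else {
                field.append(c); // a quote in mid-field is an ordinary character
            }
            atFieldStart = false;
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Quote in mid-field: the space inside it still splits the value.
        System.out.println(splitCsvStyle("dstcountry=\"United States\" date=2018-12-13", ' ', '"'));
        // Whole field quoted, inner quotes doubled: the value stays together.
        System.out.println(splitCsvStyle("\"dstcountry=\"\"United States\"\"\" date=2018-12-13", ' ', '"'));
    }
}
```

With the line from the question, the first call reproduces the split seen in the question (dstcountry="United and States" as separate values); the second call keeps dstcountry="United States" as one value because the field itself starts with the quote.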

Again, you are not processing a CSV. The only thing you can do here is to use 2 parser instances: one to break down the row around the = character, and another to break down the whitespace-separated values in the result of the first parser. For example:

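    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    CsvParserSettings parserSettings = new CsvParserSettings();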
    //break down the rows around the `=` character
    parserSettings.getFormat().setDelimiter('=');

    CsvParser keyValueParser = new CsvParser(parserSettings);
    String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
    String[] keyPairs = keyValueParser.parseLine(line);

    //break down each value around the whitespace.
    parserSettings.getFormat().setDelimiter(' ');
    CsvParser valueParser = new CsvParser(parserSettings);

    //add all values to a list
    List<String> row = new ArrayList<String>();

    for(String value : keyPairs){
        //if a value contains whitespace, break it down using the other parser instance
        String[] values = valueParser.parseLine(value);

        Collections.addAll(row, values);
    }

    //here is your result
    System.out.println(row);

This will print out:

[dstcountry, United States, date, 2018-12-13, time, 23:47:32]

You now have the keys and values. The following code prints them the way you want:

    for (int i = 0; i < row.size(); i += 2) {
        System.out.println(row.get(i) + " = " + row.get(i + 1));
    }

Output:

dstcountry = United States
date = 2018-12-13
time = 23:47:32
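If you need to look values up by key rather than just print them, the flat list can be folded into a map. This is a minimal sketch, assuming the list alternates key, value, key, value, as in the [dstcountry, United States, date, ...] output printed earlier; KeyValuePairs and toMap are hypothetical names, and LinkedHashMap is used only to keep the log's field order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyValuePairs {

    // Turns a flat [key1, value1, key2, value2, ...] list into an ordered map.
    // An odd-sized list is rejected up front instead of failing with an
    // IndexOutOfBoundsException on the last key. Repeated keys keep the last value.
    static Map<String, String> toMap(List<String> row) {
        if (row.size() % 2 != 0) {
            throw new IllegalArgumentException("Expected key/value pairs, got " + row.size() + " elements");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < row.size(); i += 2) {
            map.put(row.get(i), row.get(i + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("dstcountry", "United States", "date", "2018-12-13", "time", "23:47:32");
        Map<String, String> map = toMap(row);
        System.out.println(map.get("dstcountry")); // prints United States
        map.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```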

Hope this helps and thank you for using our parsers!

Jeronimo Backes

I ended up writing my own parser. I am pasting it here for future reference in case anybody needs it. Suggestions and comments are welcome.

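import java.util.ArrayList;
import java.util.List;

public class LogLineParser {

    private static final int INSIDE_QT = 1;
    private static final int OUTSIDE_QT = 0;

    // When true, the enclosing quote characters are dropped from the tokens.
    // (removeQuotes was used but never declared; false keeps output such as
    // dstcountry="United States".)
    private boolean removeQuotes = false;

    public String[] parseLine(char delimiter, char quote, char quoteEscape, char charToEscapeQuoteEscaping, String logLine) {
        char[] line = logLine.toCharArray();
        List<String> strList = new ArrayList<>();
        int state = OUTSIDE_QT;
        char lastChar = '\0';
        StringBuilder currentToken = new StringBuilder();
        for (int i = 0; i < line.length; i++) {
            if (state == OUTSIDE_QT) {
                if (line[i] == delimiter) {
                    // a delimiter outside quotes ends the current token
                    strList.add(currentToken.toString());
                    currentToken.setLength(0);
                } else if (line[i] == quote) {
                    if (lastChar == quoteEscape) {
                        // escaped quote: drop the preceding escape character, keep the quote
                        currentToken.deleteCharAt(currentToken.length() - 1);
                        currentToken.append(line[i]);
                    } else {
                        // opening quote: delimiters are kept as data until the closing quote
                        if (!removeQuotes) {
                            currentToken.append(line[i]);
                        }
                        state = INSIDE_QT;
                    }
                } else if (line[i] == quoteEscape) {
                    if (lastChar == charToEscapeQuoteEscaping) {
                        // escaped quote escape: drop the preceding char, keep this one
                        currentToken.deleteCharAt(currentToken.length() - 1);
                        currentToken.append(line[i]);
                        continue; // lastChar is deliberately not updated
                    } else {
                        currentToken.append(line[i]);
                    }
                } else {
                    currentToken.append(line[i]);
                }
            } else if (state == INSIDE_QT) {
                if (line[i] == quote) {
                    if (lastChar != quoteEscape) {
                        // closing quote
                        if (!removeQuotes) {
                            currentToken.append(line[i]);
                        }
                        if (currentToken.length() == 0) {
                            // with removeQuotes, an empty quoted value leaves the token empty;
                            // '\0' is a placeholder so it is not dropped at the end of the line
                            currentToken.append('\0');
                        }
                        state = OUTSIDE_QT;
                    } else {
                        currentToken.append(line[i]);
                    }
                } else if (line[i] == quoteEscape) {
                    if (lastChar == charToEscapeQuoteEscaping) {
                        // escaped quote escape: drop the preceding char, keep this one
                        currentToken.deleteCharAt(currentToken.length() - 1);
                        currentToken.append(line[i]);
                        continue; // lastChar is deliberately not updated
                    } else {
                        currentToken.append(line[i]);
                    }
                } else {
                    currentToken.append(line[i]);
                }
            }
            lastChar = line[i];
        }
        // a trailing delimiter produces one final empty value
        if (lastChar == delimiter) {
            strList.add("");
        }
        if (currentToken.length() > 0) {
            strList.add(currentToken.toString());
        }
        return strList.toArray(new String[0]);
    }
}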
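For comparison, when every value is either a bare token or a double-quoted string without embedded quotes, a regular expression can do the same tokenizing in a few lines. This swaps in a different technique from the hand-written parser above; it is a minimal sketch assuming keys consist of word characters (letters, digits, underscore), and RegexKeyValueParser and parse are hypothetical names. Unlike the parser above, it does not handle escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexKeyValueParser {

    // key: one or more word characters; value: a double-quoted string with no
    // embedded quotes, or otherwise a run of non-whitespace characters.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\"[^\"]*\"|\\S*)");

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            String value = m.group(2);
            // strip the surrounding quotes of a quoted value
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
        System.out.println(parse(line)); // {dstcountry=United States, date=2018-12-13, time=23:47:32}
    }
}
```

The trade-off is flexibility: the state machine above can be configured with different delimiter, quote, and escape characters, while the regex hard-codes the space-separated key=value layout.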
Harikrishnan