I am trying to remove commas within the fields of a CSV line.
Here are my expectations:
// (1) ",123,456," -> ",123456,"
// (2) ","abc,def"," -> ","abcdef","
// (3) ","123,456"," -> ","123456","
// (4) ","abcdef,"," -> ","abcdef","
I wrote the following code:
String[] test = {
    "\",123,456,\"",
    "\",\"abc,def\",\"",
    "\",\"123,456\",\"",
    "\",\"abcdef,\",\""
};
// Remove a comma unless a double quote is immediately before or after it.
final Pattern commaNotBetweenQuotes = Pattern.compile("(?<!\"),(?!\")");
for (String d : test) {
    System.out.println("O : " + d);
    String result = commaNotBetweenQuotes.matcher(d).replaceAll("");
    System.out.println("R : " + result);
}
However, case (4) fails.
Here is the output I get:
O : ",123,456,"
R : ",123456,"
O : ","abc,def","
R : ","abcdef","
O : ","123,456","
R : ","123456","
O : ","abcdef,","
R : ","abcdef,"," <-- I expect the comma after "f" to be removed, as it is inside the quoted string
How can I improve this regular expression pattern?
final Pattern commaNotBetweenQuotes = Pattern.compile("(?<!\"),(?!\")");
I got this code from the question "Different regular expression result in Java SE and Android platform".
My understanding of the pattern is:
If a comma has no double quote immediately to its left AND no double quote immediately to its right, replace it with an empty string.
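To see exactly why case (4) fails under this reading, a small check can list which commas the pattern matches in each test string. This is a sketch: the class name `LookaroundDemo` and the method `matchedCommas` are hypothetical, and only the pattern itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: lists the commas that the question's pattern would remove.
public class LookaroundDemo {
    // (?<!") : no double quote immediately before the comma
    // (?!")  : no double quote immediately after the comma
    static final Pattern COMMA_NOT_BETWEEN_QUOTES = Pattern.compile("(?<!\"),(?!\")");

    // Returns the index of every comma the pattern matches (i.e. what replaceAll("") deletes).
    public static List<Integer> matchedCommas(String s) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = COMMA_NOT_BETWEEN_QUOTES.matcher(s);
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        String[] test = {
            "\",123,456,\"",
            "\",\"abc,def\",\"",
            "\",\"123,456\",\"",
            "\",\"abcdef,\",\""
        };
        for (String d : test) {
            System.out.println(d + " -> commas removed at " + matchedCommas(d));
        }
    }
}
```

For the four test strings this reports `[5]`, `[6]`, `[6]` and `[]`. In case (4) the comma after `f` (index 9) is immediately followed by `"`, so `(?!")` rejects it, while the commas at indices 1 and 11 are rejected by `(?<!")`; nothing is removed.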
I tried using
final Pattern commaNotBetweenQuotes = Pattern.compile("(?<!\"),(?!\")|(?<![\"0-9]),(?=\")");
with the idea:
If a comma has no double quote immediately to its left AND no double quote immediately to its right, replace it with an empty string.
OR
If a comma has a double quote immediately to its right, and neither a digit nor a double quote immediately to its left, replace it with an empty string.
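One way to check the alternative pattern is to run it over the same four strings plus one extra, made-up string. A minimal sketch, with the hypothetical class name `AlternativePatternCheck`:

```java
import java.util.regex.Pattern;

// Hypothetical check of the alternative pattern from the question.
public class AlternativePatternCheck {
    // Alternative 1: no '"' immediately before AND no '"' immediately after the comma.
    // Alternative 2: a '"' immediately after, and neither '"' nor a digit immediately before.
    static final Pattern ALTERNATIVE =
        Pattern.compile("(?<!\"),(?!\")|(?<![\"0-9]),(?=\")");

    public static String strip(String s) {
        return ALTERNATIVE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String[] test = {
            "\",123,456,\"",
            "\",\"abc,def\",\"",
            "\",\"123,456\",\"",
            "\",\"abcdef,\",\"",
            "\",\"abc1,\",\""   // extra case, not from the question: quoted text ending in a digit
        };
        for (String d : test) {
            System.out.println("O : " + d);
            System.out.println("R : " + strip(d));
        }
    }
}
```

The four original cases come out as expected, but in the extra string `","abc1,","` the comma after `1` is kept: the first alternative rejects it because a `"` follows, and the second rejects it because a digit precedes it. Rules that only look at the neighboring character cannot tell a quoted string ending in a digit from an unquoted number.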
However, this "solution" is not elegant. What I really want is to remove commas within string literals, remove commas within integers, and retain commas used as CSV separators.
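A scan that tracks whether it is currently inside a quoted field can state these three rules directly, without any regular expression (and therefore without `$1`). This is a sketch under stated assumptions, not a full CSV parser: it assumes a complete line with balanced double quotes, and it treats an unquoted comma with a digit on both sides as part of an integer. The question's test strings are fragments whose first `"` closes a previous field, so the usage wraps them in made-up fields `"a"` and `"b"`. The class `CsvCommaStripper` and its methods are hypothetical.

```java
// Hypothetical helper, not from the question: a single left-to-right scan
// instead of a regular expression.
public class CsvCommaStripper {

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Assumes `line` is a complete CSV line with balanced double quotes.
    // Drops commas inside quoted fields, and unquoted commas with a digit on
    // both sides (treated as thousands separators); keeps all other commas.
    public static String stripCommas(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped "" toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
            } else if (c == ',') {
                boolean digitBefore = i > 0 && isAsciiDigit(line.charAt(i - 1));
                boolean digitAfter = i + 1 < line.length() && isAsciiDigit(line.charAt(i + 1));
                if (inQuotes || (digitBefore && digitAfter)) {
                    continue; // skip this comma
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The question's fragments wrapped in made-up fields "a" and "b" to form complete lines.
        String[] lines = {
            "\"a\",123,456,\"b\"",
            "\"a\",\"abc,def\",\"b\"",
            "\"a\",\"123,456\",\"b\"",
            "\"a\",\"abcdef,\",\"b\""
        };
        for (String line : lines) {
            System.out.println(line + " -> " + stripCommas(line));
        }
    }
}
```

With these inputs the scan yields `"a",123456,"b"`, `"a","abcdef","b"`, `"a","123456","b"` and `"a","abcdef","b"`. One limitation follows from the requirements themselves: adjacent unquoted numeric fields such as `1,2` look exactly like an integer with a thousands separator, so they are merged into `12` as well.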
I would also like to avoid using $1 in the replacement, as Android inserts "null" instead of "" for an unmatched group.
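If a regular expression is still preferred, one way to avoid `$1` is to make the replacement the empty string and push every condition into lookarounds. The sketch below assumes a complete CSV line with balanced double quotes; under that assumption a comma is inside a quoted field exactly when an odd number of `"` characters follow it on the line. The class name `CsvCommaRegex` is hypothetical, and because the question's test strings are fragments whose first `"` closes a previous field, the usage wraps them in made-up fields `"a"` and `"b"`.

```java
import java.util.regex.Pattern;

// Hypothetical regex-only variant: the replacement is always "", so no $1 is needed.
public class CsvCommaRegex {
    // Alternative 1: a comma followed by an odd number of '"' up to the end of the
    //   line, i.e. inside a quoted field (assumes a complete line with balanced quotes).
    // Alternative 2: a comma with an ASCII digit on both sides (an unquoted number).
    static final Pattern COMMA_TO_REMOVE = Pattern.compile(
        ",(?=[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)|(?<=[0-9]),(?=[0-9])");

    public static String stripCommas(String line) {
        return COMMA_TO_REMOVE.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // The question's fragments wrapped in made-up fields "a" and "b" to form complete lines.
        String[] lines = {
            "\"a\",123,456,\"b\"",
            "\"a\",\"abc,def\",\"b\"",
            "\"a\",\"123,456\",\"b\"",
            "\"a\",\"abcdef,\",\"b\""
        };
        for (String line : lines) {
            System.out.println(line + " -> " + stripCommas(line));
        }
    }
}
```

The odd-quote lookahead rescans the rest of the line for every comma, which is quadratic in the worst case but usually acceptable for single CSV lines. As with any rule of this kind, an unquoted `1,2` between two numeric fields is also treated as an integer and merged.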