-3

I am trying to parse a CSV file line by line:

String rowStr = br.readLine(); 

When I print rowStr, I see the following:

"D","123123","JAMMY,"," ","PILOT"

How can I remove the comma from inside a field value? I want to keep the commas that separate the fields.

Emma
ACP

2 Answers

2

This expression might help, although a regular expression may not be the best tool for this task. If you still want or need to use one:

(")([A-z0-9\s]+)([,]?)(",)?

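I have added extra boundaries just to be safe; you can simplify the expression considerably. The key is to put one capturing group before the value and one after it, so that the replacement can keep those groups and drop the comma captured in between.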

For instance, one of these boundaries ensures that stray commas that are not part of a value are not captured.

Graph

[Figure: graph visualizing how the expression matches; the image and its link are not available.]

Java Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        final String regex = "(\")([A-z0-9\\s]+)([,]?)(\",)?";
        final String string = "\"D\",\"123123\",\"JAMMY,\",\" \",\"PILOT\"";
        // Java replacement strings reference groups with $n, not \n.
        // Group 3 (the comma inside the value) is left out, which removes it.
        final String subst = "$1$2$4";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);

        System.out.println("Substitution result: " + result);
    }
}

JavaScript Test Demo

const regex = /(")([A-z0-9\s]+)([,]?)(",)?/gm;
const str = `"D","123123","JAMMY,"," ","PILOT"`;
// Group 3 (the comma inside the value) is left out, which removes it
const subst = `$1$2$4`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

Performance Test

const repeat = 1000000;
const string = '"D","123123","JAMMY,"," ","PILOT"';
let result;

const start = Date.now();

// Run the same replacement `repeat` times and measure the total time
for (let i = 0; i < repeat; i++) {
  const regex = /(")([A-z0-9\s]+)([,]?)(",)?/gm;
  result = string.replace(regex, "$1$2$4");
}

const elapsed = Date.now() - start;
console.log("Result: " + result);
console.log(elapsed / 1000 + " s is the runtime of " + repeat + " iterations of the benchmark.");
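
As mentioned at the top of this answer, a regular expression may not be needed at all. The following is only a minimal sketch of a non-regex alternative, not part of the approach above: it walks the row character by character, tracks whether the current position is inside double quotes, and drops commas only there. The class and method names (`QuoteAwareCommaStripper`, `stripCommasInsideQuotes`) are hypothetical, and the sketch assumes that every row has balanced quotes and that no field spans more than one line.

```java
class QuoteAwareCommaStripper {

    // Removes commas that appear between an opening and a closing double quote.
    // Assumption: quotes in the row are balanced.
    static String stripCommasInsideQuotes(String row) {
        StringBuilder out = new StringBuilder(row.length());
        boolean insideQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == '"') {
                // Every quote toggles the state and is kept in the output
                insideQuotes = !insideQuotes;
                out.append(c);
            } else if (c == ',' && insideQuotes) {
                // Comma inside a quoted value: skip it
            } else {
                // Field separators and all other characters are kept
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String rowStr = "\"D\",\"123123\",\"JAMMY,\",\" \",\"PILOT\"";
        System.out.println(stripCommasInsideQuotes(rowStr));
        // prints "D","123123","JAMMY"," ","PILOT"
    }
}
```

Because the decision depends only on the quote state, this version also handles a value that begins with a comma. For anything beyond a quick cleanup, a dedicated CSV parser is likely the safer choice.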
Emma
  • 1
    It wouldn't match where a field *begins* with a comma, as in `"FOO","BAR",",<-HERE"`. Though you provided Java code so it's a way more comprehensive answer than mine – P Varga May 08 '19 at 20:05
1

Use a regex like this:

(?<!"),|,(?!")

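This matches any comma that is either not preceded by a " or not followed by a ", i.e. a comma inside a field value. Replacing every match with an empty string removes those commas and leaves the separators alone.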
Test here.
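
Since the question reads the rows in Java, here is a short sketch of how the expression above could be applied there. The class and method names (`LookaroundCommaDemo`, `removeInnerCommas`) are made up for illustration; only the pattern comes from this answer. It assumes every field is wrapped in double quotes, as in the sample row.

```java
import java.util.regex.Pattern;

class LookaroundCommaDemo {

    // A comma that is not preceded by a quote, or not followed by one,
    // is treated as part of a field value rather than a separator.
    private static final Pattern INNER_COMMA = Pattern.compile("(?<!\"),|,(?!\")");

    static String removeInnerCommas(String row) {
        return INNER_COMMA.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        String rowStr = "\"D\",\"123123\",\"JAMMY,\",\" \",\"PILOT\"";
        System.out.println(removeInnerCommas(rowStr));
        // prints "D","123123","JAMMY"," ","PILOT"

        // A value that begins with a comma is handled by the second alternative
        System.out.println(removeInnerCommas("\"FOO\",\"BAR\",\",<-HERE\""));
        // prints "FOO","BAR","<-HERE"
    }
}
```

One limitation to keep in mind: a comma inside a value that happens to sit directly between two quotes (for example a field whose whole value is a single comma) looks exactly like a separator to this pattern and is left in place.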

P Varga