
I'm terrible with regex stuff. I have data that looks like this:

abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,

The meaning of the data itself is somewhat irrelevant; the point is that it's comma-delimited. You could refer to the data between commas as "columns", and some columns may be whitespace or empty (later on, whitespace columns and empty columns are ignored). I need to split the string into an array based on the comma delimiter. I tried

new StringTokenizer(string, ",")

but that skips over tokens where the data between two commas is empty, so I tried using string.split(",") instead. The problem with that is that it drops the last three columns in the data above: after the "334", it behaves like StringTokenizer and skips the trailing columns that contain neither whitespace nor data.
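A small sketch that makes the two behaviours concrete on the sample line (the class and method names are made up for illustration). StringTokenizer never returns empty tokens at all, while the one-argument split keeps empty columns in the middle of the line and drops only the trailing ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplit {
    // Collects every token StringTokenizer returns; it never produces
    // an empty token between two adjacent commas.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,";
        // 8: every empty column is skipped (the "   " column is kept, since space is not a delimiter)
        System.out.println(tokenize(text).size());
        // 11: empty columns in the middle survive, the three trailing ones are dropped
        System.out.println(text.split(",").length);
    }
}
```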

Can I make string.split() keep splitting until it reaches the end of the line, or is there a better way to do this?

Pshemo
rshaq
  • And what do you want to do when the end of line is reached? What if there is a comma in the values? How is it escaped? – fge Jan 29 '15 at 18:07
  • @fge essentially, I want to split on commas OR end of lines. There will never be a comma inside the columns. In other words, the data will never contain a comma as part of the actual values. We can assume this. – rshaq Jan 29 '15 at 18:09
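Given that assumption (values never contain commas), one way to treat both commas and line ends as delimiters is to break the input into lines first and then split each line on commas. This is a sketch, not code from either answer; SplitLines and parse are hypothetical names, the "\\R" line-break pattern requires Java 8 or later, and the negative limit is the one explained in the first answer.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLines {
    // Splits multi-line input into rows, then each row into columns.
    // Assumes values never contain commas, so no quoting/escaping is handled.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        // "\\R" matches any line break (\n, \r\n, \r, ...)
        for (String line : text.split("\\R")) {
            // A negative limit keeps trailing empty columns on each line
            rows.add(line.split(",", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "abc,42,,\r\nx,,y,\n";
        for (String[] row : parse(text)) {
            System.out.println(row.length); // 4 for both rows
        }
    }
}
```

Keeping rows separate means each line's column positions stay meaningful, which matters if some columns are later ignored.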

2 Answers


You can use the overloaded String#split(String,int) method, and set the limit to a negative number:

String text = "abc,42,4/04/1992,,,something, ,2/05/2007,dkwit,,334,,,";
String[] tokens = text.split(",", -1);
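A runnable version of the same call on the question's exact line, which also sketches how the blank columns the question mentions could be skipped afterwards without losing their positions. The class and method names are illustrative only; trim().isEmpty() is used instead of String.isBlank() so it also works before Java 11.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeepEmpty {
    // A negative limit keeps every column, including trailing empty ones.
    static String[] columns(String line) {
        return line.split(",", -1);
    }

    // Returns the positions of columns that are neither empty nor whitespace-only,
    // so later code can ignore blank columns while still knowing which column is which.
    static List<Integer> nonBlankIndexes(String[] cols) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < cols.length; i++) {
            if (!cols[i].trim().isEmpty()) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,";
        String[] cols = columns(text);
        System.out.println(cols.length);           // 14: one more than the 13 commas
        System.out.println(nonBlankIndexes(cols)); // [0, 1, 2, 5, 7, 8, 10]
    }
}
```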

The limit parameter is explained in the String#split(String,int) Javadoc:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
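The three cases described there can be seen side by side on a short string. This is an illustrative sketch; SplitLimitDemo and describe are made-up names.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Splits on commas with the given limit and renders the result for printing.
    static String describe(String text, int limit) {
        return Arrays.toString(text.split(",", limit));
    }

    public static void main(String[] args) {
        String text = "a,,b,,";
        // limit > 0: pattern applied at most limit - 1 times; the last entry holds the rest
        System.out.println(describe(text, 2));  // [a, ,b,,]
        // limit == 0 (what split(",") does): trailing empty strings are discarded
        System.out.println(describe(text, 0));  // [a, , b]
        // limit < 0: pattern applied as often as possible; trailing empty strings are kept
        System.out.println(describe(text, -1)); // [a, , b, , ]
    }
}
```

Arrays.toString prints an empty string as nothing between two separators, so "[a, , b]" is three elements: "a", "" and "b".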

M A

The easiest way to parse CSV (comma-separated values) data is with a CSV parser. One of the simplest ones is OpenCSV. Here is an example of how you can do it:

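import com.opencsv.CSVReader;   // opencsv 3.0+; older releases used au.com.bytecode.opencsv.CSVReader
import java.io.StringReader;

String data = "abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,";

// readNext() returns one parsed record per call and null at end of input;
// it throws checked exceptions (e.g. IOException), so call this from a method that handles them.
CSVReader reader = new CSVReader(new StringReader(data));
for (String[] tokens = reader.readNext(); tokens != null; tokens = reader.readNext()) {
    for (String token : tokens) {
        System.out.print("<" + token + ">\t");
    }
    System.out.println();
}
reader.close();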

Output (I added < and > to show where each value starts and ends):

<abc>   <42>    <4/04/1992> <>  <>  <something> <   >   <2/05/2007> <dkwit> <>  <334>   <>  <>  <>  
Pshemo