I am reading a .dat file line by line and want to split each line into fields on the tab delimiter ("\t"), since every field is separated by a tab.

However, some fields are optional and may be blank, so when two tabs appear consecutively I want to detect the empty field between them and store a blank String.

StringTokenizer stringTokenizer = new StringTokenizer(line, "\t");
ArrayList<String> al = new ArrayList<>();
while (stringTokenizer.hasMoreTokens()) {
    al.add(stringTokenizer.nextToken());
}
System.out.println(al.size() + " >> " + al);

When I run the above on the following input lines:

R   900081458   22222-22-2          1   -1  1   0   0   1
R   245047685   7250-46-6           0   -1  0   0   0   0
R   245048731   13755-29-8      237-340-6   0   -1  0   0   0   0
R   245047201   1080-12-2       214-096-9   0   -1  0   0   0   0
R   1   118725-24-9 612-118-00-5    405-080-4   0   0   0   0   0   0

The consecutive tabs are not detected, so I get the following output:

9 >> [R, 900081458, 22222-22-2, 1, -1, 1, 0, 0, 1]
9 >> [R, 245047685, 7250-46-6, 0, -1, 0, 0, 0, 0]
10 >> [R, 245048731, 13755-29-8, 237-340-6, 0, -1, 0, 0, 0, 0]
10 >> [R, 245047201, 1080-12-2, 214-096-9, 0, -1, 0, 0, 0, 0]
11 >> [R, 1, 118725-24-9, 612-118-00-5, 405-080-4, 0, 0, 0, 0, 0, 0]

The desired output would be something like this (filling each blank field with "BLANK"):

11 >> [R, 900081458, 22222-22-2, "BLANK", "BLANK", 1, -1, 1, 0, 0, 1]
11 >> [R, 245047685, 7250-46-6, "BLANK", "BLANK", 0, -1, 0, 0, 0, 0]
11 >> [R, 245048731, 13755-29-8, 237-340-6, "BLANK", 0, -1, 0, 0, 0, 0]
11 >> [R, 245047201, 1080-12-2, 214-096-9, "BLANK", 0, -1, 0, 0, 0, 0]
11 >> [R, 1, 118725-24-9, 612-118-00-5, 405-080-4, 0, 0, 0, 0, 0, 0]
Mark Rotteveel
    You'd be better off using an off-the-shelf CSV reader, such as OpenCSV (or any of the many available Open Source CSV readers). – k314159 Jan 23 '23 at 10:48
    *You'd be better off using an off-the-shelf CSV reader* is quite true, but if it's an otherwise simple CSV, `String.split` would detect the empty field anyway, so there's no problem – g00se Jan 23 '23 at 11:01

1 Answer

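StringTokenizer treats a run of consecutive delimiters as a single separator and never returns empty tokens, so blank fields are silently dropped. Use String.split() instead, which returns an empty string for each blank field. Note that split("\t") discards trailing empty strings; pass a negative limit, as in split("\t", -1), if the last fields of a line can be blank. Try this: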

// The negative limit keeps trailing empty fields; each blank field becomes ""
String[] strings = line.split("\t", -1);
ArrayList<String> al = new ArrayList<>();
for (String string : strings) {
    al.add(string);
}
System.out.println(al.size() + " >> " + al);

As k314159 points out in the comments, using an off-the-shelf reader such as OpenCSV is the more robust choice.
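
To tie the split-based approach back to the desired output in the question, here is a minimal sketch that also substitutes a placeholder for each empty field. The class name `TabFields` and the helper `splitFields` are made up for illustration and are not part of any library; the only standard calls used are `String.split(String, int)` and `String.isEmpty()`.

```java
import java.util.ArrayList;
import java.util.List;

class TabFields {
    // Hypothetical helper: splits a tab-delimited line and replaces
    // every empty field with the given placeholder.
    static List<String> splitFields(String line, String placeholder) {
        // A negative limit keeps trailing empty strings, which a plain
        // split("\t") would discard.
        String[] parts = line.split("\t", -1);
        List<String> fields = new ArrayList<>(parts.length);
        for (String part : parts) {
            fields.add(part.isEmpty() ? placeholder : part);
        }
        return fields;
    }

    public static void main(String[] args) {
        // First sample line from the question, with the two blank fields
        // written out as three consecutive tabs.
        String line = "R\t900081458\t22222-22-2\t\t\t1\t-1\t1\t0\t0\t1";
        List<String> al = splitFields(line, "BLANK");
        // prints: 11 >> [R, 900081458, 22222-22-2, BLANK, BLANK, 1, -1, 1, 0, 0, 1]
        System.out.println(al.size() + " >> " + al);
    }
}
```

This only covers plain tab-separated data. If fields can contain quoted tabs or embedded newlines, a dedicated CSV/TSV reader is still the safer route.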

John Williams
    `List<String> al = Arrays.asList(line.split("\t"));` and then `al.replaceAll(s -> s.isBlank() ? "BLANK" : s);` if you don't like the Stream API. – chptr-one Jan 23 '23 at 11:25