
I have seen several posts about this, but they cover split, not StringTokenizer.

This is my input file: inputfile (tab-delimited)

I wrote a StringTokenizer loop to get each value in a line, and I get:

1.0      3.0
delim1.0
delim3.0
...and so on

But when I pass the delimiter as a command-line argument, it does not work as expected.

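// br is a BufferedReader over the input file
while ((sCurrentLine = br.readLine()) != null) {
    System.out.println(sCurrentLine);
    // args[0] is used exactly as received; StringTokenizer does not interpret escape sequences
    StringTokenizer st = new StringTokenizer(sCurrentLine, args[0]);
    while (st.hasMoreTokens()) {
        System.out.println("delim" + st.nextToken());
    }
}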

1. Passing " " (typed by pressing the Tab key) as the argument works fine.

2. Passing "\t" as the argument prints:

1.0      3.0
delim1.0      3.0

3. Passing \t (without quotes) as the argument prints the same:

1.0      3.0
delim1.0      3.0

Why does this happen?
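A minimal sketch of what is probably going on, assuming a Unix-like shell: there, a quoted "\t" reaches the program as the two characters \ and t (an unquoted \t in bash arrives as just t; other shells may differ). StringTokenizer treats every character of its delimiter string as a delimiter, so it ends up splitting on \ or t, neither of which occurs in the line, and the whole line comes back as one token. String.split, by contrast, compiles its argument as a regex, and the regex engine reads \t as a tab. The class name DelimiterDemo and its tokens helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterDemo {
    // Collects every token StringTokenizer produces for the given delimiter set.
    static List<String> tokens(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "1.0\t3.0";   // one tab-separated line, as in the input file

        String realTab = "\t";      // 1 char (code 9): what typing the Tab key inside quotes delivers
        String backslashT = "\\t";  // 2 chars, '\' and 't': what a quoted "\t" typically delivers

        System.out.println(realTab.length());     // 1
        System.out.println(backslashT.length());  // 2

        // StringTokenizer treats every character of the delimiter string as a
        // delimiter and does no escape processing.
        System.out.println(tokens(line, realTab).size());     // 2: tokens 1.0 and 3.0
        System.out.println(tokens(line, backslashT).size());  // 1: the whole line, unsplit

        // String.split takes a regex, and the regex engine reads \t as a tab.
        System.out.println(Arrays.toString(line.split(backslashT)));  // [1.0, 3.0]
    }
}
```

This is why case 1 works while cases 2 and 3 print the line unsplit.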

USB
  • Downvoters please comment – USB Apr 04 '14 at 04:11
  • What are you expecting? For the tokenizer to split the line up even though you pass it a tab instead of a space? If the numbers in the file are separated by a space, then you have to use a space as the delimiter. – Drew Galbraith Apr 04 '14 at 04:16
  • +1 No, the spaces are tabs. They turned into spaces when I formatted the question while posting. – USB Apr 04 '14 at 04:23
  • Does the editor you used to create the text file replace tabs with spaces? It is a common feature and would cause your problem. – Drew Galbraith Apr 04 '14 at 04:25
  • No. They became spaces while posting on Stack Overflow (when I pressed Ctrl+K). My original file is tab-separated. – USB Apr 04 '14 at 04:28
  • Could you copy/paste the text file to pastebin.com and then post the link here? – Drew Galbraith Apr 04 '14 at 04:32
  • @Drew Galbraith: added the pastebin link – USB Apr 04 '14 at 04:41
  • Hmm, I tested this myself and it seems Java takes everything from the command line literally. So there is no way to pass the tab character as a delimiter without typing a physical one. Otherwise you will have to recognize/convert the tab character in your program. – Drew Galbraith Apr 04 '14 at 04:50
  • With split it works fine: split runs its argument through the regex engine, but StringTokenizer does no such processing internally. – USB Apr 04 '14 at 04:57
  • So is there a specific reason you cannot use split here? The Java API docs explicitly state that it treats its argument as a regex. – Drew Galbraith Apr 04 '14 at 04:59
  • This is a snippet of Hadoop code. Using split causes a performance issue. – USB Apr 04 '14 at 05:02
  • You could instantiate the StringTokenizer with only one argument and it would match all whitespace by default. Not sure if your project would allow that because I can't see the rest of the code. – Drew Galbraith Apr 04 '14 at 05:04
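Two workarounds suggested in the comments, sketched under the assumption that the program receives the two characters \ and t as its argument: convert that sequence into a real tab yourself before building the tokenizer, or use the one-argument constructor, whose default delimiter set " \t\n\r\f" already includes tab. The helper toDelimiters is hypothetical, not part of any library. The conversion runs once, before the read loop, so it adds no per-line cost.

```java
import java.util.StringTokenizer;

class DelimiterArg {
    // Hypothetical helper: turns the two-character sequence \t typed on the
    // command line into a real tab character (plain text replace, no regex).
    // Extend it for \n, \r, etc. if needed.
    static String toDelimiters(String arg) {
        return arg.replace("\\t", "\t");
    }

    public static void main(String[] args) {
        // Falls back to the two characters \t when no argument is given.
        String delims = toDelimiters(args.length > 0 ? args[0] : "\\t");
        String line = "1.0\t3.0";

        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            System.out.println("delim" + st.nextToken());  // delim1.0, then delim3.0
        }

        // Alternative: the one-argument constructor uses the default delimiter
        // set " \t\n\r\f", which already includes the tab character.
        StringTokenizer byWhitespace = new StringTokenizer(line);
        System.out.println(byWhitespace.countTokens());  // 2
    }
}
```

The default-delimiter option also splits on spaces, so it only fits if the values themselves never contain whitespace.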

1 Answer


"\t" cannot be passed as a string argument: \t is an escape sequence standing for a single tab character, which would be stored in a char primitive (or its Character wrapper class). Try passing it as \\\t and it should work.

cf-