Hi, I am trying to tokenize a text file using StringTokenizer in Java. The problem is that I only get the last word of each line. Any help is appreciated. This sample code is part of a map-reduce function.

String profile;

StringTokenizer inputKey=new StringTokenizer(value.toString());
while(inputKey.hasMoreTokens()){

    String input=inputKey.nextToken();
    if(!input.endsWith("</id>")){
        textInput.set(input);
    } else {
        profile=input.substring(4,15);
        profileId.set(profile);

    }
}
user207421
Rohit Haritash
  • 2
    What's the input and expected output? – Joachim Sauer May 03 '12 at 05:49
  • 1
    What is the delimiter for the String you are trying to tokenize? Without a specific one passed in, StringTokenizer defaults to the whitespace set `" \t\n\r\f"` as its delimiters. – Hunter McMillen May 03 '12 at 05:50
  • The input comes from 3 different text files. Sample input: Saudi Arabia sa Logistics and Supply Chain 17 years of GCC experience in the field of Construction, Trading and Manufacturing Industries. http://www.linkedin.com/pub/joseph-john/8/866/77 – Rohit Haritash May 03 '12 at 05:52
  • The problem is this: "Saudi Arabia" is on line 1, but I only get "Arabia". Line 2 is "sa Logistics and Supply Chain", and I only get "Chain". @hunter No specific delimiters. – Rohit Haritash May 03 '12 at 05:54
  • 1
    What is `textInput`? I think the `set` method may overwrite previously set values. Did you step through your program with a debugger, or at least put some sysouts in the code? – hage May 03 '12 at 05:56
  • textInput is a Text. It is declared as `private Text textInput = new Text(); // Key` – Rohit Haritash May 03 '12 at 05:59
  • OK guys, thanks for the suggestions. I found the bug: set() is overwriting the value. – Rohit Haritash May 03 '12 at 06:08
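
The overwrite described in the comments above can be reproduced without Hadoop. What follows is a minimal sketch, not the asker's actual mapper: a StringBuilder stands in for the `Text` field, and the names `TokenOverwriteDemo`, `lastTokenOnly` and `allTokens` are made up for illustration.

```java
import java.util.StringTokenizer;

public class TokenOverwriteDemo {

    // Mimics the loop in the question: the holder is reset on every token,
    // which has the same effect as calling set() each time, so only the
    // last token survives. (StringBuilder is a stand-in for Hadoop's Text.)
    static String lastTokenOnly(String line) {
        StringBuilder holder = new StringBuilder();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            holder.setLength(0);               // discard the previous value
            holder.append(tokens.nextToken()); // keep only the current token
        }
        return holder.toString();
    }

    // One possible fix: accumulate the tokens and use the result once,
    // after the loop has finished.
    static String allTokens(String line) {
        StringBuilder joined = new StringBuilder();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            if (joined.length() > 0) {
                joined.append(' ');            // separate tokens with one space
            }
            joined.append(tokens.nextToken());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(lastTokenOnly("Saudi Arabia")); // prints "Arabia"
        System.out.println(allTokens("Saudi Arabia"));     // prints "Saudi Arabia"
    }
}
```

In the real mapper the equivalent change would presumably be to call `textInput.set(...)` once after the loop with the joined string, or to emit each token inside the loop. Which of the two fits depends on what the job needs as its key, and the question does not say.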

2 Answers

You should use a debugger, as most have said, and stop using StringTokenizer and start using String.split.

You have instantiated a StringTokenizer object without a delimiter, so it falls back to its default whitespace delimiters. You can either pass the delimiter explicitly on each call to nextToken (it could be "," or "." in your case) or use the constructor that accepts both the String you are trying to parse and the delimiter.
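
As a rough illustration of the two options, here is a small sketch that tokenizes the same line with the two-argument StringTokenizer constructor and with String.split. The class and method names (`SplitVsTokenizer`, `withTokenizer`, `withSplit`) are invented for this example, and the comma-separated input is an assumed sample, not data from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SplitVsTokenizer {

    // StringTokenizer with an explicit delimiter set; every character in
    // delims is treated as a separator, and empty tokens are skipped.
    static List<String> withTokenizer(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // String.split takes a regular expression; empty strings between two
    // adjacent delimiters are kept, trailing empty strings are dropped.
    static List<String> withSplit(String line, String regex) {
        return Arrays.asList(line.split(regex));
    }

    public static void main(String[] args) {
        String line = "sa Logistics and Supply Chain";
        System.out.println(withTokenizer(line, " "));  // [sa, Logistics, and, Supply, Chain]
        System.out.println(withSplit(line, "\\s+"));   // [sa, Logistics, and, Supply, Chain]

        // The two differ on consecutive delimiters:
        System.out.println(withTokenizer("a,b,,c", ",")); // [a, b, c]
        System.out.println(withSplit("a,b,,c", ","));     // [a, b, , c]
    }
}
```

Both approaches return every word on the line, so neither one by itself explains getting only the last word. The difference shows up mainly in how empty fields are handled.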

Oh Chin Boon
  • OK. Can you explain the difference with an example? I have to use a tokenizer because I have to parse a file with thousands of strings. Thanks – Rohit Haritash May 03 '12 at 07:03
  • +1 for split(). @Rohit Haritash, did you look into the StringTokenizer javadoc? "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead." Besides, if all your input fits into a single String object, there is no problem in splitting it into several smaller strings. – Dima May 03 '12 at 07:15
  • I can't use regex here. In this task I sometimes have to parse through XML and HTML tags, and the pattern is tough to recognise. OK, I will try to implement split now. Thanks – Rohit Haritash May 03 '12 at 07:16

This kind of problem is perfect for learning how to debug a program.

boskop