<tag k="addr:street" v="St. Croix gate"/>
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String cb = itr.nextToken();
            if (cb.contains("k=\"addr:street\"")) {
                String roadName = itr.nextToken();
                while (!roadName.contains("\"/>")) {
                    roadName = roadName + itr.nextToken();
                }
                word.set(roadName);
                context.write(word, one);
            }
        }
    }
}
So as you can see, I'm trying to get the string inside v="St. Croix gate"/>, but since the tokenizer starts a new token at every whitespace, I'm only getting the output "gate".
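One possible way around the whitespace splitting is to skip StringTokenizer for this step and pull the v value out with a regular expression. This is only a minimal sketch: it assumes each input line holds one complete <tag .../> element with k before v, as in the sample above. The class name StreetNameExtractor and the method extractStreet are made up for illustration, and XML escapes such as &amp;quot; inside the value are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop or of the original mapper.
public class StreetNameExtractor {

    // Matches <tag k="addr:street" v="..."/> and captures everything
    // between the quotes of v, spaces included.
    private static final Pattern STREET_TAG =
            Pattern.compile("<tag\\s+k=\"addr:street\"\\s+v=\"([^\"]*)\"\\s*/>");

    // Returns the street name, or null if the line has no addr:street tag.
    public static String extractStreet(String line) {
        Matcher m = STREET_TAG.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
        System.out.println(extractStreet(line)); // prints: St. Croix gate
    }
}
```

Inside map(), the idea would be to call extractStreet(value.toString()) and, when the result is not null, pass it to word.set(...) before context.write(word, one).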