
<tag k="addr:street" v="St. Croix gate"/>

    public void map(Object key, Text value, Context context
            ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String cb = itr.nextToken();
            if (cb.contains("k=\"addr:street\"")) {
                String roadName = itr.nextToken();
                while (!roadName.contains("\"/>")) {
                    roadName = roadName + itr.nextToken();
                }
                word.set(roadName);
                context.write(word, one);
            }
        }
    }
}

As you can see, I'm trying to extract the string inside v="St. Croix gate"/>, but since the tokenizer starts a new token at every whitespace, the only output I get is "gate".

Malt
zyzdar
    You should be processing that kind of data with a suitable tag parser instead of `StringTokenizer`, especially if all or most of your data is tags. If your data is XML, you should be using an XML parser. – Kayaman Oct 03 '19 at 11:43

2 Answers


This worked for me:

    String element = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
    String searchAtt = "v";
    StringTokenizer itr = new StringTokenizer(element);
    while (itr.hasMoreTokens()) {
        // split on '='
        String s = itr.nextToken("=");
        // the string was split on '=', so the attribute name is the last word of the token
        if (s.endsWith(searchAtt)) {
            // the next token is '=', followed by the attribute value,
            // so switch the delimiter to '"' and skip the '=' token
            itr.nextToken("\"");
            // the next token is the attribute value
            String content = itr.nextToken();
            System.out.println("Searched attribute: " + content);
        }
    }
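
One thing to keep in mind with the snippet above: `nextToken(String delim)` replaces the tokenizer's delimiter set for all later calls, so the delimiter keeps changing as the loop runs. A possibly simpler variant, sketched below, splits on the double quote from the start, so the token right after ` v=` is the whole value. The class and method names (`QuoteSplitExample`, `attributeValue`) are made up for illustration, and the sketch assumes attribute values are non-empty, contain no escaped quotes, and have no spaces around `=`.

```java
import java.util.StringTokenizer;

class QuoteSplitExample {

    // Hypothetical helper: returns the value of attribute `name`, or null if it is absent.
    // Assumes non-empty values and the exact form name="value" (no spaces around '=').
    static String attributeValue(String element, String name) {
        // Split on double quotes only, so whitespace inside a value is preserved.
        StringTokenizer itr = new StringTokenizer(element, "\"");
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            // The token before a value ends with " name=", e.g. " v=".
            if (token.endsWith(" " + name + "=") && itr.hasMoreTokens()) {
                return itr.nextToken();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String element = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
        System.out.println(attributeValue(element, "v")); // prints: St. Croix gate
    }
}
```

Matching on `" v="` rather than on a trailing `v` also avoids picking up other attributes whose names merely end in `v`.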
elbraulio

Allow me to start by saying that parsing XML without an XML parser is a very bad idea for a multitude of reasons.

However, if you want to extract the contents of v using just string manipulation, here's one way of doing it:

String s = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
int vIndex = s.indexOf("v=\"");
int vendQuotesIndex = s.indexOf("\"", vIndex + 3);
System.out.println(s.substring(vIndex + 3, vendQuotesIndex)); // Prints "St. Croix gate"
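
For comparison, here is a rough sketch of the parser-based route using the DOM API that ships with the JDK. It assumes each input string is a single, well-formed XML element like the one in the question. The class and method names (`TagValueExample`, `streetName`) are invented for the example, and a real job would probably reuse one `DocumentBuilder` rather than creating one per call.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TagValueExample {

    // Hypothetical helper: returns the "v" attribute of an addr:street tag,
    // or null if the element is not an addr:street tag.
    static String streetName(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element tag = builder
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            if (!"addr:street".equals(tag.getAttribute("k"))) {
                return null;
            }
            return tag.getAttribute("v");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: malformed input is treated as a caller error.
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String s = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
        System.out.println(streetName(s)); // prints: St. Croix gate
    }
}
```

Unlike the `indexOf` version, this does not depend on attribute order, quote style, or spacing inside the tag.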
Malt