I am using Java's Pattern and Matcher classes to extract the text between two tags.

My code looks like this:

final Pattern pattern = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>");
List<String> topicArray = new ArrayList<String>();
final Matcher matcher = pattern.matcher("<City count='1' relevance='0.304' normalized='Shanghai,China'>Shanghai</City>");
while (matcher.find()) {
    topicArray.add(matcher.group(1));
}

This only gives me City as output instead of Shanghai. What's wrong with it?

Thanks

Alan Moore

1 Answer

You can try the following, which strips the tags and keeps only the text between them:

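import java.util.regex.Pattern;

public class Main {
    // Any tag, then text with no angle brackets (group 1), then any tag.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile("<[^>]*>([^<>]*)<[^>]*>");

    public static void main(String[] args) {
        String input = "<City count='1' relevance='0.304' normalized='Shanghai,China'>Shanghai</City>";

        // Replace each matched tag pair with the text captured in group 1.
        System.out.println(
            REGEX_PATTERN.matcher(input).replaceAll("$1")
        );  // prints "Shanghai"
    }
}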
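
As for why the original code prints City: in the question's pattern, capture group 1 is the tag name, because the \1 backreference needs it to match the closing tag, and the text between the tags is capture group 2. Keeping that pattern and reading group 2 should therefore also work. A minimal sketch, where the class name TagTextExtractor and the method extractTagText are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagTextExtractor {
    // Pattern from the question: group 1 = tag name, group 2 = text between the tags.
    private static final Pattern TAG_PATTERN =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>");

    // Hypothetical helper: collects the inner text of every matched tag pair.
    public static List<String> extractTagText(String input) {
        List<String> topics = new ArrayList<>();
        Matcher matcher = TAG_PATTERN.matcher(input);
        while (matcher.find()) {
            topics.add(matcher.group(2)); // group(1) would be the tag name, e.g. "City"
        }
        return topics;
    }

    public static void main(String[] args) {
        String input = "<City count='1' relevance='0.304' normalized='Shanghai,China'>Shanghai</City>";
        System.out.println(extractTagText(input)); // prints [Shanghai]
    }
}
```

Unlike replaceAll("$1"), this collects each tag's content as a separate list element, which matches the topicArray usage in the question.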
Paul Vargas