Currently I get the words in a sentence that start with an uppercase letter by using Character.isUpperCase. Now I would like to retrieve only the phrases in which every word starts with an uppercase letter. How should I go about doing that?

E.g "This is a sample sentence so that Ang Mo Kio Avenue 1 is part of Ang Mo Kio."

I would like to retrieve "Ang Mo Kio Avenue 1" and "Ang Mo Kio".

String s = "This is a sample sentence so that Ang Mo Kio Avenue 1 is part of Ang Mo Kio.";
String[] words = s.split("[^a-zA-Z']+");
for (int i = 0; i < words.length; i++) {
    if (Character.isUpperCase(words[i].charAt(0))) {
        System.out.println(words[i]);
    }
}

The real intention is to extract three or more consecutive words that start with an uppercase letter, optionally followed by a number.
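That requirement can also be written as a single regular expression instead of a word-by-word loop. The sketch below is one way to do it, assuming a "word" is an uppercase ASCII letter followed by letters or apostrophes, and that the optional number is a run of digits; the class `CapitalizedPhrases` and method `extractPhrases` are hypothetical names. `Matcher.find()` is called repeatedly to collect every non-overlapping match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalizedPhrases {
    // Three or more whitespace-separated words starting with an uppercase
    // ASCII letter, optionally followed by one run of digits (e.g. "Avenue 1").
    private static final Pattern PHRASE = Pattern.compile(
            "\\b[A-Z][a-zA-Z']*(?:\\s+[A-Z][a-zA-Z']*){2,}(?:\\s+\\d+)?");

    // Hypothetical helper: returns every non-overlapping match, in order.
    static List<String> extractPhrases(String text) {
        List<String> phrases = new ArrayList<>();
        Matcher m = PHRASE.matcher(text);
        while (m.find()) {
            phrases.add(m.group());
        }
        return phrases;
    }

    public static void main(String[] args) {
        String s = "This is a sample sentence so that Ang Mo Kio Avenue 1 is part of Ang Mo Kio.";
        for (String phrase : extractPhrases(s)) {
            System.out.println(phrase);
        }
        // Ang Mo Kio Avenue 1
        // Ang Mo Kio
    }
}
```

Because the word pattern does not include `.`, the trailing period of `Kio.` is left out of the match, and "This" is skipped because it is a single capitalized word. For non-ASCII capitals, `\\p{Lu}` and `\\p{L}` could replace `[A-Z]` and `[a-zA-Z']`.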

Dairo
user2541163
  • I don't see how *1* is uppercase, and why *This* is not included in expected output even if it's uppercase. Also by splitting by `\\s` you include punctuation marks into words. – m0skit0 Nov 18 '13 at 11:28
  • @m0skit0 Looks like it's part of the phrase.. – Maroun Nov 18 '13 at 11:28

2 Answers

i would like to only retrieve phrases in a sentence where all the 1st letter in every word of the phrase is upper case

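For that you need to capture consecutive uppercase words and append them to a StringBuilder. A number is also accepted once a phrase has started. When a word starting with a lowercase letter comes along, print the phrase if it has at least three words and reset the StringBuilder.

Try: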

public class UpperCasePhrases {
    public static void main(String[] args) {
        String s = "This is a sample sentence so that Ang Mo Kio Avenue 1 is part of Ang Mo Kio.";
        String[] words = s.split("\\s+");
        StringBuilder answer = new StringBuilder();
        int count = 0; // number of words in the current phrase
        for (int i = 0; i < words.length; i++) {
            char firstChar = words[i].charAt(0);
            // A capitalized word, or a number following one, extends the phrase.
            if (Character.isUpperCase(firstChar)
                    || (count > 0 && Character.isDigit(firstChar))) {
                answer.append(" " + words[i]);
                count++;
            } else {
                // Only print phrases of three or more words.
                if (count > 2) {
                    System.out.println(answer);
                }
                count = 0;
                answer = new StringBuilder();
            }
        }
        // Flush the last phrase if the sentence ends with one,
        // applying the same three-word minimum.
        if (count > 2) {
            System.out.println(answer);
        }
    }
}

Output:

 Ang Mo Kio Avenue 1
 Ang Mo Kio.
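If the phrases are needed as data rather than printed, the same loop can collect them into a `List<String>`. This is a sketch using a hypothetical `collectPhrases(sentence, minWords)` method; it keeps the rules above (uppercase first letter, or a digit once a phrase has started) and joins words with `String.join`, so the leading space disappears. Trailing punctuation such as the `.` in `Kio.` is still kept.

```java
import java.util.ArrayList;
import java.util.List;

public class UpperCasePhraseCollector {
    // Hypothetical helper: returns runs of at least minWords words, where each
    // word starts with an uppercase letter, or with a digit once a run has begun.
    static List<String> collectPhrases(String sentence, int minWords) {
        List<String> phrases = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the sentence is empty
            }
            char first = word.charAt(0);
            if (Character.isUpperCase(first)
                    || (!current.isEmpty() && Character.isDigit(first))) {
                current.add(word);
            } else {
                addIfLongEnough(current, minWords, phrases);
            }
        }
        addIfLongEnough(current, minWords, phrases); // phrase at the end of the sentence
        return phrases;
    }

    // Joins the current run into one phrase if it is long enough, then resets it.
    private static void addIfLongEnough(List<String> current, int minWords, List<String> phrases) {
        if (current.size() >= minWords) {
            phrases.add(String.join(" ", current));
        }
        current.clear();
    }

    public static void main(String[] args) {
        String s = "This is a sample sentence so that Ang Mo Kio Avenue 1 is part of Ang Mo Kio.";
        System.out.println(collectPhrases(s, 3));
        // prints [Ang Mo Kio Avenue 1, Ang Mo Kio.]
    }
}
```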
Masudul

As basic starting code, you might try the following function:

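import java.util.ArrayList;
import java.util.List;

public class StreetNames {
    private static void printStreetNames(String text) {
        List<String> words = new ArrayList<String>();

        for (String w : text.split("\\s+")) {
            if (w.isEmpty()) {
                continue; // split() yields "" when text starts with whitespace
            }

            // Capitalized word: extend the current street name.
            if (Character.isUpperCase(w.charAt(0))) {
                words.add(w);
                continue;
            }

            // A number may follow two or more capitalized words.
            if (w.matches("\\d+") && words.size() > 1) {
                words.add(w);
                continue;
            }

            // Any other word ends the run; print it if it has 2+ words.
            if (words.size() >= 2) {
                System.out.println(words);
            }
            words = new ArrayList<String>();
        }

        // Print a street name that runs to the end of the text.
        if (words.size() >= 2) {
            System.out.println(words);
        }
    }

    public static void main(String[] args) {
        printStreetNames("This is a sample sentence so that Ang Mo Kio Avenue 1 is part of Ang Mo Kio.");
    }
}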

Output:

[Ang, Mo, Kio, Avenue, 1]
[Ang, Mo, Kio.]

There are some caveats, though. For example, the following would not parse correctly: Ang Mo Kio 1 1 (because we do not check whether a street number has already been added). The code also does not remove the trailing . from the parsed street names (e.g. Kio.). I'll leave that to you as an exercise.
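One way to address both caveats is sketched below, assuming that trailing punctuation always ends a street name and that a street name contains at most one number. The class `StreetNameParser`, the method `findStreetNames` and its `minWords` parameter are hypothetical; `minWords = 2` reproduces the threshold used above, while `3` matches the three-word requirement from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class StreetNameParser {
    // Hypothetical helper: returns every run of at least minWords capitalized
    // words, optionally followed by a single number.
    static List<List<String>> findStreetNames(String text, int minWords) {
        List<List<String>> result = new ArrayList<>();
        List<String> words = new ArrayList<>();

        for (String raw : text.split("\\s+")) {
            // Strip trailing punctuation such as "." or "," (e.g. "Kio." -> "Kio").
            String w = raw.replaceAll("\\p{Punct}+$", "");
            boolean endsPhrase = !w.equals(raw); // punctuation closes the phrase

            if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) {
                words.add(w);
            } else if (w.matches("\\d+") && words.size() >= minWords) {
                words.add(w);
                endsPhrase = true; // at most one number per street name
            } else {
                endsPhrase = true; // any other word breaks the run
            }

            if (endsPhrase) {
                flush(words, minWords, result);
            }
        }
        flush(words, minWords, result); // street name at the very end of the text
        return result;
    }

    // Copies the current run into result if it is long enough, then clears it.
    private static void flush(List<String> words, int minWords, List<List<String>> result) {
        if (words.size() >= minWords) {
            result.add(new ArrayList<>(words));
        }
        words.clear();
    }

    public static void main(String[] args) {
        String s = "This is a sample sentence so that Ang Mo Kio Avenue 1 is part of Ang Mo Kio.";
        for (List<String> name : findStreetNames(s, 3)) {
            System.out.println(name);
        }
        // [Ang, Mo, Kio, Avenue, 1]
        // [Ang, Mo, Kio]
        System.out.println(findStreetNames("I live at Ang Mo Kio 1 1 here", 3));
        // [[Ang, Mo, Kio, 1]]
    }
}
```

Treating the number as the end of the phrase is what stops the second `1` from being appended; a street name with a suffix after the number (e.g. "Avenue 1 North") would need a different rule.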

Matt