0

I have an operation that deals with many space-delimited strings. I am looking for a regex for the String matches function that returns true if the first two space-separated substrings both start with capital letters, and false otherwise.

Examples:

"AL_RIT_121 PA_YT_32 rit cell 22 pulse"

will return true, as the first two substrings, AL_RIT_121 and PA_YT_32, start with the capital letters A and P respectively.

"AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri"

will return false as p is in lower case.

Martin Ender
Avishek

5 Answers

6
Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}")

will work with the .find() method. There's no reason to use matches on a prefix test, but if you have an external constraint, just do

Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}.*", Pattern.DOTALL)

To break this down:

  1. ^ matches the start of the string,
  2. \\p{Lu} matches any upper-case letter,
  3. \\S* matches zero or more non-whitespace characters, including _,
  4. \\s+ matches one or more whitespace characters, and
  5. the second \\p{Lu} matches the upper-case letter starting the second word.

In the second variant, .* combined with Pattern.DOTALL matches the rest of the input.
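
As a minimal sketch of the difference between the two variants, the following runs both patterns against the sample inputs from the question. The class name `PrefixCheck` and the method names `viaFind` and `viaMatches` are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original answer.
final class PrefixCheck {
    // Prefix-only pattern, intended for Matcher.find().
    private static final Pattern PREFIX =
            Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}");

    // Whole-input pattern, intended for Matcher.matches().
    private static final Pattern WHOLE =
            Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}.*", Pattern.DOTALL);

    static boolean viaFind(String input) {
        // find() succeeds as soon as the prefix matches; the rest is ignored.
        return PREFIX.matcher(input).find();
    }

    static boolean viaMatches(String input) {
        // matches() must consume the whole input, hence the trailing .*
        return WHOLE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String good = "AL_RIT_121 PA_YT_32 rit cell 22 pulse";
        String bad = "AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri";
        System.out.println(viaFind(good));    // true
        System.out.println(viaMatches(good)); // true
        System.out.println(viaFind(bad));     // false
        System.out.println(viaMatches(bad));  // false
    }
}
```

String.matches takes no flags argument, so if you call it directly, the embedded flag `(?s)` at the start of the regex is the equivalent of Pattern.DOTALL.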

Mike Samuel
  • You don't really have to put `\p{Lu}` in square brackets; like `\s` and `\S`, it can stand alone. And it's `DOTALL`, not `DOT_ALL`. I almost always have to look that up, in Python as well as in Java, but they both spell it without the underscore. – Alan Moore Nov 19 '12 at 21:48
  • Thanks a lot to you all, the provided regex worked perfectly for me :) – Avishek Nov 20 '12 at 10:14
4

Simply string.matches("[A-Z]\\w+ [A-Z].*")
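
A small sketch of how this one-liner behaves on the question's inputs, plus one edge case worth knowing about. The class name `SimpleMatchDemo` and the method `check` are hypothetical; the regex is the one shown above.

```java
// Hypothetical demo class; the regex is the one from the answer above.
final class SimpleMatchDemo {
    static final String REGEX = "[A-Z]\\w+ [A-Z].*";

    static boolean check(String input) {
        // String.matches requires the whole input to match the regex.
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(check("AL_RIT_121 PA_YT_32 rit cell 22 pulse"));    // true
        System.out.println(check("AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri")); // false
        // \w+ needs at least one more word character after the capital letter,
        // so a one-letter first word is rejected.
        System.out.println(check("A Bc"));                                     // false
    }
}
```

Because the pattern uses a literal space and `\w`, it also assumes the first word contains only letters, digits, and underscores and is followed by exactly one space character before the second word.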

Orabîg
1

You can use a specific regex if those two examples demonstrate your input format:

^(?:[A-Z]+_[A-Z]+_\d+\s*)+

Which means:

^           - Match the beginning of the string
(?:         - Start a non-capturing group (used to repeat the following)
    [A-Z]+  - Match one or more uppercase characters
    _       - Match an underscore
    [A-Z]+  - Match one or more uppercase characters
    _       - Match an underscore
    \d+     - Match one or more decimals (0-9)
    \s*     - Match zero or more space characters
)+          - Repeat the above group one or more times

You would use it in Java like this:

Pattern pattern = Pattern.compile("^(?:[A-Z]+_[A-Z]+_\\d+\\s*)+");
Matcher matcher = pattern.matcher(inputString);
// Note: matches() succeeds only when the entire input fits the pattern.
if (matcher.matches()) {
    System.out.println("Match found.");
}
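
Because matches() must consume the whole input, the first sample from the question (which continues with lowercase words) does not pass this check, and the `+` quantifier accepts any number of tokens rather than exactly two. The following is a sketch of one possible adjustment, assuming the intent is that exactly the first two tokens follow the UPPER_UPPER_digits format. The class name `TwoTokenPrefix`, the method name, and the modified regex are my own illustration, not the answer's original code.

```java
import java.util.regex.Pattern;

// Hypothetical adaptation for illustration; not the original answer's code.
final class TwoTokenPrefix {
    // Exactly two leading tokens of the form UPPER_UPPER_digits, each followed
    // by whitespace or the end of the input.
    private static final Pattern FIRST_TWO =
            Pattern.compile("^(?:[A-Z]+_[A-Z]+_\\d+(?:\\s+|$)){2}");

    static boolean firstTwoTokensMatch(String input) {
        // find() does not require the whole input to match; ^ pins it to the start.
        return FIRST_TWO.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(firstTwoTokensMatch("AL_RIT_121 PA_YT_32 rit cell 22 pulse"));    // true
        System.out.println(firstTwoTokensMatch("AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri")); // false
        System.out.println(firstTwoTokensMatch("AL_RIT_121"));                               // false
    }
}
```
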
nickb
1

Check this out:

public static void main(String[] args)
{
    String text = "AL_RIT_121 pA_YT_32 rit cell 22 pulse";

    boolean areFirstTwoWordsCapitalized = areFirstTwoWordsCapitalized(text);

    System.out.println("areFirstTwoWordsCapitalized = <" + areFirstTwoWordsCapitalized + ">");

}

private static boolean areFirstTwoWordsCapitalized(String text)
{
    boolean rslt = false;

    String[] words = text.split("\\s");

    int wordIndx = 0;

    boolean frstWordCap = false;
    boolean scndWordCap = false;

    for(String word : words)
    {
        wordIndx++;

        //System.out.println("word = <" + word + ">");

        Pattern ptrn = Pattern.compile("^[A-Z].+");

        Matcher mtchr = ptrn.matcher(word);

        while(mtchr.find())
        {
            String match = mtchr.group();

            //System.out.println("\tMatch = <" + match + ">");

            if(wordIndx == 1)
            {
                frstWordCap = true;
            }
            else if(wordIndx == 2)
            {
                scndWordCap = true;
            }
        }
    }

    rslt = frstWordCap && scndWordCap;

    return rslt;
}
amphibient
1

Try this:

public class RegularExp 
{

    /**
     * @param args
     */
    public static void main(String[] args) {
        String regex = "[A-Z][^\\s.]*\\s[A-Z].*";
        String str = "APzsnnm lmn Dlld";
        System.out.println(str.matches(regex));

    }

}
NeverHopeless
jsjunkie