0

Is there a way in Java to display all characters and strings except a certain set of words or a range of numbers? For example:

Regex:

^(if|else),[0-9] 

Input String:

if x <= 7

Output:

Unrecognized Tokens:

x , <=

Since "if" and 7 are negated, they won't appear. Can I negate a set of strings and a range of numbers in a single regular expression? Or is there another way to display just the unmatched strings? Our assignment is to display the recognized tokens, then the unmatched tokens. I've researched and studied regex for three days, but I still can't get my assignment done. Thanks in advance for any help.

Perception
kyoshiro
  • I think you can't negate strings with a regex, at least in Java. You have to write a regex that matches anything else. For example, to match any string but `stack` you would need `[^s].*|s[^t].*|st[^a].*|sta[^c].*|stac[^k].*|stack.+` (stackXXX would be allowed). Imagine a regex to match anything but several words... a mess. – aalku Aug 30 '11 at 17:40
  • BTW you are talking about tokens but you didn't tell how they are defined. Do you mean tokens are the result of splitting the string by white spaces? So, do you want to test each token by itself? – aalku Aug 30 '11 at 17:44

3 Answers

1

I would use String.split() to split the string into tokens, then compare each of the tokens to your "filter list".

Even if you could accomplish this using a regex, it would be much less straightforward imo.
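As a rough illustration of the split-then-compare idea, here is a minimal sketch. It assumes tokens are separated by whitespace (the question does not say how they are defined), and the class name, the method name, and the contents of the filter list are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SplitAndFilter {
    // Hypothetical filter list: the keywords taken from the question's example.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("if", "else"));

    // Returns the whitespace-separated tokens that are neither a keyword
    // nor a run of digits (assumption: tokens are delimited by whitespace).
    static List<String> unrecognized(String input) {
        List<String> result = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input splits into a single empty token
            }
            boolean isKeyword = KEYWORDS.contains(token);
            boolean isNumber = token.matches("[0-9]+");
            if (!isKeyword && !isNumber) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unrecognized("if x <= 7")); // prints [x, <=]
    }
}
```

Keeping the filter list in a Set makes it easy to add more keywords later without touching any regex.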

Edit:

In fact, you may not even need to iterate through the results. You could potentially split on the "filter" words. For example:

String[] results = s.split(" *if *| *else *| *[0-9]+ *| +");

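Note that you have to include a blank space in the expression, as I assume you don't want x <= to be a single token in the results. Adding optional whitespace around the keywords also avoids empty strings between tokens, although a leading empty string still appears when the input starts with a keyword, as in the example above. The pattern also matches if or else inside longer words such as elif, so word boundaries (\b) may be needed.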
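One possible way to harden the split-on-filter-words idea is sketched below. It is a variation, not the exact expression above: it adds \b word boundaries so that only whole keywords and whole numbers act as delimiters, and it drops the empty string that split() returns when the input starts with a delimiter. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SplitOnFilterWords {
    // Delimiter: a whole-word keyword or whole-word run of digits
    // (with optional surrounding whitespace), or plain whitespace.
    private static final String DELIMITER = "\\s*\\b(?:if|else|[0-9]+)\\b\\s*|\\s+";

    static List<String> unrecognized(String input) {
        List<String> result = new ArrayList<>();
        for (String piece : input.split(DELIMITER)) {
            if (!piece.isEmpty()) { // a delimiter at the very start yields ""
                result.add(piece);
            }
        }
        return result;
    }
}
```

With this delimiter, `SplitOnFilterWords.unrecognized("if x <= 7")` gives x and <=, while words such as elif or x7 stay intact because the keyword or digits inside them are not on a word boundary.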

Mansoor Siddiqui
1

Depending on the complexity of the problem, you could try negative lookahead assertions:

\b((?!if|else|\d)\w+)\b

or some crazy combination of lookbehind and negative lookahead:

((?<=\A|\s)(?!if|else|\d)\S+)
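A sketch of how a lookaround pattern like this could be driven from Java with Matcher.find() follows. The pattern is a variant of the second expression, not a copy of it: as written, (?!if|else|\d) rejects any token that merely begins with if, else, or a digit (for example iffy), so this version only rejects a token when the keyword or number is the whole token. It also uses (?<!\S) and (?!\S) as the "start of token" and "end of token" checks. The class and method names are invented for the example, and tokens are assumed to be whitespace-delimited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadTokens {
    // (?<!\S)  : token starts at the beginning of input or after whitespace
    // (?!...)  : reject the token if it is exactly if, else, or all digits
    // \S+      : otherwise consume the whole whitespace-delimited token
    private static final Pattern UNRECOGNIZED =
            Pattern.compile("(?<!\\S)(?!(?:if|else|\\d+)(?!\\S))\\S+");

    static List<String> unrecognized(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = UNRECOGNIZED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}
```

For the input `if x <= 7` this collects x and <=. Because the match is on \S+ rather than \w+, operators such as <= are reported as tokens too.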
brianary
0

Yup, I have to classify the lexemes (if, else, main) as tokens (keyword), 0-9 as NUM, and so on...

\b((?!if|else|\d)\w+)\b Oops, my bad, it does work. I had accidentally deleted a | .
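For the classification itself (keywords, NUM, and everything else as unrecognized), a minimal sketch might look like the following. The category names KEYWORD, NUM, and UNRECOGNIZED, the class name, and the whitespace-based splitting are assumptions made for illustration; the assignment's actual token definitions may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TokenClassifier {
    // Keywords named in the post; extend as the assignment requires.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("if", "else", "main"));

    // Groups whitespace-separated lexemes by category, recognized ones first.
    static Map<String, List<String>> classify(String input) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("KEYWORD", new ArrayList<>());
        groups.put("NUM", new ArrayList<>());
        groups.put("UNRECOGNIZED", new ArrayList<>());
        for (String lexeme : input.trim().split("\\s+")) {
            if (lexeme.isEmpty()) {
                continue; // blank input splits into a single empty lexeme
            }
            if (KEYWORDS.contains(lexeme)) {
                groups.get("KEYWORD").add(lexeme);
            } else if (lexeme.matches("[0-9]+")) {
                groups.get("NUM").add(lexeme);
            } else {
                groups.get("UNRECOGNIZED").add(lexeme);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints {KEYWORD=[if], NUM=[7], UNRECOGNIZED=[x, <=]}
        System.out.println(classify("if x <= 7"));
    }
}
```

The LinkedHashMap keeps the categories in insertion order, so the recognized tokens are listed before the unrecognized ones.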

kyoshiro