
Just a note upfront: I'm a bit of a regex newbie. Perhaps a good answer to this question would involve linking me to a resource that explains how these sorts of conditions work :)

Let's say that I have a street name, like 23rd St or 5th St. I'd like to get rid of the ordinal suffix ("st", "nd", "rd", or "th") that follows the number. How can this be done?

Right now I have the expression (st|nd|rd|th). The problem is that it also matches those letters inside street names that merely contain "st", "nd", "rd", or "th". What I really need is a match that requires at least one digit immediately before the suffix (i.e., 1st but not street).
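To make the problem concrete, here is a minimal Ruby sketch (Ruby because the asker mentions `gsub!` further down; the sample street names are made up for illustration) that applies the unanchored pattern with a plain `gsub`:

```ruby
# The unanchored alternation matches "st", "nd", "rd" or "th" anywhere,
# including inside ordinary words. Matching is case-sensitive, so only
# lowercase occurrences are affected.
naive = /(st|nd|rd|th)/

puts "1st Avenue North".gsub(naive, "")  # => 1 Avenue Nor
puts "Eastern Parkway".gsub(naive, "")   # => Eaern Parkway
```

The ordinal is removed as intended, but so are the "th" in "North" and the "st" in "Eastern".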

Thank you!

Matt
Eric R.

4 Answers


It sounds like you just want to match the ordinal suffix (st|nd|rd|th), yes?

If your regex engine supports it, you could use a lookbehind assertion.

/(?<=\d)(st|nd|rd|th)/

That matches (st|nd|rd|th) only if it is immediately preceded by a digit (\d). The digit is checked but is not included in the match, so replacing the match with an empty string removes only the suffix.
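A minimal Ruby sketch of the lookbehind approach (Ruby's regex engine supports fixed-length lookbehind such as `(?<=\d)`). The constant name `ORDINAL` is hypothetical, and the group is made non-capturing, which makes no difference when the whole match is simply replaced:

```ruby
# Lookbehind: match the suffix only when a digit immediately precedes it.
# The digit is tested but not consumed, so replacing the match with ""
# leaves the number itself in place.
ORDINAL = /(?<=\d)(?:st|nd|rd|th)/

["23rd St", "5th St", "1st Avenue North"].each do |street|
  puts street.gsub(ORDINAL, "")
end
# Prints:
# 23 St
# 5 St
# 1 Avenue North
```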

Wiseguy
  • Problem: it will match `azoiu32rdzeriuoiu` – fge Dec 29 '11 at 21:12
  • @fge True. To prevent that, do you suppose it's safe to assume that it's preceded by a space then only digits (e.g., `(?<= \d+)`)? I hate ever making assumptions... – Wiseguy Dec 29 '11 at 21:15
  • @fge That is not a problem. He said he wants to match only the `st|nd|rd|th` if there are numbers before it. That's what this does assuming that lookbehinds are supported in the regex engine he's using. Is there really a street that has numbers in the name with letters before and after? – Paul Dec 29 '11 at 21:16
  • @user1 no, the OP says he wants to _get rid_ of the suffix – fge Dec 29 '11 at 21:19
  • And? How is the regex above supposed to help? Hint: You cannot replace a captured group by nothing. – fge Dec 29 '11 at 21:21
  • I was under the impression he was matching the suffix to replace what was matched. Not knowing what tool/language is being used for that, I can only answer the given question. – Wiseguy Dec 29 '11 at 21:21
  • @fge Yes you can; replace all matches of the above regex with the empty string: `""`. – Paul Dec 29 '11 at 21:27
  • This works! The only issue now is that given a street name of "Himrod Street", the ruby regex engine (using gsub!) is giving me back an empty string. Any idea why? – Eric R. Dec 29 '11 at 21:29
  • @Wiseguy well, there is the huge problem that arbitrary length lookbehinds are only supported by .NET languages... – fge Dec 29 '11 at 21:31
  • @EricR Are you trying to match the whole address, or are you trying to match just the suffix? I was assuming the latter, where you would replace the match with an empty string. No match would be found in "Himrod Street". – Wiseguy Dec 29 '11 at 21:33
  • @MihaiClaudiuToader see the input string I gave as an example, which does match, and see my solution, which won't match it – fge Dec 29 '11 at 21:33
  • Right, so in a no match case why is Ruby's gsub! method making the entire string blank? – Eric R. Dec 29 '11 at 21:37
  • @Wiseguy no, not arbitrary length lookbehinds. _Fixed_ length lookbehinds, yes, i.e., regexes which have an upper bound limit to the matched characters. – fge Dec 29 '11 at 21:38
  • @EricR Ah, I don't know Ruby. Upon a quick search, I see that [`gsub!`](http://ruby-doc.org/docs/ProgrammingRuby/html/ref_c_string.html#String.gsub_oh) returns `nil` if nothing is matched. Would [`gsub`](http://ruby-doc.org/docs/ProgrammingRuby/html/ref_c_string.html#String.gsub) instead of `gsub!` work? – Wiseguy Dec 29 '11 at 21:40
  • @Wiseguy gsub did it! Thank you very much! – Eric R. Dec 29 '11 at 21:40
  • @fge Ah, interesting. I did not know that. Thanks for sharing. – Wiseguy Dec 29 '11 at 21:42
  • @fge That's actually easily fixable (just add a lookahead assertion for \b or \s+ at the end of the regex). The upvote was for properly using the look{ahead,behind} assertions. Assertions allow one to match a regex depending on contextual information, which is actually important in certain cases. One can of course do it like you did, but that isn't necessarily better. – Mihai Toader Dec 31 '11 at 10:31
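The `gsub!` surprise from the comments can be reproduced in a few lines of Ruby: `gsub` always returns a new string, while `gsub!` edits the receiver in place and returns `nil` when nothing was replaced, so code that assigns its return value ends up with `nil`. This sketch also appends `\b` to the lookbehind pattern, as the comment above suggests; the constant name `ORDINAL_WORD` is hypothetical.

```ruby
# Lookbehind for a digit, plus \b so the suffix must end a word.
ORDINAL_WORD = /(?<=\d)(?:st|nd|rd|th)\b/

street = "Himrod Street"
p street.gsub(ORDINAL_WORD, "")       # => "Himrod Street" (new, unchanged string)
p street.dup.gsub!(ORDINAL_WORD, "")  # => nil (no substitution was made)

# With the trailing \b, the mid-token case raised in the comments no longer matches.
p "azoiu32rdzeriuoiu".gsub(ORDINAL_WORD, "")  # => "azoiu32rdzeriuoiu"
p "23rd St".gsub(ORDINAL_WORD, "")            # => "23 St"
```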

What you really want are word-boundary anchors.

Try replacing globally:

\b(\d+)(?:st|nd|rd|th)\b

with the first group.

Explanation:

  • \b --> matches a position where a word character (digit, letter, underscore) is followed by a non-word character (anything else), or the reverse; it also matches at the start or end of the string when the character there is a word character;
  • (\d+) --> matches one or more digits, and capture them in first group ($1);
  • (?:st|nd|rd|th) --> matches any of st, nd, rd, th without capturing it ((?:...) is a non-capturing group);
  • \b --> see above.

Demonstration using perl:

$ perl -pe 's/\b(\d+)(?:st|nd|rd|th)\b/$1/g' <<EOF
> Mark, 23rd street, New Hampshire
> I live on the 7th avenue
> No match here...
> azoiu32rdzeriuoiu
> EOF
Mark, 23 street, New Hampshire
I live on the 7 avenue
No match here...
azoiu32rdzeriuoiu
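Since the asker is working in Ruby, here is a sketch of the same substitution with `String#gsub`, run over the same four input lines as the Perl demo (the constant name `PATTERN` is hypothetical). In Ruby's replacement string, `'\1'` plays the role of Perl's `$1`; it is written in single quotes so the backslash reaches `gsub` intact.

```ruby
# Capture the digits in group 1 and replace the whole "digits + suffix"
# match with just that group. \b on both sides keeps the match to whole words.
PATTERN = /\b(\d+)(?:st|nd|rd|th)\b/

[
  "Mark, 23rd street, New Hampshire",
  "I live on the 7th avenue",
  "No match here...",
  "azoiu32rdzeriuoiu"
].each { |line| puts line.gsub(PATTERN, '\1') }
```

The output should match the Perl demo line for line.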
fge
  • This will fail on the beginning of the string, and won't work in about half the languages which don't implement lookbehind. Instead of the space lookbehind, why not use another boundary anchor `\b`? – Amadan Dec 29 '11 at 21:15
  • I was about to edit the solution to include \b instead but got distracted :p Editing... – fge Dec 29 '11 at 21:17
  • This is removing the entire street name (i.e., 4th). – Eric R. Dec 29 '11 at 21:33
  • Not if you substitute with the first group as instructed! – fge Dec 29 '11 at 21:39

Try using this regex:

(\d+)(?:st|nd|rd|th)

I don't know Ruby. In PHP I would use something like:

preg_replace('/(\d+)(?:st|nd|rd|th)\b/', '$1', 'South 2nd Street');

to remove the suffix (this returns "South 2 Street").
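Whatever the pattern matches after the suffix is removed along with it. A minimal Ruby sketch comparing a pattern that ends in a literal space with one that ends in the `\b` word boundary (the variable names are hypothetical):

```ruby
# A literal space after the suffix is part of the match, so replacing with
# '\1' drops the space too. \b only asserts a word boundary and consumes nothing.
with_space    = /(\d+)(?:st|nd|rd|th) /
with_boundary = /(\d+)(?:st|nd|rd|th)\b/

puts "South 2nd Street".gsub(with_space, '\1')     # => South 2Street
puts "South 2nd Street".gsub(with_boundary, '\1')  # => South 2 Street
```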

piotrekkr

To remove the ordinal:

 s/(\d+)(?:st|nd|rd|th)\b/$1/

You must capture the number so you can replace the match with it. You can capture the ordinal or not; it doesn't matter unless you want to output it somewhere else.
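If you do want the ordinal as well, capturing it costs nothing in the replacement. A short Ruby sketch, assuming a made-up address string and a hypothetical constant name: `gsub` keeps only group 1, while `String#scan` returns both captured groups for every match.

```ruby
# Capture the number (group 1) and the ordinal (group 2). The replacement
# uses only the number; scan shows the ordinal is still available.
ORDINAL_PARTS = /(\d+)(st|nd|rd|th)\b/

address = "Corner of 42nd St and 5th Ave"
puts address.gsub(ORDINAL_PARTS, '\1')  # => Corner of 42 St and 5 Ave
p address.scan(ORDINAL_PARTS)           # => [["42", "nd"], ["5", "th"]]
```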

http://www.regular-expressions.info/javascriptexample.html

SpacedMonkey