How do I split a token from the end of my string?

Question

I want to separate a string into two parts if a token from an array is found at the end of the string. I have tried this:

x = "Canton Female"
GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]

x.partition(/(^|[[:space:]]+)[#{Regexp.union(GENDER_TOKENS)}]$/i)
 #=> ["Canton Female", "", ""]

But although the word "female" is part of my tokens, it is not getting split out. How do I adjust my regex so that it gets split properly?

You are making the same mistake: you use `Regexp.union` inside a regex literal and the `i` is not affecting these alternations. Also, you put this group into a character class, and it ruins the pattern altogether. Not sure what you need here, see [this demo](https://ideone.com/jCz5le), try `x.partition(/(?:^|[[:space:]]+)(?:#{Regexp.union(GENDER_TOKENS).source})$/i)` — Wiktor Stribiżew, Dec 21 '17 at 18:13
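
To make the comment above concrete, here is a minimal sketch of what its suggested pattern returns with `String#partition`. It assumes the same `x` and `GENDER_TOKENS` as in the question; `pattern` and `result` are just illustrative local names.

```ruby
x = "Canton Female"
GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]

# Interpolate only the source of the union, wrapped in a non-capturing
# group (not a character class), so the outer /i flag reaches every token.
pattern = /(?:^|[[:space:]]+)(?:#{Regexp.union(GENDER_TOKENS).source})$/i

result = x.partition(pattern)
p result
# prints ["Canton", " Female", ""]
```

Note that the leading whitespace is part of the match here, so it ends up in the middle element rather than in the head.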

Tom Lord · Accepted Answer · 2017-12-21T18:27:10.353

I'm a little unclear what you are asking - what is the desired result? However, here's what I think you're looking for:

GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]

"Canton Female".split(/\b(#{Regexp.union(GENDER_TOKENS).source})$/i)
#=> ["Canton ", "Female"]

"Tom Lord".split(/\b(#{Regexp.union(GENDER_TOKENS).source})$/i)
#=> ["Tom Lord"]

- `String#split` splits the string on every match and, because the pattern contains a capture group, includes the captured token in the result. `String#partition` always returns three elements, `[head, match, tail]`. I think `split` is probably what you wanted?
- `\b` is a word boundary anchor. This is cleaner than trying to match "start of line or whitespace".
- The `Regexp.union` is wrapped in round brackets, which group the alternatives, not square brackets. Square brackets make it a character set, which is clearly not what you wanted.
- `Regexp#source` returns only the inner text of the regexp. The (implicit) `Regexp#to_s` you were using returns the full pattern including option toggles, i.e. `(?-mix:m|male|men|f|w|female|wom)`, and the `-i` in there switches case-insensitivity off for the tokens regardless of the outer `/i`.
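
The difference between `to_s` and `source` described above can be checked directly. This is a small sketch; `union`, `with_to_s` and `with_source` are illustrative names, not part of the original answer.

```ruby
GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]
union = Regexp.union(GENDER_TOKENS)

p union.to_s    # "(?-mix:m|male|men|f|w|female|wom)"
p union.source  # "m|male|men|f|w|female|wom"

# Interpolating the Regexp object calls to_s, which carries the -i toggle,
# so the tokens stay case-sensitive despite the outer /i.
with_to_s   = /\b(#{union})$/i
# Interpolating the source leaves the outer /i in control.
with_source = /\b(#{union.source})$/i

p with_to_s.match?("Canton Female")    # false
p with_to_s.match?("Canton female")    # true
p with_source.match?("Canton Female")  # true
```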

Worth noting the original example had the `Regexp.union` part within `[...]` brackets (set of characters) which makes it behave completely differently. — tadman, Dec 21 '17 at 19:37

score 2 · Answer 2 · answered Dec 21 '17 at 18:25

Why not split first?

parts = x.split
if GENDER_TOKENS.include?(parts.last&.downcase) # &. avoids an error when x is empty
  # ...
end

Probably not much slower, and way more readable.
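
One possible way to fill in the branch above, as a sketch only: `split_trailing_token` is a hypothetical helper name, and the return shape (`[head, token]` or `[str]`) is an assumption modelled on the other answers, since the question does not state the desired result.

```ruby
GENDER_TOKENS = ["m", "male", "men", "f", "w", "female", "wom"]

# Hypothetical helper: returns [head, token] when the last
# whitespace-separated word is a known token, otherwise [str].
def split_trailing_token(str)
  parts = str.split
  last = parts.last # nil when str is empty or only whitespace
  if last && GENDER_TOKENS.include?(last.downcase)
    [parts[0...-1].join(" "), last]
  else
    [str]
  end
end

p split_trailing_token("Canton Female")  # ["Canton", "Female"]
p split_trailing_token("Tom Lord")       # ["Tom Lord"]
p split_trailing_token("wom")            # ["", "wom"]
```

Rejoining with a single space collapses runs of whitespace in the head, which may or may not matter for your data.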

answered Dec 21 '17 at 18:25

Max

Cary Swoveland · Answer 3 · 2017-12-21T21:29:54.960

GENDER_TOKENS = %w[m male men f w female wom]
GENDER_REGEX = /\b(?:#{GENDER_TOKENS.join('|')})\z/i
  #=> /\b(?:m|male|men|f|w|female|wom)\z/i

def split_off_token(str)
  idx = str =~ GENDER_REGEX
  case idx
  when nil
    [str]
  when 0
    ['', str]
  else
    [str[0, idx].rstrip, str[idx..-1]]
  end
end

split_off_token("Canton Female")
  #=> ["Canton", "Female"]
split_off_token("Canton M")
  #=> ["Canton", "M"]
split_off_token("wom")
  #=> ["", "wom"]
split_off_token("Canton Fella")
  #=> ["Canton Fella"]
