
I am trying to parse a multi-line string and get the rest of the line that follows a pattern.

text:

hello john
your username is: jj
thanks for signing up

I want to extract jj, i.e. everything after "your username is: " on that line.

One way:

text = "hello john\nyour username is: jj\nthanks for signing up\n"
match = text[/your username is: (.*)/]
value = $1

But this reminds me of Perl, and it doesn't "read" as naturally as I am told Ruby should.

Is there a cleaner way, i.e. a more "Ruby" way?

Thanks

Michael Myers
SWR
  • Ruby was actually fairly heavily inspired by Perl; it was supposed to fit in the same niche as Perl, but with a nice object system and syntax that was cleaner and more regular. It actually has a number of features taken straight from Perl, like ruby -p -i -e, though some of the more Perl-like features are deprecated now. – Brian Campbell May 22 '09 at 21:27

4 Answers


Your code is pretty much the Ruby way. If you don't want to use the global $1, you can use the two-argument form of String#[], where the second argument selects the capture group to return:

match = text[/your username is: (.*)/, 1]
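
For what it's worth, here is a minimal sketch of how the two-argument form behaves, including the case where the pattern is absent. The helper name `extract_username` is made up for illustration and is not part of any library.

```ruby
# Hypothetical helper: returns the text after "your username is: ",
# or nil when the pattern does not occur anywhere in the string.
def extract_username(text)
  # The second argument asks String#[] for capture group 1
  # instead of the whole matched substring.
  text[/your username is: (.*)/, 1]
end

text = "hello john\nyour username is: jj\nthanks for signing up\n"
puts extract_username(text).inspect        # found: the captured group
puts extract_username("no match").inspect  # not found: nil
```

Because `.` does not match a newline by default, the capture stops at the end of the line, so no extra trimming should be needed here.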
outis
  • Thanks. That is exactly what I was looking for! Pulling up dollar globals just seemed too old school. Went and read the API docs for Ruby, Class:String and there it was. http://www.ruby-doc.org/core/classes/String.html#M000786 – SWR May 22 '09 at 21:22

The split method is mind-bogglingly useful. It divides a string into an array of substrings, separating on whatever you pass in. If you don't give it any arguments, it splits on whitespace. So if you know the word you're looking for is the sixth "word" (index 5, counting from zero and splitting on both spaces and newlines), you can do this:

text = "hello john\nyour username is: jj\nthanks for signing up\n"
match = text.split[5]

...but perhaps that's not sufficiently self-documenting, or you want to allow for multi-word matches. You could do this instead:

midline = text.split("\n")[1]
match = midline.split("username is: ").last

Or perhaps this more terse way:

match = text[/username is: (.*)/, 1]
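
The split-based variant above assumes the target sits on the second line. A rough sketch that looks the line up by its marker text instead might look like this; `value_after` and its parameters are hypothetical names chosen for illustration.

```ruby
# Hypothetical helper: find the first line containing the marker and
# return whatever follows it, rather than relying on a fixed line index.
def value_after(text, marker)
  line = text.each_line.find { |l| l.include?(marker) }
  return nil unless line

  # A limit of 2 means the line is cut only at the first occurrence of
  # the marker, so a value that repeats the marker stays in one piece.
  line.chomp.split(marker, 2).last
end

text = "hello john\nyour username is: jj\nthanks for signing up\n"
puts value_after(text, "username is: ").inspect
```

This keeps the self-documenting flavour of splitting on the marker, while tolerating extra lines before the one of interest.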

glenra
  • I would do that, but split on " " instead of "your username is: " – Matt Briggs May 22 '09 at 21:20
  • +1 for an interesting use of split to get the fifth word, I hadn't noticed that. – Robert K May 22 '09 at 21:26
  • Wow, thanks for the multiple attack paths. ;) I hadn't noticed the split possibility, but the incoming text isn't static (the example was greatly simplified) so it won't work... but it was a nice way to attack the problem. – SWR May 22 '09 at 21:31
  • Matt: The only reason to split on something like "username is: " or "your username is: " is to make it a little more obvious what the code is doing. It says to the later maintenance programmer "oh, this is getting the 'username is: ' text" rather than "this is getting the last word from some line (of unknown provenance)". Though I suppose one could accomplish the same goal by naming the variables something better than "match" and "midline"... – glenra May 22 '09 at 21:32

Not sure if it's any more Ruby-ish, but another option:

>> text = "hello john\nyour username is: jj\nthanks for signing up\n"
>> text.match(/your username is: (.*)/)[1]
=> "jj"
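
One caveat worth sketching: String#match returns nil when nothing matches, so indexing the result directly with [1] raises NoMethodError on input that lacks the pattern. A guarded version could look like the following; the method name `username_from` is invented for this example.

```ruby
# Hypothetical helper: guard against the nil that String#match
# returns when the pattern is not found.
def username_from(text)
  m = text.match(/your username is: (.*)/)
  m ? m[1] : nil
end

text = "hello john\nyour username is: jj\nthanks for signing up\n"
puts username_from(text).inspect                   # the captured name
puts username_from("no username here").inspect     # nil, no exception
```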
dbr
  • This is how I'd do it. It's not technically superior to the [] solution, but I think the intent is clearer when you're skimming it. – Chuck May 22 '09 at 21:36

There's also Regexp#match, which returns a MatchData object, which has all the information you could possibly want.

irb> match = /your username is: (.*)/.match "hello john\nyour username is: jj\nthanks for signing up\n"
#=> #<MatchData:0x557f94>
irb> match.pre_match
#=> "hello john\n"
irb> match.post_match
#=> "\nthanks for signing up\n"
irb> match[0]
#=> "your username is: jj"
irb> match[1]
#=> "jj"
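
If the numeric index feels opaque, MatchData also supports named groups. The sketch below assumes the same regex with the group given the illustrative name `username`; the method `parse_username` is likewise hypothetical.

```ruby
# Hypothetical helper using a named capture group, so the lookup
# reads as match[:username] rather than match[1].
def parse_username(text)
  pattern = /your username is: (?<username>.*)/
  match = pattern.match(text)
  # Regexp#match returns nil on failure, so short-circuit before indexing.
  match && match[:username]
end

text = "hello john\nyour username is: jj\nthanks for signing up\n"
puts parse_username(text).inspect
```

The named group costs a few extra characters in the pattern, but it documents what is being extracted right where the regex is written.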
rampion