154

How can I extract a substring from within a string in Ruby?

Example:

String1 = "<name> <substring>"

I want to extract substring from String1 (i.e. everything within the last occurrence of < and >).

Alan W. Smith
Madhusudhan

5 Answers

358
"<name> <substring>"[/.*<([^>]*)/,1]
=> "substring"

No need to use scan if we need only one result.
No need to use a Python-style match when we have Ruby's String#[regexp, capture].

See: http://ruby-doc.org/core/String.html#method-i-5B-5D

Note: str[regexp, capture] → new_str or nil
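To make the behaviour of `str[regexp, capture]` concrete, here is a minimal sketch. The variable names are illustrative only and are not from the answer above; the printed values follow from the greedy `.*` pushing the match to the last `<`.

```ruby
str = "<name> <substring>"

# Greedy .* consumes as much as it can, so the capture group
# starts right after the LAST "<" in the string.
last_tag = str[/.*<([^>]*)/, 1]
puts last_tag                 # prints: substring

# Without a capture index, String#[] returns the whole first match.
whole_match = str[/<([^>]*)>/]
puts whole_match              # prints: <name>

# When nothing matches, the result is nil instead of an exception.
missing = "no brackets here"[/.*<([^>]*)/, 1]
p missing                     # prints: nil
```

Because a failed match gives back nil, the result can be checked with a plain `if` before it is used.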

Nakilon
  • 40
    No need to discredit other perfectly valid (and might I opine, more readable) solutions. – coreyward Nov 06 '10 at 21:07
  • 44
    @coreyward, if they are better, please make the argument. For example, sepp2k's solution is more flexible, and that's why I pointed out `if we need only one result` in my solution. And `match()[]` is slower, because it's two methods instead of one. – Nakilon Nov 06 '10 at 21:10
  • 6
    This is the fastest of all the methods presented, but even the slowest method takes only 4.5 microseconds on my machine. I do not care to speculate why this method is faster. In performance, speculation is _useless_. Only measurement counts. – Wayne Conrad Nov 07 '10 at 07:32
  • 9
    I find this solution more straightforward and to the point (since I am new to Ruby). Thanks. – Ryan H. Jun 30 '11 at 10:46
  • @Nakilon Readability can outweigh tiny performance differences when considering the overall success of a product and team, so coreyward made a valid comment. That said, I think `string[regex]` can be just as readable in this scenario, so that's what I used personally. – Nick Feb 09 '17 at 21:01
  • Do you mind adding since what ruby version this method is valid? I want to use this solution but now I need to go look up what version of ruby introduced string[regex] (for all I know it's always been there). Edit: I can only find ruby api docs > 2.0.0 where this method is still valid so it's probably fine to use it: https://ruby-doc.org/core-2.0.0/String.html#method-i-5B-5D – Asaf Aug 09 '17 at 16:54
  • @Asaf, if you ask me, then take a look at answer timestamp -- this code was valid in November 2010. – Nakilon Aug 10 '17 at 11:31
  • @Nakilon It's not something I noticed before, but of course it must be valid for at least that long. Thanks for the answer to this question which helped me solve my problem. – Asaf Aug 10 '17 at 22:29
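For anyone who would rather measure than speculate, a rough sketch using Ruby's stdlib Benchmark module could look like the following. The iteration count and the labels are arbitrary assumptions, and the absolute numbers will differ by machine and Ruby version, so no expected output is shown.

```ruby
require "benchmark"

str = "<name> <substring>"
n = 10_000  # iteration count, chosen arbitrarily for this sketch

# Time each approach from this thread over the same input.
timings = {
  "String#[]"  => Benchmark.realtime { n.times { str[/.*<([^>]*)/, 1] } },
  "match()[1]" => Benchmark.realtime { n.times { str.match(/.*<([^>]*)/)[1] } },
  "scan"       => Benchmark.realtime { n.times { str.scan(/<([^>]*)>/).last.first } }
}

timings.each do |label, seconds|
  puts format("%-12s %.4fs", label, seconds)
end
```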
150
String1.scan(/<([^>]*)>/).last.first

scan creates an array which, for each <item> in String1, contains the text between the < and the > in a one-element array (because, when used with a regex containing capturing groups, scan creates an array containing the captures for each match). last gives you the last of those arrays, and first then gives you the string in it.
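The intermediate values described above can be inspected step by step. This is only a sketch; `string1`, `pairs` and `result` are names introduced here for illustration.

```ruby
string1 = "<name> <substring>"

# With one capture group, scan returns one single-element array per match.
pairs = string1.scan(/<([^>]*)>/)
p pairs          # prints: [["name"], ["substring"]]

# last picks the captures of the final match...
p pairs.last     # prints: ["substring"]

# ...and first unwraps the single captured string.
result = pairs.last.first
p result         # prints: "substring"
```

Note that if the string contains no `<...>` at all, scan returns an empty array, `last` is nil, and calling `first` on it raises NoMethodError, so a guard may be needed for untrusted input.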

sepp2k
26

You can use a regular expression for that pretty easily…

Allowing spaces around the word (but not keeping them):

str.match(/< ?([^>]+?) ?>\Z/)[1]

Or without the spaces allowed:

str.match(/<([^>]+)>\Z/)[1]
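One thing to keep in mind with `match(...)[1]` is that match returns nil when the pattern does not apply, so the `[1]` call raises NoMethodError. A small guard avoids that. The helper name `trailing_tag` below is hypothetical and only serves this sketch.

```ruby
# Hypothetical helper, not part of the answer above.
def trailing_tag(str)
  m = str.match(/<([^>]+)>\Z/)
  m && m[1]   # nil when the string does not end with <...>
end

p trailing_tag("<name> <substring>")         # prints: "substring"
p trailing_tag("<name> <substring> extra")   # prints: nil
```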
Sergio Tulentsev
coreyward
  • 1
    I'm not sure that the last `<>` actually needs to be the last thing in the string. If e.g. the string `foo <bar> baz` is allowed (and supposed to give the result `bar`), this will not work. – sepp2k Nov 06 '10 at 21:03
  • I just went based on the sample string he provided. – coreyward Nov 06 '10 at 21:06
12

Here's a slightly more flexible approach using the match method. With this, you can extract more than one string:

s = "<ants> <pants>"
matchdata = s.match(/<([^>]*)> <([^>]*)>/)

# Use 'captures' to get an array of the captures
matchdata.captures   # ["ants","pants"]

# Or use raw indices
matchdata[0]   # whole regex match: "<ants> <pants>"
matchdata[1]   # first capture: "ants"
matchdata[2]   # second capture: "pants"
Grant Birchmeier
7

A simpler scan would be:

String1.scan(/<(\S+)>/).last
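Two details of this shorter form are worth checking before relying on it; the sketch below uses illustrative variable names that are not from the answer. As written, `last` yields a one-element array rather than a string, and `\S` also matches `>`, so tags with no whitespace between them get merged.

```ruby
string1 = "<name> <substring>"

# last returns the capture array of the final match, not the string itself.
as_array = string1.scan(/<(\S+)>/).last
p as_array        # prints: ["substring"]

# Appending first unwraps it.
as_string = string1.scan(/<(\S+)>/).last.first
p as_string       # prints: "substring"

# \S+ is greedy and also matches ">", so adjacent tags collapse into one match.
merged = "<a><b>".scan(/<(\S+)>/)
p merged          # prints: [["a><b"]]
```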
Alan Moore
Navid