0

I'm trying to come up with code that will extract only the price from a line of text.

Motivated by the question "RegEx for Prices?", I came up with the following command:

gregexpr('\\d+(\\.\\d{1,2})', '23434 34.232 asdf 3.12  ')

[[1]]
[1]  7 19
attr(,"match.length")
[1] 5 4
attr(,"useBytes")
[1] TRUE

However, in my case, I would only like 3.12 to match and not 34.232. Any suggestions?
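
To see what the positions above correspond to, the matched substrings can be pulled out with `regmatches`. This is only an illustrative sketch; the variable names `x`, `m` and `hits` are not part of the original question.

```r
# Sample line from the question.
x <- "23434 34.232 asdf 3.12  "

# Original pattern: digits, a dot, then one or two more digits.
m <- gregexpr("\\d+(\\.\\d{1,2})", x)

# Turn the match positions and lengths into the matched text.
hits <- regmatches(x, m)[[1]]
print(hits)
# "34.23" "3.12"
```

The first hit is the truncated "34.23" taken from "34.232", which is the match that should be excluded.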

andrewj
  • If you want to extract the value, then it's better and more straightforward to use `sub` than `gregexpr`. – Arun Feb 18 '13 at 07:43
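
A minimal sketch of the `sub` approach mentioned in the comment, assuming the line contains at least one price. The pattern with a negative lookahead and the names `x`, `pattern` and `price` are assumptions made for illustration, not code from the comment.

```r
# Sample line from the question.
x <- "23434 34.232 asdf 3.12  "

# Price pattern: digits, a dot, one or two digits, and no further digit after them.
pattern <- "\\d+\\.\\d{1,2}(?!\\d)"

# Replace the whole line by the first captured price.
# perl = TRUE is needed because the lookahead is PCRE syntax.
price <- sub(paste0(".*?(", pattern, ").*"), "\\1", x, perl = TRUE)
print(price)
# "3.12"
```

Note that `sub` returns the input unchanged when nothing matches, so a line without a price would come back whole.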

3 Answers

3

I think this should work:

'\\d+\\.\\d{1,2}(?!\\d)'
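
A short sketch of how this pattern might be used on the string from the question. R's default regex engine does not accept lookahead syntax, so `perl = TRUE` is passed; the variable names are illustrative assumptions.

```r
# Sample line from the question.
x <- "23434 34.232 asdf 3.12  "

# (?!\\d) is a negative lookahead: the one or two decimals
# must not be followed by another digit.
pattern <- "\\d+\\.\\d{1,2}(?!\\d)"

# perl = TRUE selects the PCRE engine, which supports lookaheads.
m <- gregexpr(pattern, x, perl = TRUE)
prices <- regmatches(x, m)[[1]]
print(prices)
# "3.12"
```

"34.232" is rejected because both "34.2" and "34.23" are followed by a digit.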
Michael
2
\\d+\\.\\d{1,2}(?!\\d)

I'm not 100% sure that negative lookahead is supported in R, so here is an alternative:

\\d+\\.\\d{1,2}(?:[^\\d]|$)
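
A sketch of the lookahead-free idea, with two adjustments that are assumptions on my part: `[^0-9]` is used instead of `[^\\d]`, since shorthand classes inside a bracket expression may not be interpreted the same way by every engine, and the trailing character that the pattern consumes is stripped afterwards.

```r
# Sample line from the question.
x <- "23434 34.232 asdf 3.12  "

# The price must be followed by a non-digit or by the end of the string.
# Unlike a lookahead, this consumes the following character.
pattern <- "\\d+\\.\\d{1,2}([^0-9]|$)"

m <- gregexpr(pattern, x)
hits <- regmatches(x, m)[[1]]

# Drop the consumed trailing non-digit, if there is one.
prices <- sub("[^0-9]$", "", hits)
print(prices)
```

Because the following character is part of the match, two prices separated by a single character would still both be found, but each raw hit needs the cleanup step shown above.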
Explosion Pills
-2

One or more digits, followed by a decimal point, followed by 1 or 2 digits, followed by white space or the end of the string:

\\d+\\.\\d{1,2}(\\s|$)

Edit: as per the comments, R needs the backslashes double-escaped.

AndrewP
  • Won't work. You didn't escape the period. You're also either capturing words after the end of the price or assuming that the price is at the end of the string, both of which the OP probably doesn't want. – Michael Feb 18 '13 at 03:22
  • Right you are about escaping the period; I accidentally un-escaped it. Wrapping the entire thing in parentheses and putting it as part of a bigger statement will allow Marius to just select the matching token. – AndrewP Feb 18 '13 at 03:28
  • Note the `r` tag. That's why all the other answers have double escapes: \\ . – Matthew Lundberg Feb 18 '13 at 03:41
  • @Matthew This is the difficulty in tagging regex within R. Regex is, in a way, its own thing, and those unfamiliar with R assume regex is regex. I'm often hesitant about using the regex tag in R (on SO) for this reason. – Tyler Rinker Feb 18 '13 at 04:31
  • Noted, thanks for alerting me to paying more attention to Tag descriptions :) – AndrewP Feb 18 '13 at 04:46