16

A question came across talkstats.com today in which the poster wanted to remove the last period of a string using regex (not strsplit). I made an attempt to do this but was unsuccessful.

N <- c("59.22.07", "58.01.32", "57.26.49")

#my attempts:
gsub("(!?\\.)", "", N)
gsub("([\\.]?!)", "", N)

How could we remove the last period in the string to get:

[1] "59.2207" "58.0132" "57.2649"
Tyler Rinker
  • You can do it with `gsub` if you include the group at the end. `gsub('\\.([0-9]+)$', '\\1', N)` – Justin Jan 25 '13 at 20:04
  • your first function almost works great with `sub`. – Justin Jan 25 '13 at 20:07
  • Thank you Justin but it's on the wrong end:) – Tyler Rinker Jan 25 '13 at 20:13
  • possible duplicate of [Regular expression to remove a file's extension](http://stackoverflow.com/questions/1818310/regular-expression-to-remove-a-files-extension) – Bohemian Jan 25 '13 at 20:23
  • @Bohemian People who are extremely good at regex routinely have a much more liberal definition of what qualifies as a duplicate than I do. I wouldn't have the foggiest notion of how to use the answers at that question to solve this problem in R. – joran Jan 25 '13 at 20:41

4 Answers

24

Maybe this reads a little better:

gsub("(.*)\\.(.*)", "\\1\\2", N)
[1] "59.2207" "58.0132" "57.2649"

Because it is greedy, the first (.*) will match everything up to the last . and store it in \\1. The second (.*) will match everything after the last . and store it in \\2.

This is a general answer in the sense that you can replace the \\. with any character of your choice to remove the last occurrence of that character. Only one replacement is made per string, so sub would work just as well as gsub.

You can even do:

gsub("(.*)\\.", "\\1", N)
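
To make the greedy behaviour described above easier to see, here is a minimal sketch that prints what each group captures and then reuses the shorter pattern for a different character. The variable names `before`, `after` and `no_zero` are only illustrative and are not part of the original answer.

```r
N <- c("59.22.07", "58.01.32", "57.26.49")

# The pattern matches the whole string, so replacing the match with a single
# group shows exactly what that group captured.
before <- sub("(.*)\\.(.*)", "\\1", N)  # text before the last period
after  <- sub("(.*)\\.(.*)", "\\2", N)  # text after the last period
before
# [1] "59.22" "58.01" "57.26"
after
# [1] "07" "32" "49"

# Same idea for another character: drop the last "0" in each string.
# Strings without a "0" are returned unchanged.
no_zero <- sub("(.*)0", "\\1", N)
no_zero
# [1] "59.22.7"  "58.1.32"  "57.26.49"
```

Since only one match per string is possible here, `sub` and `gsub` return the same result.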
Dave Jarvis
flodel
  • Great explanation flodel. Thank you for the response. I checked Rohit Jain because I think his response is more generalizable to more varied situations. For this particular situation your response is spot on. +1 – Tyler Rinker Jan 25 '13 at 20:22
15

You need this regex: -

[.](?=[^.]*$)

And replace it with empty string.

So, it should be like: -

gsub("[.](?=[^.]*$)","",N,perl = TRUE)

Explanation: -

[.]         // Match a dot
(?=         // Followed by
    [^.]    // Any character that is not a dot.
     *      // with 0 or more repetition
     $      // Till the end. So, there should not be any dot after the dot we match.
)  

So, if the look-ahead meets another dot (.) before reaching the end of the string, the match fails: there is still a dot somewhere after the one the pattern is currently trying to match, so the current dot is not the last one.
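
As a rough illustration of the look-ahead (a sketch, not part of the original answer), the positions that the pattern accepts can be compared with the positions of all periods. The names `last_dot` and `all_dots` are made up for this example.

```r
N <- c("59.22.07", "58.01.32", "57.26.49")

# Position of the one period accepted by the look-ahead pattern.
# perl = TRUE is needed because look-aheads are a PCRE feature.
last_dot <- as.integer(regexpr("[.](?=[^.]*$)", N, perl = TRUE))
last_dot
# [1] 6 6 6

# Without the look-ahead, every period in the string matches.
all_dots <- lapply(gregexpr("[.]", N), as.integer)
all_dots[[1]]
# [1] 3 6
```

Only the period at position 6 survives the look-ahead check, which is why the replacement removes that period and leaves the one at position 3 alone.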

Rohit Jain
  • @Arun.. Sorry, a little misplacement of `$` was there. Fixed it now. – Rohit Jain Jan 25 '13 at 20:08
  • Once you add `perl=TRUE` that gives `[1] "59.2207" "58.0132" "57.2649"` rather than the requested result. – Justin Jan 25 '13 at 20:08
  • @Justin. Well, isn't that what is requested? I thought, OP wanted to replace just the last `period` and not the complete string following it. – Rohit Jain Jan 25 '13 at 20:10
  • @Justin.. I knew that. Cheers :) – Rohit Jain Jan 25 '13 at 20:11
  • @Arun.. I don't know about those functions and the language `r`, so I can't comment. I just posted a valid regex, that would work with your function. – Rohit Jain Jan 25 '13 at 20:15
  • I intentionally made the title say character instead of period to make this question more generalizable to others. I believe this response is the most generalizable to multiple situations. +1 – Tyler Rinker Jan 25 '13 at 20:20
  • @TylerRinker: Both solutions should be equivalent. There is a small difference with `.` any character **but new line** being used in flodel's solution, but it can be fixed easily. – nhahtdh Jan 25 '13 at 20:31
  • @nhahtdh.. Thanks for pointing out the concept of `Regex-directed engines`. Didn't know that earlier. :) – Rohit Jain Jan 25 '13 at 20:34
  • @Arun.. Sure. Will add that. – Rohit Jain Jan 25 '13 at 20:48
  • @nhahtdh I'm pretty poor with regex but this solution could be used to get rid of the last zero in the string using `gsub("[0](?=[^0]*$)", "", N , perl = TRUE)` can flodel's be extended to do the same? Ah yes it can. `gsub("(.*)0", "\\1", N)` Moving the check to flodel as that is simpler. – Tyler Rinker Jan 25 '13 at 21:35
  • @RohitJain: I think I removed that comment since it is not really relevant. Well, it's OK if you learn something... – nhahtdh Jan 26 '13 at 06:31
  • @nhahtdh.. Yeah, sure. It's fine :) – Rohit Jain Jan 26 '13 at 06:45
6

I'm sure you know this by now since you use stringi in your packages, but you can simply do

N <- c("59.22.07", "58.01.32", "57.26.49")

stringi::stri_replace_last_fixed(N, ".", "")
# [1] "59.2207" "58.0132" "57.2649"
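
If stringi is not available, a similar fixed-string replacement can be sketched in base R. The helper `remove_last_fixed` below is hypothetical (it is not part of base R or stringi) and assumes `ch` is a non-empty string and `x` contains no missing values.

```r
# Hypothetical helper: remove the last occurrence of the fixed substring `ch`
# from each element of the character vector `x`.
remove_last_fixed <- function(x, ch) {
  hits <- gregexpr(ch, x, fixed = TRUE)   # all literal matches per element
  vapply(seq_along(x), function(i) {
    pos <- hits[[i]]
    if (pos[1] == -1L) return(x[i])       # no occurrence: leave unchanged
    start <- pos[length(pos)]             # start of the last occurrence
    paste0(substr(x[i], 1L, start - 1L),
           substr(x[i], start + nchar(ch), nchar(x[i])))
  }, character(1))
}

N <- c("59.22.07", "58.01.32", "57.26.49")
remove_last_fixed(N, ".")
# [1] "59.2207" "58.0132" "57.2649"
```

Because the match is literal (`fixed = TRUE`), regex metacharacters such as `.` need no escaping.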
Rich Scriven
2

I'm pretty lazy with my regex, but this works:

gsub("(*)(.)([0-9]+$)","\\1\\3",N)

I tend to take the opposite approach from the standard. Instead of replacing the '.' with a zero-length string, I just parse the two pieces that are on either side.
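
A more explicit variant of the same "keep both sides" idea might look like the sketch below. It is not the answer's original pattern: it escapes the period so that only a literal `.` is dropped, and it anchors both ends of the string. The name `joined` is illustrative.

```r
N <- c("59.22.07", "58.01.32", "57.26.49")

# Capture the text before the final period and the trailing digits after it,
# then paste the two captures back together without the period.
joined <- sub("^(.*)\\.([0-9]+)$", "\\1\\2", N)
joined
# [1] "59.2207" "58.0132" "57.2649"
```

A string that has no period before its trailing digits does not match and is returned unchanged.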

Dinre