I want to remove the second and all subsequent occurrences of the decimal point in a string. My attempt is below:

library(stringr)
str_remove(string = "3.99-0.13", pattern = "\\.")
[1] "399-0.13"
sub("\\.", "", "3.99-0.13")
[1] "399-0.13"

However, both calls remove the first decimal point instead. I want the output to be 3.99-013. Any hints, please?

user438383
MYaseen208

3 Answers

A concise solution is based on look-behind:

library(stringr)
str_remove_all(x, "(?<=\\..{0,10})\\.")
[1] "3.99-01322" "3.99013"

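Here, the look-behind (?<=\\..{0,10}) asserts that another . occurs at most 10 characters before the . being matched, so the first . is never removed. The upper bound is there because the ICU regex engine behind stringr does not accept unbounded look-behinds (no * or +); raise the 10 if consecutive dots can be further apart than that.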

Data:

x <- c("3.99-0.13.2.2", "3.990.13")
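For comparison, here is a base R sketch that avoids the bounded look-behind. It is not the method above but a swapped-in alternative: it relies on the PCRE backtracking verbs (*SKIP)(*F), available with perl = TRUE, to skip everything up to and including the first . and then delete every later one. The helper name remove_extra_dots is made up for illustration.

```r
# Same data as above
x <- c("3.99-0.13.2.2", "3.990.13")

# Hypothetical helper: the first alternative consumes the text up to the first
# dot and is then discarded via (*SKIP)(*F); the second alternative matches
# (and removes) every remaining dot.
remove_extra_dots <- function(s) {
  gsub("^[^.]*\\.(*SKIP)(*F)|\\.", "", s, perl = TRUE)
}

res <- remove_extra_dots(x)
res
# [1] "3.99-01322" "3.99013"
```

Because nothing here depends on the distance between dots, this should behave the same however far apart they are.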
Chris Ruehlemann

Here is an approach using sub and gsub with simple regex patterns; it works on a variety of possible inputs.

Extract the first part (everything up to and including the first dot), remove all dots from the remainder, and finally paste the two together.

Data

stri <- c("3.99-0.13", "393.99.0.13.0.0", ".832.723.723", "3.Ud.2349_3.", 
"D.235.2")

stri
[1] "3.99-0.13"       "393.99.0.13.0.0" ".832.723.723"    "3.Ud.2349_3."   
[5] "D.235.2"

Apply

paste0(sub("\\..*", ".", stri), gsub("\\.", "", sub(".*?\\.", "", stri)))
[1] "3.99-013"    "393.9901300" ".832723723"  "3.Ud2349_3"  "D.2352" 
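The three steps can also be written out one at a time, which makes the intermediate values easier to inspect. This is a sketch rather than a replacement for the one-liner. The variable names and the "no dots" input are assumptions added for illustration, and the ifelse guard is an addition too: without it, a string containing no dot at all would come back doubled, because both sub calls return it unchanged.

```r
stri <- c("3.99-0.13", "D.235.2", "no dots")

# Step 1: keep everything up to and including the first dot
head_part <- sub("\\..*", ".", stri)

# Step 2: take everything after the first dot and strip its dots
tail_part <- gsub("\\.", "", sub(".*?\\.", "", stri))

# Step 3: paste the parts together, leaving dot-free strings untouched
has_dot <- grepl(".", stri, fixed = TRUE)
result <- ifelse(has_dot, paste0(head_part, tail_part), stri)
result
# [1] "3.99-013" "D.2352"   "no dots"
```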
Andre Wildberg

We could locate all occurrences of ., drop the first, and remove the rest:

library(stringi)

x <- c("3.99-0.13.2.2", "3.990.13")

stri_sub_replace_all(x,
                     lapply(stri_locate_all_regex(x, '\\.'), \(x) x[-1,, drop = FALSE]),
                     replacement = "")

Output:

[1] "3.99-01322" "3.99013"

Note: stringr's str_sub doesn't seem to have a replace_all option, so we'll need stringi for this (that said, stringr::str_locate_all could be used instead of stri_locate_all_regex if you prefer).

Update: Now also works when the string contains two or fewer occurrences of the dot.
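The same locate, drop the first, remove the rest idea can be sketched without stringi. This is a base R illustration only, not the stringi implementation; the function name drop_after_first and the two extra inputs are assumptions. It works on single characters, so the length check matters: negative indexing with an empty index would otherwise return an empty vector.

```r
x <- c("3.99-0.13.2.2", "3.990.13", "1.5", "no dots")

# Hypothetical helper operating on a single string
drop_after_first <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]  # split into characters
  dots  <- which(chars == ".")                 # locate every dot
  if (length(dots) > 1) {
    chars <- chars[-dots[-1]]                  # drop all dots but the first
  }
  paste(chars, collapse = "")
}

out <- vapply(x, drop_after_first, character(1), USE.NAMES = FALSE)
out
# [1] "3.99-01322" "3.99013"    "1.5"        "no dots"
```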

harre
  • Not working correctly for `x2 <- "3.990.13"; stri_sub_replace_all( x2 , stri_locate_all_regex(x2, '\\.')[[1]][-1,] , replacement = "" )` – MYaseen208 Jan 18 '23 at 14:12
  • See update. The subsetting turned the locations into a named vector when there is only one occurrence of `.` left after removing the first. – harre Jan 18 '23 at 14:22
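The pitfall described in the comment above can be reproduced with a plain matrix. This is a sketch, assuming a hand-built two-row start/end matrix (the name loc is hypothetical) in place of the real stri_locate_all_regex result for "3.990.13", whose dots sit at positions 2 and 6:

```r
# Stand-in for the location matrix: one row per dot
loc <- matrix(c(2L, 6L, 2L, 6L), ncol = 2,
              dimnames = list(NULL, c("start", "end")))

as_vector <- loc[-1, ]                # one row left: collapses to a named vector
as_matrix <- loc[-1, , drop = FALSE]  # stays a 1 x 2 matrix

dim(as_vector)
# NULL
dim(as_matrix)
# [1] 1 2
```

With drop = FALSE the result keeps its two-column start/end shape even when a single row remains.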