Hi, I have the following strings in my data and would like to replace A1-A9 with A01-A09 and B1-B9 with B01-B09, while leaving numbers >= 10 unchanged.

rep_data=data.frame(Str= c("A1B10", "A2B3", "A11B1", "A5B10"))

    Str
1 A1B10
2  A2B3
3 A11B1
4 A5B10

There is a similar post, but my problem is a little different, and I haven't seen a similar example for str_replace either.

I would be very glad if you know a solution.

Expected output:

     Str
1 A01B10
2 A02B03
3 A11B01
4 A05B10
acylam
Alexander

7 Answers

I think this should get you what you want:

gsub("(?<![0-9])([0-9])(?![0-9])", "0\\1", rep_data$Str, perl = TRUE)
#[1] "A01B10" "A02B03" "A11B01" "A05B10"

It uses a PCRE negative lookbehind and a negative lookahead to match a digit that has no digit on either side, i.e. a one-digit number, and then prepends a "0" to it.
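To see what the lookarounds actually select, the following sketch lists the digits the pattern matches before any replacement happens. The vector `x` and the names `pattern`, `lone_digits`, and `padded` are introduced here for illustration only; the pattern itself is the one from the answer.

```r
# Sample strings from the question
x <- c("A1B10", "A2B3", "A11B1", "A5B10")

# A digit that is neither preceded nor followed by another digit
pattern <- "(?<![0-9])([0-9])(?![0-9])"

# Extract the digits that the pattern matches in each string
lone_digits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
print(lone_digits)

# Prepend "0" to each matched digit
padded <- gsub(pattern, "0\\1", x, perl = TRUE)
print(padded)
```

Only digits that stand alone are matched, so "10" and "11" are left untouched.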

Mike H.

How about something like this

num_pad <- function(x) {
  x <- as.character(x)
  mm <- gregexpr("\\d+|\\D+",x)  
  parts <- regmatches(x, mm)
  pad_number <- function(x) {
    nn<-suppressWarnings(as.numeric(x))
    x[!is.na(nn)] <- sprintf("%02d", nn[!is.na(nn)])
    x
  }
  parts <- lapply(parts, pad_number)
  sapply(parts, paste0, collapse="")
}


num_pad(rep_data$Str)
# [1] "A01B10" "A02B03" "A11B01" "A05B10"

Basically we use regular expressions to split the strings up into digit and non-digit groups. We then find those values that look like numbers and use sprintf() to zero-pad them to 2 characters. Then we insert the padded values into the vector and paste everything back together.
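As a rough illustration of the intermediate steps, this sketch shows the split into digit and non-digit runs and the zero-padding on their own. The names `parts` and `padded_numbers` and the shortened input vector are chosen here for the example.

```r
x <- c("A1B10", "A11B1")

# Split each string into alternating runs of digits and non-digits
parts <- regmatches(x, gregexpr("\\d+|\\D+", x))
print(parts[[1]])
print(parts[[2]])

# sprintf("%02d", ...) pads to two characters and leaves longer numbers alone
padded_numbers <- sprintf("%02d", as.numeric(c("1", "10", "11")))
print(padded_numbers)
```

Pasting each element of `parts` back together after padding its numeric pieces gives the final strings.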

MrFlick

Not thoroughly tested, but it works on the sample data:

x = c("A1B10", "A2B3", "A11B1", "A5B10")
sapply(strsplit(x, ""), function(s){
    paste(sapply(split(s, cumsum(s %in% LETTERS)), function(a){
        if(length(a) == 2){
            a[2] = paste0(0, a[2])
        }
        paste(a, collapse = "")
    }), collapse = "")
})
#[1] "A01B10" "A02B03" "A11B01" "A05B10"
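
The grouping step in the code above can be inspected on a single string. This is a minimal sketch with illustrative names (`s`, `groups`); note that it relies on every group starting with an uppercase letter, since it tests membership in `LETTERS`.

```r
# Split one string into single characters
s <- strsplit("A11B1", "")[[1]]

# cumsum() increments at each uppercase letter, so every letter
# starts a new group that also collects the digits following it
groups <- split(s, cumsum(s %in% LETTERS))
print(groups)

# A group of length 2 is a letter plus exactly one digit,
# which is the case that needs padding
print(lengths(groups))
```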
d.b

A solution using tidyverse and stringr:

library(tidyverse)
library(stringr)

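rep_data2 <- rep_data %>%
  # Split each string into its letter and number components
  extract(Str, into = c("L1", "N1", "L2", "N2"), regex = "(A)(\\d+)(B)(\\d+)") %>%
  # Zero-pad the number columns to width 2 (across() replaces the deprecated funs())
  mutate(across(starts_with("N"), ~ str_pad(.x, width = 2, pad = "0"))) %>%
  # Glue the pieces back together into a single column
  unite(Str, everything(), sep = "")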
rep_data2
     Str
1 A01B10
2 A02B03
3 A11B01
4 A05B10
www

This is the most concise tidy solution I can come up with:

library(tidyverse)
library(stringr)

rep_data %>%
  mutate(
    num_1 = str_match(Str, "A([0-9]+)")[, 2],
    num_2 = str_match(Str, "B([0-9]+)")[, 2],
    num_1 = str_pad(num_1, width = 2, side = "left", pad = "0"),
    num_2 = str_pad(num_2, width = 2, side = "left", pad = "0"),
    Str = str_c("A", num_1, "B", num_2)
  ) %>%
  select(- num_1, - num_2)
Stijn

A bit similar to @Mike's answer, but this solution uses one positive lookahead:

gsub("(\\D)(?=\\d(\\D|\\b))", "\\10", rep_data$Str, perl = TRUE)
# [1] "A01B10" "A02B03" "A11B01" "A05B10"

With tidyverse:

library(dplyr)
library(stringr)

rep_data %>%
  mutate(Str = str_replace_all(Str, "(\\D)(?=\\d(\\D|\\b))", "\\10"))

#      Str
# 1 A01B10
# 2 A02B03
# 3 A11B01
# 4 A05B10

This regex matches every non-digit that is followed by a single digit, where that digit is in turn followed by either another non-digit or a word boundary. The replacement \\10 is deceptive, since it looks like it refers to a 10th capture group. Instead, it inserts the 1st capture group followed by a literal zero.
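
A small sketch of how base R's gsub() treats that replacement string, using a toy pattern with one capture group (the names `toy` and `res` are illustrative):

```r
# "\\10" is read as backreference \1 followed by a literal "0"
toy <- gsub("(A)", "\\10", "A5")
print(toy)

# The same replacement applied with the lookahead pattern from the answer
x <- c("A1B10", "A2B3", "A11B1", "A5B10")
res <- gsub("(\\D)(?=\\d(\\D|\\b))", "\\10", x, perl = TRUE)
print(res)
```

In "A1B10" the letter A is matched because the 1 after it is followed by B, while B is not matched because the 1 after it is followed by another digit.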

acylam

Here is one option with gsubfn, which applies a function to each run of digits:

library(gsubfn)
gsubfn("(\\d+)", ~sprintf("%02d", as.numeric(x)), as.character(rep_data$Str))
#[1] "A01B10" "A02B03" "A11B01" "A05B10"
akrun