str_replace A1-A9 to A01-A09 and so on

Question

Hi, I have the following strings in my data and would like to replace A1-A9 with A01-A09 and B1-B9 with B01-B09, but keep numbers >= 10 unchanged.

rep_data=data.frame(Str= c("A1B10", "A2B3", "A11B1", "A5B10"))

    Str
1 A1B10
2  A2B3
3 A11B1
4 A5B10

There is a similar post here, but my problem is a little bit different, and I haven't seen a similar example for str_replace.

I would be very glad if you know a solution.

Expected output:

     Str
1 A01B10
2 A02B03
3 A11B01
4 A05B10

Is it an important requirement to use *tidyverse*? – Wiktor Stribiżew Oct 24 '17 at 19:56

Mike H. · Accepted Answer · 2017-10-24T20:17:08.803

7

I think this should get you what you want:

gsub("(?<![0-9])([0-9])(?![0-9])", "0\\1", rep_data$Str, perl = TRUE)
#[1] "A01B10" "A02B03" "A11B01" "A05B10"

It uses PCRE lookaheads and lookbehinds to match a single-digit number and then prepends a "0" to it.
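To see the lookarounds in isolation, here is a minimal sketch that wraps the same pattern in a helper. The name `pad_single_digits` and the extra inputs "A100B7" and "1A2" are assumptions added for illustration, not part of the original answer. A digit is only rewritten when neither neighbour is a digit, so runs of two or more digits are left alone.

```r
# Hypothetical helper wrapping the pattern from the answer above.
pad_single_digits <- function(x) {
  # (?<![0-9]) : the position is not preceded by a digit
  # ([0-9])    : capture exactly one digit
  # (?![0-9])  : the digit is not followed by another digit
  gsub("(?<![0-9])([0-9])(?![0-9])", "0\\1", x, perl = TRUE)
}

res <- pad_single_digits(c("A1B10", "A2B3", "A100B7", "1A2"))
print(res)
# expected values: "A01B10" "A02B03" "A100B07" "01A02"
```

Because the pattern does not depend on the letters, it should also cover prefixes other than A and B.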

edited Oct 24 '17 at 20:17

answered Oct 24 '17 at 20:14

Mike H.


what is the middle ([0-9]) for? – Alexander Oct 24 '17 at 20:19

It matches a digit. The lookarounds on either side of the middle `([0-9])` assert that the neighbouring characters are not digits – Mike H. Oct 24 '17 at 20:20

score 3 · Answer 2 · answered Oct 24 '17 at 20:14

How about something like this

num_pad <- function(x) {
  x <- as.character(x)
  mm <- gregexpr("\\d+|\\D+",x)  
  parts <- regmatches(x, mm)
  pad_number <- function(x) {
    nn<-suppressWarnings(as.numeric(x))
    x[!is.na(nn)] <- sprintf("%02d", nn[!is.na(nn)])
    x
  }
  parts <- lapply(parts, pad_number)
  sapply(parts, paste0, collapse="")
}


num_pad(rep_data$Str)
# [1] "A01B10" "A02B03" "A11B01" "A05B10"

Basically we use regular expressions to split the strings up into digit and non-digit groups. We then find those values that look like numbers and use sprintf() to zero-pad them to 2 characters. Then we insert the padded values into the vector and paste everything back together.
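The intermediate steps described above can be traced on a single string. This is only a sketch of what happens inside `num_pad()`; the variable names `parts`, `nn`, and `padded` are chosen here for illustration.

```r
x <- "A1B10"

# Split the string into alternating runs of digits and non-digits.
parts <- regmatches(x, gregexpr("\\d+|\\D+", x))[[1]]
# parts is now: "A" "1" "B" "10"

# Non-numeric pieces become NA; the coercion warning is expected, so it is silenced.
nn <- suppressWarnings(as.numeric(parts))
# nn is now: NA 1 NA 10

# Zero-pad only the numeric pieces, then glue everything back together.
parts[!is.na(nn)] <- sprintf("%02d", nn[!is.na(nn)])
padded <- paste0(parts, collapse = "")
print(padded)
# expected value: "A01B10"
```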

score 2 · Answer 3 · answered Oct 24 '17 at 20:24

Not checked thoroughly

x = c("A1B10", "A2B3", "A11B1", "A5B10")
sapply(strsplit(x, ""), function(s){
    paste(sapply(split(s, cumsum(s %in% LETTERS)), function(a){
        if(length(a) == 2){
            a[2] = paste0(0, a[2])
        }
        paste(a, collapse = "")
    }), collapse = "")
})
#[1] "A01B10" "A02B03" "A11B01" "A05B10"

score 2 · Answer 4 · answered Oct 24 '17 at 20:32

A solution from tidyverse and stringr.

library(tidyverse)
library(stringr)

rep_data2 <- rep_data %>%
  extract(Str, into = c("L1", "N1", "L2", "N2"), regex = "(A)(\\d+)(B)(\\d+)") %>%
  # across() replaces the deprecated mutate_at()/funs() idiom (dplyr >= 1.0.0)
  mutate(across(starts_with("N"), ~ str_pad(.x, width = 2, pad = "0"))) %>%
  unite(Str, everything(), sep = "")
rep_data2
     Str
1 A01B10
2 A02B03
3 A11B01
4 A05B10

score 2 · Answer 5 · answered Oct 24 '17 at 21:14

This is the most concise tidy solution I can come up with:

library(tidyverse)
library(stringr)

rep_data %>%
  mutate(
    num_1 = str_match(Str, "A([0-9]+)")[, 2],
    num_2 = str_match(Str, "B([0-9]+)")[, 2],
    num_1 = str_pad(num_1, width = 2, side = "left", pad = "0"),
    num_2 = str_pad(num_2, width = 2, side = "left", pad = "0"),
    Str = str_c("A", num_1, "B", num_2)
  ) %>%
  select(- num_1, - num_2)

score 2 · Answer 6 · answered Oct 25 '17 at 04:32

A bit similar to @Mike's answer, but this solution uses one positive lookahead:

gsub("(\\D)(?=\\d(\\D|\\b))", "\\10", rep_data$Str, perl = TRUE)
# [1] "A01B10" "A02B03" "A11B01" "A05B10"

With tidyverse:

library(dplyr)
library(stringr)

rep_data %>%
  mutate(Str = str_replace_all(Str, "(\\D)(?=\\d(\\D|\\b))", "\\10"))

#      Str
# 1 A01B10
# 2 A02B03
# 3 A11B01
# 4 A05B10

This regex matches every non-digit that is followed by a single digit and then either another non-digit or a word boundary. \\10 is quite deceptive, since it looks like it replaces the match with the 10th capture group. Instead, it replaces the match with the 1st capture group followed by a literal zero.
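The point about \\10 can be checked in base R, whose replacement strings only recognise the backreferences \1 to \9. The sketch below uses two made-up inputs ("abc" and "1A2") that are not from the question. The second call also shows a consequence of requiring a preceding non-digit: a single digit at the very start of a string is not padded.

```r
# Base R reads "\\10" as backreference \1 followed by a literal "0".
demo <- sub("(a)", "\\10", "abc")
print(demo)
# expected value: "a0bc"

# The pattern from the answer needs a non-digit before the digit,
# so the leading "1" in "1A2" is left as is.
edge <- gsub("(\\D)(?=\\d(\\D|\\b))", "\\10", c("A2B3", "1A2"), perl = TRUE)
print(edge)
# expected values: "A02B03" "1A02"
```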

score 1 · Answer 7 · answered Oct 25 '17 at 02:27


Here is one option with gsubfn

library(gsubfn)
gsubfn("(\\d+)", ~sprintf("%02d", as.numeric(x)), as.character(rep_data$Str))
#[1] "A01B10" "A02B03" "A11B01" "A05B10"
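If installing gsubfn is not an option, the same idea (find each run of digits and reformat it with sprintf()) can be sketched in base R with `regmatches<-`. This is an alternative written for illustration, not the answerer's code.

```r
x <- c("A1B10", "A2B3", "A11B1", "A5B10")

# Locate every run of digits in each string.
m <- gregexpr("[0-9]+", x)

# Replace each run with its zero-padded form (minimum width 2).
regmatches(x, m) <- lapply(regmatches(x, m), function(d) sprintf("%02d", as.integer(d)))
print(x)
# expected values: "A01B10" "A02B03" "A11B01" "A05B10"
```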

answered Oct 25 '17 at 02:27

akrun

