
I had an original dataset that looks like this:

> df.1
     id score
1  13_B     1
2  13_C     4
3 133_D     5
4   141     2
5   145     3
6   143     4
7  12_B     6
8  12_C     7
9  12_D     9

I needed to run a process that requires all ids to be numeric, so I recoded the suffixes _B|_C|_D as 1|2|3.

After some processing, the dataset has an extra group column. Here is what my sample dataset looks like:

df.2 <- data.frame(id = c("131","132","133", "141", "145", "143", "121","122","123"),
                   score = c(1,4,5,2,3,4,6,7,9),
                   group = c(5,5,5,4,4,4,3,3,3))
    
> df.2
   id score group
1 131     1     5
2 132     4     5
3 133     5     5
4 141     2     4
5 145     3     4
6 143     4     4
7 121     6     3
8 122     7     3
9 123     9     3

At this point, I need to convert the ids back to their original form for the items in items = c(12,13,15). Item 15 is not in this dataset, but I need something that works globally. My desired output is:

> df.3
     id score group
1  13_B     1     5
2  13_C     4     5
3  13_D     5     5
4   141     2     4
5   145     3     4
6   143     4     4
7  12_B     6     3
8  12_C     7     3
9  12_D     9     3

Any ideas?

Thanks!

amisos55
  • How did you get the `group` values – akrun Oct 14 '21 at 19:07
  • Is the third element `133_D` or `13_D` – akrun Oct 14 '21 at 19:08
  • Is it possible to keep the original column 'id' unchanged and then create a new column 'id_numeric'? Then you can do your filters and whatnot using the id of choice – Jagge Oct 14 '21 at 19:11
  • @akrun corrected that typo. `group` variable came after I did some processes on the dataset. nothing important. – amisos55 Oct 14 '21 at 19:14
  • FYI, you cannot mix `character` and `numeric` values in the same column. So you can either **(1)** convert the _entire_ `id` column to `character`; or **(2)** define a _second_ `id` column (`id_processed`?) to allow separate `character` and `numeric` datatypes, as suggested [here](https://stackoverflow.com/posts/comments/122978553) by [@Jagge](https://stackoverflow.com/users/9025715/jagge). – Greg Oct 14 '21 at 19:14
  • @Jagge, the software I used only needs one id column and calculates new columns. So I could keep an `id_numeric` column. – amisos55 Oct 14 '21 at 19:15
  • @amisos55 your input and output is confusing. If you want to recode the 'id' to a new column, the solution below works – akrun Oct 14 '21 at 19:19
  • @akrun, sorry for the confusion. My sample dataset is `df2` and my desired dataset is `df3`, for those `items = c(12,13,15)`. – amisos55 Oct 14 '21 at 19:22
  • @amisos55 can you check the updated solution – akrun Oct 14 '21 at 19:27
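
The suggestion in the comments (keep the original `id` and add a separate numeric column) could look roughly like the base R sketch below. It is an illustration under assumptions, not the asker's actual workflow: the third id is written as "13_D" to match the desired output, `suffix_to_digit` and `processed` are made-up names, and `processed` only stands in for whatever the external software returns.

```r
# Original data; the third id is assumed to be "13_D", as in the desired output.
df.1 <- data.frame(id = c("13_B", "13_C", "13_D", "141", "145", "143",
                          "12_B", "12_C", "12_D"),
                   score = c(1, 4, 5, 2, 3, 4, 6, 7, 9),
                   stringsAsFactors = FALSE)

# Hypothetical lookup from letter suffix to digit (B -> 1, C -> 2, D -> 3).
suffix_to_digit <- c(B = "1", C = "2", D = "3")

# Build the numeric id in a new column and leave `id` untouched.
has_suffix <- grepl("_[BCD]$", df.1$id)
df.1$id_numeric <- df.1$id
df.1$id_numeric[has_suffix] <- paste0(
  sub("_[BCD]$", "", df.1$id[has_suffix]),               # part before the suffix
  suffix_to_digit[sub("^.*_", "", df.1$id[has_suffix])]  # digit for the letter
)
df.1$id_numeric <- as.numeric(df.1$id_numeric)

# Stand-in for the processing step, which returns ids in numeric form.
processed <- data.frame(id_numeric = df.1$id_numeric,
                        group = c(5, 5, 5, 4, 4, 4, 3, 3, 3))

# Recover the original ids by looking up the numeric key.
processed$id <- df.1$id[match(processed$id_numeric, df.1$id_numeric)]
```

With this layout no reverse recoding is needed, as long as the numeric ids stay unique. If an id such as "131" and an id such as "13_B" both occurred in the original data, they would collide on the same numeric key.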

2 Answers


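Use str_replace_all with a named vector (built with setNames) to recode the suffixes: the names are the patterns to match and the values are their replacements. The inner str_replace first trims any prefix longer than two digits, so "133_D" becomes "13_D" before the suffix is recoded.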

library(dplyr)
library(stringr)
df.1 %>% 
   mutate(id1 = as.numeric(str_replace_all(str_replace(id, "^(\\d{2})\\d+_(.*)", 
       "\\1_\\2"),  setNames(as.character(c(1, 2, 3)), c("_B", "_C", "_D")))))

-output

     id score id1
1  13_B     1 131
2  13_C     4 132
3 133_D     5 133
4   141     2 141
5   145     3 145
6   143     4 143
7  12_B     6 121
8  12_C     7 122
9  12_D     9 123

To recode in the other direction, starting from 'df.2':

df.2 %>% 
   mutate(id2 = case_when(substr(id, 1, 2) %in% c(12, 13, 15) ~ 
    str_replace_all(as.character(id), setNames(c("_B", "_C", "_D"),
          str_c(1:3, "$"))), TRUE ~as.character(id)))

-output

   id score group  id2
1 131     1     5 13_B
2 132     4     5 13_C
3 133     5     5 13_D
4 141     2     4  141
5 145     3     4  145
6 143     4     4  143
7 121     6     3 12_B
8 122     7     3 12_C
9 123     9     3 12_D
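
The same reverse mapping can also be sketched in base R as a small function, which makes it easy to check an item such as 15 that is absent from the sample data. The names `restore_ids` and `digit_to_suffix` are hypothetical, and the sketch is stricter than the dplyr version above: it only rewrites ids that consist of exactly one item prefix followed by one mapped digit.

```r
# Hypothetical helper: turn e.g. "131" back into "13_B" for the listed items.
restore_ids <- function(id, items = c(12, 13, 15),
                        digit_to_suffix = c("1" = "_B", "2" = "_C", "3" = "_D")) {
  id <- as.character(id)
  # Pattern such as "^(12|13|15)([123])$": item prefix, then one mapped digit.
  pattern <- paste0("^(", paste(items, collapse = "|"), ")([",
                    paste(names(digit_to_suffix), collapse = ""), "])$")
  hit <- grepl(pattern, id)
  prefix <- sub(pattern, "\\1", id[hit])
  digit  <- sub(pattern, "\\2", id[hit])
  # Only matching ids are rewritten; everything else is returned unchanged.
  id[hit] <- paste0(prefix, unname(digit_to_suffix[digit]))
  id
}

restored <- restore_ids(c("131", "132", "133", "141", "145", "143",
                          "121", "122", "123", "152"))
print(restored)
```

Applied to the sample data, this would be `df.2$id <- restore_ids(df.2$id)`. The extra "152" in the call above shows that item 15 is handled as well (it becomes "15_C"), while ids starting with 14 are left as they are.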

data

df.1 <- structure(list(id = c("13_B", "13_C", "133_D", "141", "145", 
"143", "12_B", "12_C", "12_D"), score = c(1L, 4L, 5L, 2L, 3L, 
4L, 6L, 7L, 9L)), row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9"), class = "data.frame")
akrun

You may try this:

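library(dplyr)
library(stringr)

df.2 %>% 
  group_by(group) %>% 
  # assumes every group has exactly three rows, matching _B, _C, _D in order
  mutate(group_id = row_number(),
         x = paste0("_", LETTERS[2:4])) %>% 
  # ids containing "14" are kept as they are; this check is specific to the sample data
  mutate(id2 = ifelse(!str_detect(id, "14"), paste0(str_sub(id, 1, 2), x), id)) %>% 
  select(id, id2, score, group)
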
  id    id2   score group
  <chr> <chr> <dbl> <dbl>
1 131   13_B      1     5
2 132   13_C      4     5
3 133   13_D      5     5
4 141   141       2     4
5 145   145       3     4
6 143   143       4     4
7 121   12_B      6     3
8 122   12_C      7     3
9 123   12_D      9     3
TarJae