I am using clear names for the "names" in my measurement data for calculations. For data-privacy reasons I need to anonymize the result graphs, so I created an assignment table with the "clear" names and random IDs (randos).

   randos clear
 [1,] "d2ef" "01"
 [2,] "6326" "02"
 [3,] "fc31" "03"
 [4,] "02ac" "04"
 [5,] "e43a" "05"
 [6,] "1ac7" "06"

How can I replace the clear values with the randos in my colnames()? A column name is, for example, T_01_X. The number and order of the columns may differ because I work with subsets, so I cannot simply assign all colnames() 1:1 from the assignment table. The clear string needs to be searched for and replaced by the matching randos. In the end the names should be, e.g.

"01" in "T_01_X" -> "T_d2ef_X"

as well as

"01" in "A_01_Y" -> "A_d2ef_Y"
Wiktor Stribiżew
Herr Student
  • Read about `match`. – zx8754 Aug 01 '22 at 10:01
  • @zx8754 not sure you even need `which` here - just `dat[] <- lapply(dat, \(col) lookup$randos[match(col, lookup$clear)])` or similar. – SamR Aug 01 '22 at 10:03
  • @SamR no need to recreate the data frame with lapply; see the linked post, just look up using match. – zx8754 Aug 01 '22 at 10:10
  • Thanks, what you linked is part of the answer, but I just need to replace a string within the column name, e.g. replace "01" in "T_01_X" -> "T_d2ef_X". Thanks – Herr Student Aug 01 '22 at 10:23
  • If that's what you're asking then it's different from the duplicate. I (mis?)understood the question differently from @zx8754 and thought you were asking about values rather than column names. If the linked question doesn't answer what you wanted then ask another, but this time include a minimal example of input and desired output, which should make it easier to properly answer the question or link to a relevant duplicate if this one doesn't entirely cover it. – SamR Aug 01 '22 at 10:51
  • I re-opened the post, please edit your post with expected output. – zx8754 Aug 01 '22 at 10:53

1 Answer

Thanks for clarifying the question. This is a good time for stringr::str_replace().

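# Lookup table you posted; colClasses keeps "01" as a string instead of the number 1
lookup  <- read.table(text = "randos clear
1 d2ef 01
2 6326 02
3 fc31 03
4 02ac 04
5 e43a 05
6 1ac7 06", header = TRUE, colClasses = "character")

# Generate some data with colnames to be replaced
dat  <- data.frame(
   T_01_X = 1,
   T_02_X = 1,
   C_G_03 = 1,
   L04_B = 1,
   R05Q = 1,
   J06R = 1
)

# str_replace() pairs names, patterns and replacements element by element,
# so this relies on names(dat) and the lookup rows having the same length and order
names(dat)  <- stringr::str_replace(
  names(dat),
  pattern = lookup$clear,
  replacement = lookup$randos
  )

dat
#   T_d2ef_X T_6326_X C_G_fc31 L02ac_B Re43aQ J1ac7R
# 1        1        1        1       1      1      1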
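
A side note on the lookup table: `read.table()` converts columns that look numeric unless told otherwise, which drops the leading zero from values such as "01". Here is a minimal sketch of the difference, using the first two rows of the table from the question (the object names `txt`, `default_read` and `char_read` are only illustrative):

```r
# First two rows of the assignment table from the question
txt <- "randos clear
1 d2ef 01
2 6326 02"

# Default parsing: the "clear" column looks numeric and is read as integer
default_read <- read.table(text = txt, header = TRUE)

# Forcing character columns keeps "01" and "02" exactly as written
char_read <- read.table(text = txt, header = TRUE, colClasses = "character")

class(default_read$clear)  # "integer"
char_read$clear            # "01" "02"
```

With integer keys the pattern becomes "1" rather than "01", so a name like T_01_X would keep its "0" in front of the inserted ID.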

EDIT: The above works for me but not for the OP. The solution which worked for the OP (see comments) is:

names(dat)  <- stringi::stri_replace_all_fixed(
   names(dat), 
   pattern = lookup$clear, 
   replacement = lookup$randos,
   vectorize_all = FALSE
)
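
If repeated replacement is a concern (an ID that was just inserted could itself contain a later clear value), one possible alternative is to locate all matches once in the original names and only then swap them. The following is a sketch in base R, not the method used above. The `old_names` subset is hypothetical, and the approach assumes the clear values contain no regex metacharacters and that none is a prefix of another.

```r
# Assignment table from the question, with "clear" kept as character
lookup <- data.frame(
  randos = c("d2ef", "6326", "fc31", "02ac", "e43a", "1ac7"),
  clear  = c("01", "02", "03", "04", "05", "06"),
  stringsAsFactors = FALSE
)

# Hypothetical subset of column names, in a different order than the lookup
old_names <- c("A_04_Y", "T_01_X", "T_02_X")

# Named vector mapping each clear value to its random ID
map <- setNames(lookup$randos, lookup$clear)

# One alternation of all clear values (digits only here, so no escaping needed)
pattern <- paste(lookup$clear, collapse = "|")

# Locate every match in the original names once
m <- gregexpr(pattern, old_names)

# Swap each matched clear value for its ID; inserted IDs are never re-scanned
new_names <- old_names
regmatches(new_names, m) <- lapply(
  regmatches(old_names, m),
  function(hit) unname(map[hit])
)

new_names
# "A_02ac_Y" "T_d2ef_X" "T_6326_X"
```

Note that "A_02ac_Y" now contains "02", but because the matches were found before any replacement happened, it is not replaced a second time.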
SamR
  • We are almost there. I also tried the example; it requires that "dat" and "lookup" have the same length. Unfortunately this is not the case for me: dat is just a subset, so not all lookup replacements are needed. Thanks for the help! – Herr Student Aug 02 '22 at 10:38
  • @HerrStudent OK then you can use `stringr::str_replace()` - I have updated the answer. You'll get the same warning but it should work. – SamR Aug 02 '22 at 10:55
  • Actually there is another argument that works perfectly: `vectorize_all = FALSE`. So we don't need the characters :-) – Herr Student Aug 02 '22 at 11:08
  • @HerrStudent careful though: if you do that you may find that you replace more than once, so e.g. `T_01_X` ultimately becomes `"T_0d1ac7fc3121ac7ef_X"` – SamR Aug 02 '22 at 11:18
  • But your solution is still not working in my case, whereas `vectorize_all` seems to cause no problems with the example/current data. For stringr: `'names' attribute [55] must be the same length as the vector [15] In addition: Warning message: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), : longer object length is not a multiple of shorter object length` – Herr Student Aug 02 '22 at 11:29
  • @HerrStudent OK I can't explain that! I've edited the response to include both solutions so hopefully it will be a useful resource for anyone who gets here by googling a similar problem. – SamR Aug 02 '22 at 11:32
  • You are correct. I don't have that problem (my clear is actually 3 numbers), but if a clear value also matches part of an already replaced randos, it might be replaced again, so it is kind of dangerous! – Herr Student Aug 02 '22 at 13:12