I have the following dataframe in R:
df <- data.frame(Sample_name = c("01_00H_NA_DNA", "01_00H_NA_RNA", "01_00H_NA_S", "01_00H_NW_DNA", "01_00H_NW_RNA", "01_00H_NW_S", "01_00H_OM_DNA", "01_00H_OM_RNA", "01_00H_OM_S", "01_00H_RL_DNA", "01_00H_RL_RNA", "01_00H_RL_S"),
Pair = c("","", "S1","","","S2","","","S3","", "","S5"))
I would like to generate a new variable, Label, such that rows whose Sample_name strings are identical up to the last _ (the part before DNA, RNA or S) receive the same label ID number. Not every row will start with 01_00H, but there will always be a shared prefix up to the last underscore to group on for the Label variable.
Furthermore, I would also like to fill the Pair variable so that every row with the same label carries the same value: S1 for all rows of the first label, and so on. The existing Pair values are not consecutive, e.g. S3 is followed by S5.
The resulting dataframe should look something like this:
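As a minimal sketch of the intended result (not a definitive solution), the grouping could be done in base R roughly as follows. It assumes that the grouping key is everything before the last underscore, that labels are numbered in order of first appearance, and that each group has exactly one non-empty Pair value. The helper name prefix is introduced here purely for illustration.

```r
df <- data.frame(
  Sample_name = c("01_00H_NA_DNA", "01_00H_NA_RNA", "01_00H_NA_S",
                  "01_00H_NW_DNA", "01_00H_NW_RNA", "01_00H_NW_S",
                  "01_00H_OM_DNA", "01_00H_OM_RNA", "01_00H_OM_S",
                  "01_00H_RL_DNA", "01_00H_RL_RNA", "01_00H_RL_S"),
  Pair = c("", "", "S1", "", "", "S2", "", "", "S3", "", "", "S5"),
  stringsAsFactors = FALSE  # keep both columns as character on older R versions
)

# Grouping key: drop the last underscore and everything after it.
prefix <- sub("_[^_]*$", "", df$Sample_name)

# Assumption: number the groups in order of first appearance (1, 2, 3, ...).
df$Label <- match(prefix, unique(prefix))

# Within each label, copy the first non-empty Pair value to every row.
# Assumption: one non-empty Pair per group; groups with none become NA.
df$Pair <- ave(df$Pair, df$Label, FUN = function(p) {
  filled <- p[p != ""]
  if (length(filled) > 0) filled[1] else NA_character_
})

print(df)
```

With the example data this gives Label 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4 and Pair S1, S1, S1, S2, S2, S2, S3, S3, S3, S5, S5, S5, so the gap between S3 and S5 is preserved rather than renumbered.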
This has been incredibly hard to do. I followed "How to create new column in dataframe based on partial string matching other column in R", but it only helped me partially, for direct 1:1 renaming.
Any solutions from useRs will be much appreciated, Thanks!