
I feel like this question is asked a lot, but none of the solutions I found work for me.

I have a dataframe with a column (called ID) that holds strings of numbers and letters (e.g. Q8A203). In a few rows there are two of these identifiers separated by a vertical bar (e.g. Q8AA66|Q8AAT5). For my analysis it doesn't matter which one I keep, so I want to create a new column named NewColumn that holds only the first identifier, obtained by splitting the string at |.

I know that the vertical bar must be treated differently and that I have to put \\ in front. I tried strsplit() and unlist():

df$NewColumn <- strsplit(df$ID,split='\\|',fixed=TRUE)
df$NewColumn <- unlist(strsplit(df$ID, " \\| ", fixed=TRUE))

Both attempts just copy the content of column ID to NewColumn unchanged.

I would very much appreciate the help.

Marlop

1 Answer


Rather than splitting, you can simply replace everything from the bar onward with an empty string, which keeps the first ID.

df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"))
df$NewColumn <- sub("\\|.*$", "", df$ID)
df
#              ID NewColumn
# 1        Q8A203    Q8A203
# 2 Q8AA66|Q8AAT5    Q8AA66

Next time, please add a minimal reproducible example (your df here) to speed up answers ;)

strsplit can work if you remove the fixed option: with fixed=TRUE the pattern '\\|' is matched literally (a backslash followed by a bar), so nothing is split. Without it, you need to provide an exact regex. Also, strsplit returns a list, which takes more work to handle afterwards.

# Working with a list
unlist(lapply(strsplit(df$ID, split='\\|'), "[[", 1))
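To make the difference between the two `strsplit` modes concrete, here is a small base-R sketch. The vector `ids` and the result names (`no_split`, `by_regex`, `by_fixed`, `first_id`) are made up for illustration, and `vapply` is swapped in for `unlist(lapply(...))` only as one possible way to get a plain character vector back.

```r
# Hypothetical sample data, mirroring the example df above
ids <- c("Q8A203", "Q8AA66|Q8AAT5")

# fixed = TRUE takes the pattern literally, so "\\|" searches for a
# backslash followed by a bar, which never occurs: nothing is split
no_split <- strsplit(ids, split = "\\|", fixed = TRUE)

# Option 1: drop fixed = TRUE and keep the escaped regex
by_regex <- strsplit(ids, split = "\\|")

# Option 2: keep fixed = TRUE and pass the bare bar
by_fixed <- strsplit(ids, split = "|", fixed = TRUE)

# Keep the first piece of every element; vapply checks that each
# result is a single string and returns a character vector
first_id <- vapply(by_regex, `[[`, character(1), 1)
first_id
# [1] "Q8A203" "Q8AA66"
```

Both options give the same list here, so which one to use is mostly a matter of taste; the `sub()` approach above still avoids the list step entirely.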
Gowachin
  • Thank you so much. This works! Also thank you for the pointer regarding the minimal reproducible example. I will do that next time. – Marlop Mar 16 '22 at 09:19
  • You're welcome! It also helps to work out the pattern on a single string first. For example, you can test `strsplit("text|text", split = "\\|")` to find the split value you need. For more information, read about ["RegEx" (REGular EXpressions)](https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf); it's a very complex but useful tool, and several packages simplify it. – Gowachin Mar 16 '22 at 10:08