I have some columns in my dataset whose values end in either an * or a letter from A to G. Can someone explain why these characters are in the dataset and how I can remove them from the columns? I can't run analyses on these variables while the extra characters are still included. Examples are 73.5* and 0.00G.

Amy
    I have no idea why they're there as I have no idea where your data comes from, but you can use the base function `sub` or `gsub` to replace characters in a string – Marcus Mar 27 '20 at 14:52

1 Answer

You can use gsub.

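The pattern matches either * (which has to be escaped with \\ because it is a special character in regular expressions) or a capital letter, but only when it occurs at the end of the string (denoted with $). The match is then replaced with nothing, "". Note that [A-Z] matches any capital letter; if only the letters A to G appear in your data, [A-G] is a tighter choice.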

dataframe <- data.frame(ID = 1:3, column = c("73.5*", "0.00G", "2.84"))
dataframe
#  ID column
#1  1  73.5*
#2  2  0.00G
#3  3   2.84

# Remove a trailing * or capital letter from each value
dataframe$column <- gsub("(\\*|[A-Z])$", "", dataframe$column)
dataframe
#  ID column
#1  1   73.5
#2  2   0.00
#3  3   2.84
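
Since the goal is to run analyses, keep in mind that the column is still text after the trailing characters are stripped. Below is a minimal sketch of the extra conversion step, assuming the cleaned values are all plain decimal numbers. The data frame repeats the hypothetical example above, and the narrower `[*A-G]` class is an assumption based on the letters mentioned in the question.

```r
# Hypothetical example data mirroring the values in the question
dataframe <- data.frame(ID = 1:3,
                        column = c("73.5*", "0.00G", "2.84"),
                        stringsAsFactors = FALSE)

# Strip one trailing * or A-G flag; inside [] the * needs no escaping
cleaned <- sub("[*A-G]$", "", dataframe$column)

# Convert the cleaned strings to numbers so they can be analysed
dataframe$column <- as.numeric(cleaned)

# Any NA here would point to a value that was not a plain number
anyNA(dataframe$column)
mean(dataframe$column)
```

If `as.numeric` warns about NAs introduced by coercion, some values probably carry a suffix the pattern does not cover, so it is worth inspecting those rows before going further.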
Ian Campbell