I have some columns in my dataset that have either an * or a letter ranging from A to G in the last position. Can someone explain why these are in the dataset and how I can remove them from the column? I can't run analyses with the variables while these elements are still included. Examples are 73.5* or 0.00G.
I have no idea why they're there, as I have no idea where your data comes from, but you can use the base function `sub` or `gsub` to replace characters in a string. – Marcus Mar 27 '20 at 14:52
1 Answer
You can use `gsub`. The pattern matches `*` (which has to be escaped with `\\` because it is a special character) or a capital letter that occurs at the end of the string (denoted with `$`). It then replaces the match with nothing, `""`.
dataframe <- data.frame(ID = 1:3, column = c("73.5*", "0.00G", "2.84"))
dataframe
#  ID column
#1  1  73.5*
#2  2  0.00G
#3  3   2.84
# Remove a trailing * or capital letter from each value
dataframe$column <- gsub("(\\*|[A-Z])$", "", dataframe$column)
dataframe
#  ID column
#1  1   73.5
#2  2   0.00
#3  3   2.84
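
Since the question mentions only letters from A to G and the goal is to run analyses, a slightly narrower variant may be useful. The following is a minimal sketch, assuming the values look like the examples in the question; the vector `x` and the names `cleaned` and `values` are made up for illustration. It uses a character class, inside which `*` does not need escaping, and then converts the result to numeric:

```r
# Hypothetical example values, modeled on the ones in the question
x <- c("73.5*", "0.00G", "2.84")

# Character class: a literal * (no escaping needed inside [ ]) or a letter
# from A to G, anchored to the end of the string with $
cleaned <- sub("[*A-G]$", "", x)

# The cleaned values are still character strings; convert them to numbers
# so they can be used in analyses
values <- as.numeric(cleaned)
values
```

`sub` is enough here because the `$` anchor allows at most one match per string. If a value still contains other non-numeric characters, `as.numeric` returns `NA` for it with a warning, which can help spot suffixes the pattern did not cover.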

Ian Campbell