I have a data frame with more than a million rows. It has a key column of character values, with around 900 distinct values. A number of these are minor variations of a standard value: approximately 175 of the 900 need to be mapped to standard values. The following sample code shows how I did the mapping. Here the value "Event 1" needs to be replaced by "evt 1":
id = c(1:4)
k1 = c("Event 1", "evt 1", "evt 2", "evt 3")
v1 = c(101:104)
df = data.frame(id, k1, v1)
df$k1 = as.character(df$k1)
### map the non-standard values to standard values using named vector approach
mapEvents = c("Event 1" = "evt 1")
vNames = names(mapEvents)
stTime = proc.time()
df$k1 = ifelse(df$k1 %in% vNames, mapEvents[df$k1], df$k1)
proc.time() - stTime
This code works, but it has a serious performance issue: on the full data, the ifelse call takes around 9 minutes to complete on my i7 system.
How can I make this mapping execute as fast as possible? I would appreciate any help.
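To make the timing easier to reproduce, here is a minimal sketch that builds synthetic data at roughly the scale described above and runs the same named-vector lookup on it. The names `stdKeys`, `badKeys`, `bigMap` and `bigDf`, and the key patterns themselves, are made up for illustration. The real keys look different, so timings on this data may not match what I see on the real data.

```r
# Synthetic data at roughly the scale described in the question.
# All names and key patterns below are hypothetical, for illustration only.
set.seed(42)
nRows   = 1e6
stdKeys = paste("evt", 1:725)      # standard key values
badKeys = paste("Event", 1:175)    # non-standard variants (725 + 175 = 900 distinct)

# Named vector: "Event i" -> "evt i"
bigMap = setNames(paste("evt", 1:175), badKeys)

bigDf = data.frame(
  id = seq_len(nRows),
  k1 = sample(c(stdKeys, badKeys), nRows, replace = TRUE),
  v1 = sample(101:104, nRows, replace = TRUE),
  stringsAsFactors = FALSE         # keep k1 as character
)

# Same approach as in the sample code, timed with system.time().
# Note: `<-` is needed here, because `=` inside a call is read as an argument name.
elapsed = system.time(
  bigDf$k1 <- ifelse(bigDf$k1 %in% names(bigMap), bigMap[bigDf$k1], bigDf$k1)
)
print(elapsed)
```

After this runs, `bigDf$k1` should contain only values from `stdKeys`, which can be checked with `all(bigDf$k1 %in% stdKeys)`.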