In R, how do I map multiple values to different values based on a conversion table?

Question

I have a large vector of values and a conversion table that shows what each value should be converted to. I know how to do this for one value at a time using gsub, but I'm not sure how to do it for all values at once. Essentially, I want to take a vector, look up each item in the conversion table, and replace it with the corresponding new value.

Example:

library(data.table)  # needed for data.table() below

test <- data.frame(Name = c(rep("TestA", 3), rep("TestB", 4), rep("TestC", 2)))
conversion <- data.table(Original = c("TestA", "TestB", "TestC"), New = c("380", "JK", "LOL"))

test
   Name
1 TestA
2 TestA
3 TestA
4 TestB
5 TestB
6 TestB
7 TestB
8 TestC
9 TestC

conversion
   Original New
1:    TestA 380
2:    TestB  JK
3:    TestC LOL

What I want:

   Name NewName
1 TestA     380
2 TestA     380
3 TestA     380
4 TestB      JK
5 TestB      JK
6 TestB      JK
7 TestB      JK
8 TestC     LOL
9 TestC     LOL

Simply merge or join the two. – Parfait, May 04 '19 at 18:57

akrun · Answer 1 · 2019-05-04T19:44:26.657

One option is a data.table join. Convert the 'test' dataset to a data.table with setDT, then join it with 'conversion' using on. Because the key columns have different names ('Name' in 'test', 'Original' in 'conversion'), they have to be paired explicitly with =. The := operator then assigns 'New' from 'conversion' to a new 'NewName' column in 'test'. Elements with no match get NA.

library(data.table)
setDT(test)[conversion, NewName := New, on = .(Name = Original)]
test
#    Name NewName
#1: TestA     380
#2: TestA     380
#3: TestA     380
#4: TestB      JK
#5: TestB      JK
#6: TestB      JK
#7: TestB      JK
#8: TestC     LOL
#9: TestC     LOL
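To illustrate the NA behaviour and how the bracket syntax reads, here is a minimal sketch. The objects test2 and conversion2 are made up for this example, and "TestD" is a value deliberately left out of the conversion table. The call can be read as x[i, j, on]: for each row of conversion2 (i), find the matching rows of test2 (x) and run the assignment in j on them.

```r
library(data.table)

# Hypothetical data: "TestD" has no row in the conversion table
test2 <- data.table(Name = c("TestA", "TestD", "TestC"))
conversion2 <- data.table(Original = c("TestA", "TestB", "TestC"),
                          New = c("380", "JK", "LOL"))

# Update join: rows of test2 that match a row of conversion2 get New;
# rows without a match are left as NA. Row order of test2 is unchanged.
test2[conversion2, NewName := New, on = .(Name = Original)]
test2
```

Here the "TestD" row should end up with NA in NewName, while the other two rows get "380" and "LOL".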

Or, without using any packages:

test$NewName <- conversion$New[match(test$Name, conversion$Original)]
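The one-liner above can be split into two steps to show what is going on. This is only a sketch; the intermediate name idx is introduced here for illustration, and conversion is rebuilt as a plain data.frame so that no packages are needed.

```r
test <- data.frame(Name = c(rep("TestA", 3), rep("TestB", 4), rep("TestC", 2)))
conversion <- data.frame(Original = c("TestA", "TestB", "TestC"),
                         New = c("380", "JK", "LOL"),
                         stringsAsFactors = FALSE)

# Step 1: for each element of test$Name, find its row position in
# conversion$Original (NA if it does not occur there)
idx <- match(test$Name, conversion$Original)
idx
# [1] 1 1 1 2 2 2 2 3 3

# Step 2: use those positions to pull the corresponding New values
test$NewName <- conversion$New[idx]
```

Because match returns NA for values it cannot find, unmatched names end up as NA in NewName, just as in the data.table join.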

Would you mind explaining how this works? The confusing part for me is that the arguments all are in the brackets instead of in the actual function call, so although it works I don't really understand the structure of how it is working. — Jay, May 04 '19 at 18:59

score 1 · Answer 2 · answered May 04 '19 at 19:07

I suggest the tidyverse; its syntax reads like natural language.

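library(tidyverse)  # provides tibble(), left_join(), rename() and %>%

test <- tibble(Name = c(rep("TestA", 3), rep("TestB", 4), rep("TestC", 2)))
conversion <- tibble(Original = c("TestA", "TestB", "TestC"), New = c("380", "JK", "LOL"))

# left_join keeps every row of test; rename New to match the desired output
test %>%
  left_join(conversion, by = c("Name" = "Original")) %>%
  rename(NewName = New)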

score 1 · Answer 3 · answered May 04 '19 at 20:05

You don't really need anything fancy here: just vector indexing. Starting with your code

test <- data.frame(Name = c(rep("TestA", 3), rep("TestB", 4), rep("TestC", 2)))
conversion <- data.table(Original = c("TestA", "TestB", "TestC"), New = c("380", "JK", "LOL"))

change the conversion data.table to a vector:

vec <- conversion$New
names(vec) <- conversion$Original
vec
# TestA TestB TestC 
# "380"  "JK" "LOL"

Then add a new column by indexing:

# as.character() makes sure the lookup is by name even if Name is a factor
test$NewName <- vec[as.character(test$Name)]

By the way, if your conversion table was being entered by hand, you could have created vec directly:

vec <- c(TestA = "380", TestB = "JK", TestC = "LOL")
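One caveat with this approach, assuming the column being looked up might be a factor (the default for string columns in data.frame() before R 4.0): indexing with a factor uses its integer codes, not its labels. The sketch below uses a made-up factor f whose level order differs from the order of vec to show the difference.

```r
vec <- c(TestA = "380", TestB = "JK", TestC = "LOL")

# Hypothetical factor whose levels are NOT in the same order as names(vec)
f <- factor(c("TestC", "TestA"), levels = c("TestC", "TestA"))

by_code <- vec[f]                # uses integer codes 1, 2: picks TestA, TestB
by_name <- vec[as.character(f)]  # uses the labels: picks TestC, TestA

by_code
by_name
```

Wrapping the index in as.character() is harmless when the column is already character, so it is a reasonable default.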
