How pivot longer and match the variable name with a secondary table?

Question

I'm struggling with the logic in order to cross these 2 tables:

My dataframes are:

df6 includes an id and a numeric value per column. v1 v2 and v3 will always have one and only one non-zero value, which indicates the yummyness value of that variable.

id v1 v2 v3
1  0  0  3
2  2  0  0
3  0  1  0

df_yummy contains the name equivalence for v1 v2 and v3 as well as a "yummyness" value.

VarId   VarName  Yummyness
v1      Apple    2
v2      Lemon    1
v3      Peach    3

My desired outcome should have

id  Fruit  Yummyness
1   Peach  3
2   Apple  2
3   Lemon  1

EDIT: Typo correction

It's not at all apparent to me how these actually would cross together. For id = 1, the V3 variable in filled in, but you have id = 1 being Apple in your outcome. ??? — TARehman, Aug 30 '21 at 15:01
Im sorry. I had a typo in my desired outcome. I corrected it — Andres Mora, Aug 30 '21 at 15:13

score 3 · Accepted Answer · answered Aug 30 '21 at 15:41

here is one solution to your problem within the tidyverse (it is critical to assure that v1, v2 and v3 only show one value per line diferent than zero else you will get duplication or missing data):

#dummy data
df1 <- data.table::fread("id v1 v2 v3
1  0  0  3
2  2  0  0
3  0  1  0")

df2 <- data.table::fread("VarId   VarName  Yummyness
v1      Apple    2
v2      Lemon    1
v3      Peach    3")

library(tidyverse)

df1 %>%
    tidyr::pivot_longer(-id, names_to = "VarId", values_to = "vals") %>%
    dplyr::filter(vals != 0) %>%
    dplyr::right_join(df2, by = "VarId") %>%
    dplyr::select(id, Fruit = VarName,  Yummyness)

# A tibble: 3 x 3
     id Fruit Yummyness
  <int> <chr>     <int>
1     1 Peach         3
2     2 Apple         2
3     3 Lemon         1

@AndresMora the right join is essentially a left join, when you invert the order of tables/data.frames... I just used it to stay within the same pipe, as it started with df1 — DPH, Aug 30 '21 at 17:31

TARehman · Answer 2 · 2021-08-30T21:25:14.120

Here is a simple solution that uses base R.

# Dummy data
df6 <- data.frame(
    id = c(1, 2, 3),
    v1 = c(0, 2, 0),
    v2 = c(0, 0, 1),
    v3 = c(3, 0, 0)
)

df_yummy <- data.frame(
    VarId = c("v1", "v2", "v3"),
    VarName = c("Apple", "Lemon", "Peach"),
    Yummyness = c(2, 1, 3)
)

df6$VarId <- names(df6[-1])[apply(X = df6[-1] > 0, MARGIN = 1, FUN = which)]
df_result <- merge(df6, df_yummy)[, c("id", "VarName", "Yummyness")]
names(df_result) <- c("id", "Fruit", "Yummyness")

> df_result[order(df_result$id), ]
  id Fruit Yummyness
3  1 Peach         3
1  2 Apple         2
2  3 Lemon         1

The crux of this solution is found in apply(X = df6[-1] > 0, MARGIN = 1, FUN = which). What this does is take your df6 and removes the first column (the [-1]) portion and then finds which records are greater than zero. We then use the apply function to apply the which function over the rows (the MARGIN = 1) portion. This means that for every row, we are returning the index which is non-zero. We pass this to names() to get the right name for each, and then the rest is just some joins.

Here's a slightly more compact version that uses the data.table package.

library(data.table)
# Dummy data
df6 <- data.table(
    id = c(1, 2, 3),
    v1 = c(0, 2, 0),
    v2 = c(0, 0, 1),
    v3 = c(3, 0, 0)
)

df_yummy <- data.table(
    VarId = c("v1", "v2", "v3"),
    VarName = c("Apple", "Lemon", "Peach"),
    Yummyness = c(2, 1, 3)
)

df6[, VarId := names(df6)[-1][which(.SD > 0)], by = id]
df_result <- merge(df6, df_yummy)[order(id), list(id, Fruit = VarName, Yummyness)]

This operates on much the same principle; we use the which function on the data.table specific .SD, by id, and then use this to pick the right name. This ends up slightly more compact because of the DT syntax but isn't fundamentally all that different.

Martin Gal · Answer 3 · 2021-08-30T15:50:06.497

Not sure if this is what you are looking for:

library(dplyr)

df6 %>% 
  mutate(Yummyness = rowSums(across(v1:v3)), .keep = "unused") %>% 
  left_join(df_yummy, by = "Yummyness") %>% 
  select(id, Fruit = VarName, Yummyness)

returns

# A tibble: 3 x 3
     id Fruit Yummyness
  <dbl> <chr>     <dbl>
1     1 Peach         3
2     2 Apple         2
3     3 Lemon         1

Data

df6 <- structure(list(id = c(1, 2, 3), v1 = c(0, 2, 0), v2 = c(0, 0, 
1), v3 = c(3, 0, 0)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_double", 
    "collector")), v1 = structure(list(), class = c("collector_double", 
    "collector")), v2 = structure(list(), class = c("collector_double", 
    "collector")), v3 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

df_yummy <- structure(list(VarId = c("v1", "v2", "v3"), VarName = c("Apple", 
"Lemon", "Peach"), Yummyness = c(2, 1, 3)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(VarId = structure(list(), class = c("collector_character", 
    "collector")), VarName = structure(list(), class = c("collector_character", 
    "collector")), Yummyness = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

score 0 · Answer 4 · answered Aug 30 '21 at 21:51

A data.table option

setorder(
  melt(setDT(df6), id.var = "id")[
    value != 0
  ][setDT(df_yummy),
    on = ("variable == VarId")
  ],
  by = "id"
)[, .(id, VarName, Yummyness)]

gives

   id VarName Yummyness
1:  1   Peach         3
2:  2   Apple         2
3:  3   Lemon         1

How pivot longer and match the variable name with a secondary table?

4 Answers4

Data