I want to bring data to long format and integrate information from multiple columns.
Example data: Let's say we observe 4 products (id 1:4) in an online shop and the review comments (comment*) of different customers. One product (id = 1) has only one review comment, while another product (id = 4) has 4 comments. For each comment, we also observe if this comment quotes the comment of another user (1 if yes, and 0 otherwise).
data = data.frame(id = c(1,2,3,4), n_comments = c(2,1,3,4),
comment1 = c("consetetur sadipscing", "Lorem ipsum", "dolor sit ame", "nonumy eirmod "), comment1_quote = c(1,0,0,1),
comment2 = c("clita kasd gubergren", NA, "sanctus est", "consetetur sadipscing"), comment2_quote = c(0,NA,0,0),
comment3 = c(NA, NA, "invidunt ut labore", "ea rebum"), comment3_quote = c(NA,NA,1,0),
comment4 = c(NA, NA, NA, "dolores et ea rebum"), comment4_quote = c(NA,NA,NA,1))
data
id n_comments comment1 comment1_quote comment2 comment2_quote comment3 comment3_quote comment4 comment4_quote
1 1 2 consetetur sadipscing 1 clita kasd gubergren 0 <NA> NA <NA> NA
2 2 1 Lorem ipsum 0 <NA> NA <NA> NA <NA> NA
3 3 3 dolor sit ame 0 sanctus est 0 invidunt ut labore 1 <NA> NA
4 4 4 nonumy eirmod 1 consetetur sadipscing 0 ea rebum 0 dolores et ea rebum 1
Now we want to bring this data to long format by
- having one row for each comment on each product
- adding the information if the comment referred to a quote
- keeping the aggregate number of comments that does not vary for one product
Here is the target data:
target_data = data.frame(id = c(1,1,2,3,3,3,4,4,4,4), n_comments = c(2,2,1,3,3,3,4,4,4,4),
comment = c("consetetur sadipscing", "Lorem ipsum", "dolor sit ame", "nonumy eirmod ","clita kasd gubergren", "sanctus est", "consetetur sadipscing",
"invidunt ut labore", "ea rebum", "dolores et ea rebum"),
quote = c(1,0,0,1,0,0,0,1,0,1))
target_data
id n_comments comment quote
1 1 2 consetetur sadipscing 1
2 1 2 Lorem ipsum 0
3 2 1 dolor sit ame 0
4 3 3 nonumy eirmod 1
5 3 3 clita kasd gubergren 0
6 3 3 sanctus est 0
7 4 4 consetetur sadipscing 0
8 4 4 invidunt ut labore 1
9 4 4 ea rebum 0
10 4 4 dolores et ea rebum 1
Here is what I've tried, but does not work:
trial_da = data %>% tidyr::pivot_longer(cols = starts_with('comment'), values_to = "comment", values_drop_na = TRUE)
Fehler: Can't combine `comment1` <character> and `comment1_quote` <double>.
Run `rlang::last_error()` to see where the error occurred.
trial_da
Fehler: Objekt 'trial_da' not found
This is happening because the "quote" column also starts with "comment". However, I'm not sure how to solve this.