How to get all rows sharing same url into 1 row?

Question

After unnesting, my data frame has multiple rows with NA values that could be summarized into one row per link. All of the data is text/character. Example:

link     feature-1  feature-2  feature-3
link_1   a.         NA         NA
link_1.  NA         NA         b
link_1.  NA.        c          NA
link2    NA.        a          NA
link_2   NA         NA         d
link_2   x          NA         NA

score 0 · Answer 1 · answered Apr 08 '21 at 21:09

Assuming you are only ever combining NA values and text, I recommend the following:

library(dplyr)

# here is a mock dataset
df = data.frame(grp = c('a','a','a','b','b','b'),
                value1 = c(NA,NA,'text','text',NA,NA),
                value2 = c(NA,'txt',NA,NA,'txt',NA),
                stringsAsFactors = FALSE)

df %>%
  # convert NA values to empty text strings
  mutate(value1 = ifelse(is.na(value1), "", value1),
         value2 = ifelse(is.na(value2), "", value2)) %>%
  # specify the groups
  group_by(grp) %>%
  # append all the text in each group into a single row
  summarise(val1 = paste(value1, collapse = ""),
            val2 = paste(value2, collapse = ""))

Based on this answer.

Looking at the data in your question, you might need to standardize some values first, because "link_1" vs "link_1." and "NA" vs "NA." will be treated as different values.
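
As a rough illustration of that standardization step, here is a minimal base R sketch. The helper name clean_text and the column names are made up for this example, and it assumes that the trailing periods are stray characters, that the literal string "NA" should become a real missing value, and that "link2" is a typo for "link_2". Adjust those assumptions to your actual data.

```r
# Mock data mirroring the question, including the stray periods and "link2"
raw <- data.frame(
  link      = c("link_1", "link_1.", "link_1.", "link2", "link_2", "link_2"),
  feature_1 = c("a.", "NA", "NA.", "NA.", "NA", "x"),
  feature_2 = c("NA", "NA", "c", "a", "NA", "NA"),
  feature_3 = c("NA", "b", "NA", "NA", "d", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim whitespace, drop trailing periods,
# and turn the literal string "NA" into a real missing value
clean_text <- function(x) {
  x <- sub("\\.+$", "", trimws(x))
  x[x %in% "NA"] <- NA_character_
  x
}

# Apply the helper to every column
cleaned <- as.data.frame(lapply(raw, clean_text), stringsAsFactors = FALSE)

# Assumption: "link2" was meant to be "link_2"
cleaned$link <- sub("^link([0-9]+)$", "link_\\1", cleaned$link)
```

After this step the link column contains only "link_1" and "link_2", so grouping by it collapses the rows as intended.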

score 0 · Answer 2 · answered May 18 '21 at 01:04

You can use across to get the first non-NA value by group in multiple columns.

library(dplyr)

df %>% group_by(link) %>% summarise(across(starts_with('feature'), ~na.omit(.)[1]))

#  link   feature.1 feature.2 feature.3
#  <chr>  <chr>     <chr>     <chr>    
#1 link_1 a         c         b        
#2 link_2 x         a         d

data

df <- structure(list(link = c("link_1", "link_1", "link_1", "link_2", 
"link_2", "link_2"), feature.1 = c("a", NA, NA, NA, NA, "x"), 
    feature.2 = c(NA, NA, "c", "a", NA, NA), feature.3 = c(NA, 
    "b", NA, NA, "d", NA)), class = "data.frame", row.names = c(NA, -6L))
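
Note that taking the first non-NA value silently drops any further values if a group has more than one in the same column. If that can happen in your data, one possible variation is to join the distinct non-NA values instead. This is only a sketch: df2, collapse_non_na and the "; " separator are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical data: link_1 has two different non-NA values in feature.1
df2 <- data.frame(
  link      = c("link_1", "link_1", "link_2"),
  feature.1 = c("a", "b", NA),
  feature.2 = c(NA, "c", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct non-NA values,
# or return NA when the group has no values at all
collapse_non_na <- function(x, sep = "; ") {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = sep)
}

result <- df2 %>%
  group_by(link) %>%
  summarise(across(starts_with("feature"), collapse_non_na))
```

Here feature.1 for link_1 becomes "a; b" rather than just "a", and a group with no values stays NA instead of turning into an empty string.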
