Merging the content of several columns into one

Question

I have a dataframe with five columns and 6 rows (actually they are many more, just trying to simplify matters):

One Two Three   Four    Five
Cat NA  NA  NA  NA
NA  Dog NA  NA  NA
NA  NA  NA  Mouse   NA
Cat NA  Rat NA  NA
Horse   NA  NA  NA  NA
NA NA NA NA NA

Now, I would like to coalesce all the information in a new single column ('Summary'), like this:

Summary
Cat
Dog
Mouse
Error
Horse
NA

Please note the 'Error' reported on the fourth Summary row, because two different values have been reported during the merging. I tried to look at the 'coalesce' function in the dplyr package, but it really desn't seem to do what I need. Thanks in advance.

Edited: I added a 6th row to indicate that in case of all 'NA' in row, I would like to get 'NA' and not 'Errors' in my 'Summary' column. Sorry if this was not clear in my first post.

Does this answer your question? [Combine column to remove NA's](https://stackoverflow.com/questions/14563531/combine-column-to-remove-nas) — Adamm, Jan 14 '20 at 13:40
What code have you tried exactly? You say "please note the error," but you didn't include what you did to return that as a value — camille, Jan 14 '20 at 14:13
@Adamm. Not really because doesn't handle the Error conditoin and also I have to name all the colnames — Arturo, Jan 14 '20 at 17:53

score 2 · Answer 1 · answered Jan 14 '20 at 13:44

2

Here is an idea via apply,

apply(df, 1, function(i){i1 <- i[!is.na(i)]; if(length(i1) > 1){'Error'}else{i1}})
#[1] "Cat"   "Dog"   "Mouse" "Error" "Horse"

answered Jan 14 '20 at 13:44

Sotos

51,121
6
32
66

hgh> library(RCurl) > a <- getURL('https://download1645.mediafire.com/4zvabh0uqf6g/96px8ophovxrxe9/example.tab') > df <- read.table(text=a,header=TRUE, sep = "\t") > View(Normal1) – Arturo Jan 14 '20 at 16:20

score 1 · Answer 2 · answered Jan 14 '20 at 13:45

I would use apply to solve this, since you need to handle specific cases. EG

df <- structure(list(One = structure(c(1L, NA, NA, 1L, 2L), .Label = c("Cat", 
"Horse", "NA"), class = "factor"), Two = structure(c(NA, 1L, 
NA, NA, NA), .Label = c("Dog", "NA"), class = "factor"), Three = structure(c(NA, 
NA, NA, 2L, NA), .Label = c("NA", "Rat"), class = "factor"), 
    Four = structure(c(NA, NA, 1L, NA, NA), .Label = c("Mouse", 
    "NA"), class = "factor"), Five = structure(c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = "NA", class = "factor")), row.names = c(NA, 
-5L), class = "data.frame")


apply(df, 1, function(row) if(sum(!is.na(row)) == 1) na.omit(row)[[1]] else "Error")
#> [1] "Cat"   "Dog"   "Mouse" "Error" "Horse"

^{Created on 2020-01-14 by the reprex package (v0.3.0)}

score 0 · Answer 3 · answered Jan 14 '20 at 13:58

0

Another way is to use the new pivot_ function in dplyr:

df <- tribble(~One, ~Two, ~Three,   ~Four,    ~Five,
              "Cat", NA,  NA,  NA,  NA,
              NA,  "Dog", NA,  NA,  NA,
              NA,  NA,  NA,  "Mouse",   NA,
              "Cat", NA,  "Rat", NA,  NA,
              "Horse",   NA,  NA,  NA,  NA)

df %>% 
  pivot_longer(names_to = "variable", values_to = "Summary", 
               values_drop_na = TRUE, cols = One:Five) %>% 
  distinct(Summary)
# # A tibble: 5 x 1
# Summary
# <chr>  
# 1 Cat    
# 2 Dog    
# 3 Mouse  
# 4 Rat    
# 5 Horse

answered Jan 14 '20 at 13:58

OTStats

1,820
1
13
22

3

That doesn't handle the error condition the OP mentioned – alan ocallaghan Jan 14 '20 at 14:02
1

@alanocallaghan wasn't clear in the OP, will have to edit. – OTStats Jan 14 '20 at 19:34

ThomasIsCoding · Answer 4 · 2020-01-14T14:58:59.130

0

Here is another base R solution, using sapply() + ifelse()

r <- sapply(as.list(as.data.frame(t(df))),
            function(x) ifelse(length(levels(x))==1, na.omit(as.vector(x)),"Error"))

such that

> r
     V1      V2      V3      V4      V5 
  "Cat"   "Dog" "Mouse" "Error" "Horse"

Or you can use sapply() + ifelse()

r <- apply(df, 1, function(x) ifelse(length(z <- unique(na.omit(x)))==1, z,"Error"))

such that

> r
[1] "Cat"   "Dog"   "Mouse" "Error" "Horse"

DATA

df <- structure(list(One = c("Cat", NA, NA, "Cat", "Horse"), Two = c(NA, 
"Dog", NA, NA, NA), Three = c(NA, NA, NA, "Rat", NA), Four = c(NA, 
NA, "Mouse", NA, NA), Five = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))

edited Jan 14 '20 at 14:58

answered Jan 14 '20 at 14:00

ThomasIsCoding

96,636
9
24
81

Thanks. Your example works great on the example I provided, but fails on my real test case (a 11x3170 dataframe, with col names): library(RCurl) a <- getURL('https://download1645.mediafire.com/4zvabh0uqf6g/96px8ophovxrxe9/example.tab') df <- read.table(text=a,header=TRUE, sep = "\t") r <- sapply(as.list(as.data.frame(t(df))),function(x) ifelse(length(levels(x))==1, na.omit(as.vector(x)),"Error")) The second element of 'r' should be 'NA' not 'Error'. The same the third one, etc. – Arturo Jan 14 '20 at 16:40
Sorry for the bad format of my previuos comment, but I wasn't able to edit the post. Hope you can go through it the same. – Arturo Jan 14 '20 at 16:41

Bruno · Answer 5 · 2020-01-14T14:12:19.757

0

You can also use coalesce

df %>%
  mutate_all(as.character) %>% 
  mutate(coal = coalesce(!!!syms(names(.))),
         sum_na = rowSums(!is.na(.)),
         result = if_else(sum_na == 1,coal,"Error")) %>% 
  select(result)

edited Jan 14 '20 at 14:12

answered Jan 14 '20 at 14:01

Bruno

4,109
1
9
27

2

That doesn't handle the error condition the OP mentioned – alan ocallaghan Jan 14 '20 at 14:02
1

@alanocallaghan Solved – Bruno Jan 14 '20 at 14:12

Merging the content of several columns into one

5 Answers5