-1

I have a dataframe with five columns and 6 rows (actually they are many more, just trying to simplify matters):

One Two Three   Four    Five
Cat NA  NA  NA  NA
NA  Dog NA  NA  NA
NA  NA  NA  Mouse   NA
Cat NA  Rat NA  NA
Horse   NA  NA  NA  NA
NA NA NA NA NA

Now, I would like to coalesce all the information in a new single column ('Summary'), like this:

Summary
Cat
Dog
Mouse
Error
Horse
NA

Please note the 'Error' reported on the fourth Summary row, because two different values have been reported during the merging. I tried to look at the 'coalesce' function in the dplyr package, but it really desn't seem to do what I need. Thanks in advance.

Edited: I added a 6th row to indicate that in case of all 'NA' in row, I would like to get 'NA' and not 'Errors' in my 'Summary' column. Sorry if this was not clear in my first post.

Arturo
  • 342
  • 1
  • 4
  • 14
  • Does this answer your question? [Combine column to remove NA's](https://stackoverflow.com/questions/14563531/combine-column-to-remove-nas) – Adamm Jan 14 '20 at 13:40
  • What code have you tried exactly? You say "please note the error," but you didn't include what you did to return that as a value – camille Jan 14 '20 at 14:13
  • @Adamm. Not really because doesn't handle the Error conditoin and also I have to name all the colnames – Arturo Jan 14 '20 at 17:53

5 Answers5

2

Here is an idea via apply,

apply(df, 1, function(i){i1 <- i[!is.na(i)]; if(length(i1) > 1){'Error'}else{i1}})
#[1] "Cat"   "Dog"   "Mouse" "Error" "Horse"
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • hgh> library(RCurl) > a <- getURL('https://download1645.mediafire.com/4zvabh0uqf6g/96px8ophovxrxe9/example.tab') > df <- read.table(text=a,header=TRUE, sep = "\t") > View(Normal1) – Arturo Jan 14 '20 at 16:20
1

I would use apply to solve this, since you need to handle specific cases. EG

df <- structure(list(One = structure(c(1L, NA, NA, 1L, 2L), .Label = c("Cat", 
"Horse", "NA"), class = "factor"), Two = structure(c(NA, 1L, 
NA, NA, NA), .Label = c("Dog", "NA"), class = "factor"), Three = structure(c(NA, 
NA, NA, 2L, NA), .Label = c("NA", "Rat"), class = "factor"), 
    Four = structure(c(NA, NA, 1L, NA, NA), .Label = c("Mouse", 
    "NA"), class = "factor"), Five = structure(c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = "NA", class = "factor")), row.names = c(NA, 
-5L), class = "data.frame")


apply(df, 1, function(row) if(sum(!is.na(row)) == 1) na.omit(row)[[1]] else "Error")
#> [1] "Cat"   "Dog"   "Mouse" "Error" "Horse"

Created on 2020-01-14 by the reprex package (v0.3.0)

alan ocallaghan
  • 3,116
  • 17
  • 37
0

Another way is to use the new pivot_ function in dplyr:

df <- tribble(~One, ~Two, ~Three,   ~Four,    ~Five,
              "Cat", NA,  NA,  NA,  NA,
              NA,  "Dog", NA,  NA,  NA,
              NA,  NA,  NA,  "Mouse",   NA,
              "Cat", NA,  "Rat", NA,  NA,
              "Horse",   NA,  NA,  NA,  NA)

df %>% 
  pivot_longer(names_to = "variable", values_to = "Summary", 
               values_drop_na = TRUE, cols = One:Five) %>% 
  distinct(Summary)
# # A tibble: 5 x 1
# Summary
# <chr>  
# 1 Cat    
# 2 Dog    
# 3 Mouse  
# 4 Rat    
# 5 Horse  
OTStats
  • 1,820
  • 1
  • 13
  • 22
0
  • Here is another base R solution, using sapply() + ifelse()
r <- sapply(as.list(as.data.frame(t(df))),
            function(x) ifelse(length(levels(x))==1, na.omit(as.vector(x)),"Error"))

such that

> r
     V1      V2      V3      V4      V5 
  "Cat"   "Dog" "Mouse" "Error" "Horse"
  • Or you can use sapply() + ifelse()
r <- apply(df, 1, function(x) ifelse(length(z <- unique(na.omit(x)))==1, z,"Error"))

such that

> r
[1] "Cat"   "Dog"   "Mouse" "Error" "Horse"

DATA

df <- structure(list(One = c("Cat", NA, NA, "Cat", "Horse"), Two = c(NA, 
"Dog", NA, NA, NA), Three = c(NA, NA, NA, "Rat", NA), Four = c(NA, 
NA, "Mouse", NA, NA), Five = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • Thanks. Your example works great on the example I provided, but fails on my real test case (a 11x3170 dataframe, with col names): library(RCurl) a <- getURL('https://download1645.mediafire.com/4zvabh0uqf6g/96px8ophovxrxe9/example.tab') df <- read.table(text=a,header=TRUE, sep = "\t") r <- sapply(as.list(as.data.frame(t(df))),function(x) ifelse(length(levels(x))==1, na.omit(as.vector(x)),"Error")) The second element of 'r' should be 'NA' not 'Error'. The same the third one, etc. – Arturo Jan 14 '20 at 16:40
  • Sorry for the bad format of my previuos comment, but I wasn't able to edit the post. Hope you can go through it the same. – Arturo Jan 14 '20 at 16:41
0

You can also use coalesce

df %>%
  mutate_all(as.character) %>% 
  mutate(coal = coalesce(!!!syms(names(.))),
         sum_na = rowSums(!is.na(.)),
         result = if_else(sum_na == 1,coal,"Error")) %>% 
  select(result)
Bruno
  • 4,109
  • 1
  • 9
  • 27