3

I have a data table that looks like this:

old1   old2   old3   old4
aaa           ccc
aaa    bbb     
       bbb    ccc    ddd    

I want to remove the empty columns to have something like this:

new1  new2   new3   
aaa   ccc
aaa   bbb     
bbb   ccc    ddd   

I've tried the following which does not work for me:

df[, colSums(df!= "") != ""]
df[!sapply(df, function (x) all(is.na(x) | x == ""))]
Filter(function(x) !(all(x==""|x==0)), df) 
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
CHONG
  • 373
  • 1
  • 5
  • 13
  • @thelatemail Almost similar but I am not sure if it is exactly a dupe since here number of columns in final output is not the same as initial input. Also were you able to adapt any of the solution from that link for this example? – Ronak Shah Apr 16 '19 at 02:59
  • 1
    @RonakShah - please feel free to reopen if you disagree. The input and output sets are practically identical for both questions in my opinion. I'm not sure if dropping a column makes it enough different, but I can see your point. – thelatemail Apr 16 '19 at 03:24

1 Answers1

1

One option using base R apply is to first calculate number of columns which are going to be present in the final dataframe (cols). Filter empty values from each row and insert empty values using rep.

cols <- max(rowSums(df != ""))

as.data.frame(t(apply(df, 1, function(x) {
   vals <- x[x != ""]
   c(vals, rep("", cols - length(vals)))
})))

#   V1  V2  V3
#1 aaa ccc    
#2 aaa bbb    
#3 bbb ccc ddd

Another option with gather/spread would be to add a new column for row number convert it to long format using gather, filter the non-empty values, group_by every row and give new column names using paste0 and finally convert it to wide format using spread.

library(dplyr)
library(tidyr)

df %>%
  mutate(row = row_number()) %>%
  gather(key, value, -row) %>%
  filter(value != "") %>%
  group_by(row) %>%
  mutate(key = paste0("new", row_number())) %>%
  spread(key, value, fill = "") %>%
  ungroup() %>%
  select(-row)

#  new1  new2  new3 
#  <chr> <chr> <chr>
#1 aaa   ccc   ""   
#2 aaa   bbb   ""   
#3 bbb   ccc   ddd  
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks @Ronak Shah. I have tried the solution you shared with my actual large data and is working. Just that I have this error message: "Warning message: attributes are not identical across measure variables; they will be dropped" I think it might due to the dummy index for the new column (row number)? – CHONG Apr 16 '19 at 03:39
  • @CHONG it's a warning message (not an error), I am not sure why though. Maybe you have `NA`s in your data or have certain columns as as factor? – Ronak Shah Apr 16 '19 at 04:04
  • Yes @Ronak Shah. Using str(df) - I can see that all my columns are factors. – CHONG Apr 16 '19 at 04:10
  • So you can convert it to characters and then continue with the remaining chain `df %>% mutate_all(as.character) %>% mutate(row = row_number()) %>% gather(key, value, -row).....` – Ronak Shah Apr 16 '19 at 04:16