7

My data,

Id|date1|date2   
1|2008-10-01|NA        
1|NA|2008-10-02     
1|NA|2008-10-03     
2|2008-10-02|NA
2|NA|2008-10-03

I want output this way,

Id|date1|date2|date3    
1|2008-10-01|2008-10-02|2008-10-03        
2|2008-10-02|2008-10-03 

I tried using aggregate and dcast but they are making date into numeric format and na's are still not avoided.

pnuts
  • 58,317
  • 11
  • 87
  • 139
Knight
  • 363
  • 2
  • 7
  • 24
  • So what happens if you have many dates in both ids? How do you decide which date goes where? Also, do you really want to create a limitless number of columns? – David Arenburg Jun 29 '15 at 20:25

2 Answers2

5

You could do this quite easily using data.table though it will get more complicated if the number of non-missing values isn't equal between the columns

library(data.table)
setDT(df)[, lapply(.SD, na.omit), by = Id]
#   Id      date1       date2
# 1:  1 2008-10-02 2008-10-02 
# 2:  2 2008-10-02 2008-10-02 
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • thanks, but it is creating duplicate rows, I mean i want one id with one row, if the same id has 3 different values then it should add new column for the new value, that's the reason I was trying dcast, aggregate etc., – Knight Jun 29 '15 at 19:39
  • Can you show an example data set and desired output? – David Arenburg Jun 29 '15 at 19:49
  • for example id-1 is present in 4 rows, with different dates, then the output must be like id | date-1 |date-2 | date-3 | date-4 (one row for one id, but expanding columns) – Knight Jun 29 '15 at 20:06
  • Then please update your original question with desired output – David Arenburg Jun 29 '15 at 20:12
  • 1
    @Maddy It may be better to use `toString` and concatenate the dates together as the lengths may be different in each `Id` and would still result in `NAs` in some columns by having separate columns – akrun Jun 29 '15 at 20:17
  • @akrun when I concatenated, dates are being changed into numeric format. But still if there are NA's in the table, I prefer to have seperate columns – Knight Jun 29 '15 at 20:25
  • 2
    `setDT(df)[, lapply(.SD, function(x) toString(na.omit(x))), by = Id]` work fine on your data, though I'm truly pazzeled how you want to handle many different dates in many ids and what are you going to do with all these columns – David Arenburg Jun 29 '15 at 20:27
4

Here's a similar idea using tidyr:

library(dplyr)
library(tidyr)

df %>%
  gather(key, value, -Id) %>% 
  na.omit() %>% 
  spread(key, value)

Which gives:

#  Id      date1      date2
#1  1 2008-10-02 2008-10-02
#2  2 2008-10-02 2008-10-02
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77