-1

EDIT: I am aware there is a similar question that has been answered, but it does not work for me on the dataset I have provided below. The above dataframe is the result of me using the spread function. I am still not sure how to consolidate it.

EDIT2: I realized that the group_by function, which I had previously used on the data, is what was preventing the spread function from working in the way I wanted it to work originally. After using ungroup, I was able to go straight from the original dataset (not pictured below) to the 2nd dataframe pictured below.


I have a dataframe that looks like the following. I am trying to make it so that there is only 1 row for each id number.

id  init_cont  family  1  2  3
1   I          C       1  NA NA
1   I          C       NA 4  NA
1   I          C       NA NA 3
2   I          D       2  NA NA
2   I          D       NA 1  NA
2   I          D       NA NA 4
3   K          C       3  NA NA
3   K          C       NA 4  NA
3   K          C       NA NA 1

I would like the resulting dataframe to look like this.

id  init_cont  family  1  2  3
1   I          C       1  4  3
2   I          D       2  1  4
3   K          C       3  4  1
melbez
  • 960
  • 1
  • 13
  • 36
  • 2
    Very similar to https://stackoverflow.com/questions/42567075/combining-pivoted-rows-in-r-by-common-value/42567254 or https://stackoverflow.com/questions/42661103/collapsing-duplicate-rows-in-r-by-two-variables – thelatemail Apr 27 '18 at 05:02
  • 1
    What have you tried yourself? Where's the code to make reproducible data? – rg255 Apr 27 '18 at 05:06

1 Answers1

1

We cangroup_by the 'd', 'init_cont', 'family' and then do a summarise_all to remove all the NA elements in the columns 1:3

library(dplyr)
df1 %>%
   group_by_at(names(.)[1:3]) %>%
   summarise_all(na.omit)
   #Or
   #summarise_all(funs(.[!is.na(.)]))
# A tibble: 3 x 6
# Groups: d, init_cont [?]
#      d init_cont family   `1`   `2`   `3`
#   <int> <chr>     <chr>  <int> <int> <int>
#1     1 I         C          1     4     3
#2     2 I         D          2     1     4
#3     3 K         C          3     4     1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • If I have several more columns than those pictured here (similar to id, init_cont, and family), how can I include all of them in the group_by function without typing them all out? – melbez Apr 27 '18 at 05:08
  • @melbez You can use `group_by_at` and pass the `names` – akrun Apr 27 '18 at 05:09
  • The last line with summarise_all yields the following error: Error in summarise_impl(.data, dots) : Column `1` must be length 1 (a summary value), not 0 . This is referring to the column with title name '1', not to the 1st column in the table. – melbez Apr 27 '18 at 05:31
  • @melbez This is based on your example, where there is only one non-NA per each group. If that is not the case, you should put that in a list or so – akrun Apr 27 '18 at 07:24
  • There is only one non-NA per each group in my actual data, as I have demonstrated here. That is why I am not sure why this is not working. – melbez Apr 27 '18 at 15:44
  • 1
    It wasn't working because I had used group_by on a previous dataframe. I didn't realize the group_by function persisted. I had to ungroup the dataframe in order to get this to work. – melbez Apr 27 '18 at 16:08