2

very new to R coding and I've been trying to format/merge rows in a df using group_by from dplyr. However, I'm not quite there yet.

This is a simplification of my data table. The first three entries share the same id, and the last two entries share the same id.

ID                   Assay1  Assay2  Assay3  Assay4  Assay5
13,12 Months,<=-65C  12      NA      NA      NA      NA
13,12 Months,<=-65C  NA      11      NA      NA      NA
13,12 Months,<=-65C  NA      NA      33      NA      NA
09,06 Months,<=-65C  112     NA      NA      NA      NA
09,06 Months,<=-65C  NA      115     NA      NA      NA

I want to achieve the following:

ID                   Assay1  Assay2  Assay3  Assay4  Assay5
13,12 Months,<=-65C  12      11      33      NA      NA
09,06 Months,<=-65C  112     115     NA      NA      NA

So that the values in AssayValueX are merged into one row per unique id. I've used group_by with summarise but I dont want a summary... I want a new df with the merged rows! If other functions are more appropriate to achieve this, please let me know.

Jaap
  • 81,064
  • 34
  • 182
  • 193
pdench
  • 41
  • 4

3 Answers3

3

You can using dplyr and zoo

df %>%
    group_by(ID) %>%
    mutate_each(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))%>%filter(row_number()==n())


ID                   Assay1  Assay2  Assay3  Assay4  Assay5
13,12 Months,<=-65C  12      11      33      NA      NA
09,06 Months,<=-65C  112     115     NA      NA      NA

EDIT

For you addtional question

Data:

ID Assay1 Assay2 Assay3 Assay4 Assay5
1 13,12 Months,<=-65C     12     13     NA     NA     NA
2 13,12 Months,<=-65C     11     11    999     NA     NA
3 13,12 Months,<=-65C     NA     NA     33     NA     NA
4 09,06 Months,<=-65C    112     NA     NA     NA     NA
5 09,06 Months,<=-65C     NA    115     NA     NA     NA

Solution

df=df %>%
    group_by(ID) %>%
    summarise_all(funs(toString(na.omit(.))))
df[df=='']=NA
    > df
                   ID Assay1 Assay2  Assay3 Assay4 Assay5
                <chr>  <chr>  <chr>   <chr>  <chr>  <chr>
1 09,06 Months,<=-65C    112    115    <NA>   <NA>   <NA>
2 13,12 Months,<=-65C 12, 11 13, 11 999, 33   <NA>   <NA>
BENY
  • 317,841
  • 20
  • 164
  • 234
  • I've now come across cases where two or three rows share the same ID, and all have a value for Assay1 (or Assay2, Assay3). This is because the experiment is performed with several replicates for the same condition. Can you suggest how I could aggregate/paste/paste0 the two or three values in one cell? So, the result for one row would be something: ID = 13,12 Months, <=-65C; Assay1 = 12, 13, 19; Assay2 = 11, 115, 23 etc... – pdench Jul 19 '17 at 09:19
  • @pdench see my edit, it will give back what you want – BENY Jul 19 '17 at 22:31
  • @pdench if it is what you want ,hope you can accept my answer ~ – BENY Jul 20 '17 at 02:30
2

Here's a tidyr solution. With your example data, there is no Assay4 and Assay5 data because they are NAs, but it should work with your real life data. Basically, the rationale is to gather the data, remove the NAs and spread it again.

library(tidyr)
df%>%
  gather(Assay, value,-ID)%>% 
  filter(!is.na(value))%>% 
  spread(Assay,value)

                   ID Assay1 Assay2 Assay3
1 09,06 Months,<=-65C    112    115     NA
2 13,12 Months,<=-65C     12     11     33

data

df <- read.table(text="ID                   Assay1  Assay2  Assay3  Assay4  Assay5
'13,12 Months,<=-65C'  12      NA      NA      NA      NA
'13,12 Months,<=-65C'  NA      11      NA      NA      NA
'13,12 Months,<=-65C'  NA      NA      33      NA      NA
'09,06 Months,<=-65C'  112     NA      NA      NA      NA
'09,06 Months,<=-65C'  NA      115     NA      NA      NA",
 header=TRUE,stringsAsFactors=FALSE)
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56
1

Another possible solution via dplyr,

library(dplyr)

df %>% 
  group_by(ID) %>%  
  summarise_all(funs(sum(., na.rm = TRUE))) %>% 
  mutate_all(funs(replace(., . == 0, NA)))

# A tibble: 2 x 6
#                   ID Assay1 Assay2 Assay3 Assay4 Assay5
#               <fctr>  <int>  <int>  <int>  <int>  <int>
#1 09,06_Months,<=-65C    112    115     NA     NA     NA
#2 13,12_Months,<=-65C     12     11     33     NA     NA
Sotos
  • 51,121
  • 6
  • 32
  • 66