-1

I have a data set where I had multiple rows all corresponding to baseline measures. I want to collapse these rows by record id so that each individual only has one row for baseline measures. This means I have to collapse across all variables, some of which are character variables. How do I do this? This is what I've tried:

df.test %>% group_by(id) %>% filter(time == 0) %>% 
    summarise_all(., collapse=", ")

an example data frame I'm working with looks like this:

data.frame(id = rep(99, 5), time = c(rep(0, 3), 3, 6), v1 = c("blk", NA, NA, 2, 3), v2 = c(NA, 1, NA, 4, 5), v3 = c(NA, NA, 1, 6, 7))

and I need it to look like this:

data.frame(id = rep(99, 3), time = c(rep(0, 1), 3, 6), v1 = c("blk", 2, 3), v2 = c(1, 4, 5), v3 = c(1, 6, 7))

I don't know if summarise is the right function to use here. Basically a problem I have is the summing of the characters which I think is why summarise doesn't really work. Ideally, all I really want to do is combine the information in all the rows where time = 0 by id so that i have a single row of time = 0 for every unique id.

(sorry I'm, not sure how to make the data.frame command print the data frame?)

Help please!

Edit: Example 2

data.frame(id = c(rep(99,5), 100, 101, 101), time = c(rep(0, 3), 3, 6, 0, 0, 0), character = c(NA, NA, "blk", rep(NA, 5)), binary = c(1, rep(NA, 5), 0, NA), continuous = c(NA, NA, 2.29, rep(NA, 5)))

This is close to what my data looks like. What I can say isn't working are the following:

1)the character variable is lost 2)the 0 is lost in the binary variable (even with != is.na(.) instead of != 0 2a) this was atomic and i changed it to factor and now it seems to work 3) the continous variable, 2.29, is lost 4) what's interesting and not pictured here is whole number values that are not 0 seem to be retained but integers are all gone--is this because integers cant be read if the structure is atomic?

Conclusion: i think i need to change all the values from atomic (they were brought into r that way from SPSS)? I will try this in the meantime.

edit2: the issue was not numeric vs atomic. i turned everything numeric and the integers still do not show up.

Michael
  • 111
  • 9
  • It would help if you provided a data sample and desired output. It's not clear what package you are taking `collapse` from and which different column types you have here – Calum You Apr 24 '19 at 22:30
  • @CalumYou good idea. i edited my question with example data. As far as the collapse function I was using goes, I believe it was collapse in dplyr. – Michael Apr 24 '19 at 22:57

2 Answers2

1

dplyr::collapse does not do want you want it to do, read ?dplyr::collapse. What you need is to pass summarise_all a function that picks out the values that you want to keep from each group. I am here inferring that you want to get rid of all 0 values. Though it doesn't matter here, I also use first to take the first value in each group in each column if there is more than one non-zero value to prevent errors. You should check your data to make sure it has this layout.

library(dplyr)
tbl <- data.frame(id = rep(99, 5), time = c(rep(0, 3), 3, 6), v1 = c("blk", 0, 0, 2, 3), v2 = c(0, 1, 0, 4, 5), v3 = c(0, 0, 1, 6, 7))
tbl %>%
  group_by(id, time) %>%
  summarise_all(~ .[. != 0] %>% first())
#> # A tibble: 3 x 5
#> # Groups:   id [1]
#>      id  time v1       v2    v3
#>   <dbl> <dbl> <fct> <dbl> <dbl>
#> 1    99     0 blk       1     1
#> 2    99     3 2         4     6
#> 3    99     6 3         5     7

Created on 2019-04-24 by the reprex package (v0.2.1)

Calum You
  • 14,687
  • 4
  • 23
  • 42
  • this is closer to what I need but it is failing to retain certain values, and I'm not sure why. some of those values are 0 and i tried replacing 0 with is.NA(.) but the 0s still were not maintained. – Michael Apr 24 '19 at 23:55
  • Try to identify rows that don't work and include them in your example – Calum You Apr 24 '19 at 23:58
  • Yeah, it looks like it's not collapsing rows so much as taking the first row and eliminating all others. – Michael Apr 25 '19 at 01:39
  • 1
    You still haven't said what your logic is for collapsing rows though. I decided to drop zeroes and take the first of what's left, because it reproduces your output. But if you have some logic that you can state for what to keep from each group of rows for each column, you need to write it else we don't know what you want. – Calum You Apr 25 '19 at 07:01
1

How about this?

temp <- df.test %>% group_by(id) %>% filter(time==0) %>% summarise_all(funs(paste0(.,collapse = "_")))

Hope this is what you are looking for.

  • This doesn't quite do the trick but it helps me get a lot closer. This is *essenitally* what I was looking for. This gives me a bunch of NAs strung together and to actual data, which I understand it naturally should. But now that everything is a character, I'm sure there is a systematic way of just removing all instances of NA from each cell/case? what command would be piped to do that for all the data? Edit: also thank you. this really gives me a good understanding of what paste does. – Michael Apr 25 '19 at 20:54