0

Complicating a previous question, lets say I have the following sock data.

>socks
year drawer  week  sock_total
1990  1       1        3                    
1990  1       2        4
1990  1       3        3 
1990  1       4        2 
1990  1       5        4
1990  2       1        1           
1990  2       2        1
1990  2       3        1
1990  2       4        1 
1990  2       5        2
1990  3       1        3
1990  3       2        4 
1990  3       3        4
1990  3       4        4
1990  3       5        4
1991  1       1        4
1991  1       2        3
1991  1       3        2
1991  1       4        2 
1991  1       5        3
1991  2       1        1           
1991  2       2        3
1991  2       3        4
1991  2       4        4
1991  2       5        3
1991  3       1        2           
1991  3       2        3
1991  3       3        3
1991  3       4        2
1991  3       5        3

How can I use summarise in dplyr to create a new variable growth which equals 1 if their was an increase in each week between the first year and the second year-- else 0. The data should look like this

>socks
 drawer  week growth 
  1       1        1        
  1       2        0   
  1       3        0   
  1       4        0   
  1       5        0   
  2       1        0        
  2       2        1   
  2       3        1   
  2       4        1   
  2       5        1   
  3       1        0   
  3       2        0   
  3       3        0   
  3       4        0   
  3       5        0

Also, how would you handle data where a drawer did not have a corresponding week in one of the years. aka add NA if a week was missing.

Alex
  • 2,603
  • 4
  • 40
  • 73
  • 2
    `socks %>% arrange(year) %>% group_by(drawer, week) %>% summarise(growth=ifelse(diff(sock_total) > 0, 1, 0))` – eipi10 Jul 30 '17 at 00:37
  • 2
    If a row is missing for a given `drawer`, `year` and `week`, you can add it back with `complete` from `tidyr` (`sock_total` will be filled with `NA` for these added rows, or you can set it to zero using the `fill` argument): `library(tidyr); socks %>% complete(year, nesting(drawer, week)) %>% arrange(year) %>% group_by(drawer, week) %>% summarise(growth=ifelse(diff(sock_total) > 0, 1, 0))` – eipi10 Jul 30 '17 at 00:48
  • @eipi10 On data with similar structure I get the error `Error in summarise_impl(.data, dots) : Column "growth" must be length 1 (a summary value), not 0` Do you know why this would be the case? I used your first solution. – Alex Jul 30 '17 at 01:35
  • @eipi10 I am guessing it is because of some missing values. The code you have in the second example is still running. If the problem is missing values your second line should fix the problem. – Alex Jul 30 '17 at 01:38

2 Answers2

1

The answer would be very similar to the previous, but group by drawer and week, comment by @eipi10 is also a great option; You can handle missing year for a specific drawer and week by using index after the subset, which turns a length zero object into NA:

For instance:

df %>% 
    group_by(drawer, week) %>% 
    summarise(growth = +(sock_total[year==1991][1] - sock_total[year==1990][1] > 0))
#                                              ^^^                         ^^^
# A tibble: 15 x 3
# Groups:   drawer [?]
#   drawer  week growth
#    <int> <int>  <int>
# 1      1     1      1
# 2      1     2      0
# 3      1     3      0
# 4      1     4      0
# 5      1     5      0
# 6      2     1      0
# 7      2     2      1
# 8      2     3      1
# 9      2     4      1
#10      2     5      1
#11      3     1      0
#12      3     2      0
#13      3     3      0
#14      3     4      0
#15      3     5     NA

The data has left out the year 1991 for drawer 3 and week 5:

structure(list(year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 
1991L, 1991L, 1991L, 1991L, 1991L), drawer = c(1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), week = c(1L, 2L, 3L, 4L, 
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), sock_total = c(3L, 4L, 3L, 
2L, 4L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 2L, 2L, 
3L, 1L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 2L)), .Names = c("year", 
"drawer", "week", "sock_total"), class = "data.frame", row.names = c(NA, 
-29L))
Psidom
  • 209,562
  • 33
  • 339
  • 356
1

Or you can try this without complete .

df%>%group_by(drawer,week)%>%
     summarise(growth =ifelse(n()<=1,0,ifelse((sock_total[1]-sock_total[2])>=0,0,1)))



# A tibble: 15 x 3
# Groups:   drawer [?]
   drawer  week growth
    <int> <int>  <dbl>
 1      1     1      1
 2      1     2      0
 3      1     3      0
 4      1     4      0
 5      1     5      0
 6      2     1      0
 7      2     2      1
 8      2     3      1
 9      2     4      1
10      2     5      1
11      3     1      0
12      3     2      0
13      3     3      0
14      3     4      0
15      3     5      0
BENY
  • 317,841
  • 20
  • 164
  • 234