I would like to calculate and plot changing numbers of differently colored animals over time using dplyr and ggplot2.

I have observations of different animals on random dates, so I would first like to group those observations into 4-day brackets and then calculate the mean color for each 4-day bracket. I created the column Bracket.mean with placeholder values for the first few brackets just to show what I have in mind. If possible, I would like to add those means to the same data frame (as opposed to creating a new data.frame or vectors) for later analysis and plotting.

For the plot, I’m hoping to show the bracket means over time with some measure of spread around them (SD or boxplots), as well as the daily observations (perhaps as a faded overlay in the background).

Below is part of the dataset I'm using (with a made-up 'Bracket.mean' column that I’m hoping to calculate). 'Count' is the number of animals of a specific 'Color' on a given 'Date'.

    Date    Julian  Count   Color   Bracket.mean
4/19/16 110 1   50  mean of 4/19-4/22
4/19/16 110 1   50  mean of 4/19-4/22
4/19/16 110 1   100 mean of 4/19-4/22
4/20/16 111 4   50  mean of 4/19-4/22
4/20/16 111 1   0   mean of 4/19-4/22
4/20/16 111 2   100 mean of 4/19-4/22
4/20/16 111 1   50  mean of 4/19-4/22
4/20/16 111 2   100 mean of 4/19-4/22
4/21/16 112 1   100 mean of 4/19-4/22
4/21/16 112 2   50  mean of 4/19-4/22
4/21/16 112 4   50  mean of 4/19-4/22
4/21/16 112 1   100 mean of 4/19-4/22
4/21/16 112 2   50  mean of 4/19-4/22
4/21/16 112 1   0   mean of 4/19-4/22
4/22/16 113 2   0   mean of 4/19-4/22
4/22/16 113 4   50  mean of 4/23-4/26
4/23/16 114 6   0   mean of 4/23-4/26
4/23/16 114 1   50  mean of 4/23-4/26
4/24/16 115 2   0   mean of 4/23-4/26
4/26/16 117 5   0   mean of 4/23-4/26
4/30/16 121 1   50  
5/2/16  123 1   NA  
5/2/16  123 1   50  
5/7/16  128 2   0   
5/7/16  128 3   0   
5/7/16  128 3   0   
5/8/16  129 4   0   
5/8/16  129 1   0   
5/10/16 131 1   50  
5/10/16 131 4   50  
5/12/16 133 1   0   
5/13/16 134 1   50  
5/14/16 135 1   0   
5/14/16 135 2   50  
5/14/16 135 2   0   
5/14/16 135 1   0   
5/17/16 138 1   0   
5/17/16 138 2   0   
5/23/16 144 1   0   
5/24/16 145 4   0   
5/24/16 145 1   0   
5/24/16 145 1   0   
5/27/16 148 3   NA  
5/27/16 148 1   0   
5/27/16 148 1   50  

Any help would be greatly appreciated. Thanks very much in advance!

Kestrel1

1 Answer

Something like this should get you started.

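library(dplyr)
# Parse the dates, then label each row with the start of its 4-day bracket
df <- df %>% mutate(Date = as.Date(Date, format = '%m/%d/%y'),
                    Start = as.Date(cut(Date, breaks = seq(min(Date), max(Date) + 4, by = 4)))) %>%
    mutate(End = Start + 3) %>%
    group_by(Start, End) %>%
    # One row per bracket: mean and SD of Color, ignoring missing values
    summarise(meanColor = mean(Color, na.rm = TRUE),
              sdColor = sd(Color, na.rm = TRUE))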
df
#Source: local data frame [10 x 4]
#Groups: Start [?]
#        Start        End meanColor  sdColor
#        <date>     <date>     <dbl>    <dbl>
#1  2016-04-19 2016-04-22  56.25000 35.93976
#2  2016-04-23 2016-04-26  12.50000 25.00000
#3  2016-04-27 2016-04-30  50.00000       NA
#4  2016-05-01 2016-05-04  50.00000       NA
#5  2016-05-05 2016-05-08   0.00000  0.00000
#6  2016-05-09 2016-05-12  33.33333 28.86751
#7  2016-05-13 2016-05-16  20.00000 27.38613
#8  2016-05-17 2016-05-20   0.00000  0.00000
#9  2016-05-21 2016-05-24   0.00000  0.00000
#10 2016-05-25 2016-05-28  25.00000 35.35534
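
To see how the bracket labels come about, here is a minimal base R sketch of the `cut()` step on its own. The five dates are a hypothetical subset picked from the question's data, and the variable names are only for illustration. The breaks are extended by 4 days past the last date because `cut()` on dates uses intervals that are closed on the left and open on the right, so a final date sitting exactly on the last break would otherwise get `NA`.

```r
# Hypothetical subset of the dates in the question
dates <- as.Date(c("4/19/16", "4/22/16", "4/23/16", "4/26/16", "4/30/16"),
                 format = "%m/%d/%y")

# Bracket boundaries every 4 days, extended past the last date so it is covered
breaks <- seq(min(dates), max(dates) + 4, by = 4)

# cut() labels each date with the start of the bracket it falls into
start <- as.Date(cut(dates, breaks = breaks))

data.frame(Date = dates, Start = start, End = start + 3)
```

Each `Start` is the first day of a bracket and `End = Start + 3` is its last day, which matches the `Start` and `End` columns in the output above.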

Then plot using:

library(ggplot2)
ggplot(df) + geom_line(aes(Start,meanColor))
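
If you also want the spread and the raw observations in the same figure, something along these lines might work. This is only a sketch on hypothetical toy data: `obs` and `brackets` are names made up for the example, and using error bars of plus or minus one SD is just one possible choice. It uses `mutate()` instead of `summarise()` so that the individual observations are kept for the faded background layer.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data in the shape of the question's data set
obs <- data.frame(
  Date  = as.Date(c("2016-04-19", "2016-04-20", "2016-04-21",
                    "2016-04-23", "2016-04-24", "2016-04-26")),
  Color = c(50, 100, 0, 0, 50, 0)
)

# Keep every observation and attach its bracket start and bracket statistics
obs <- obs %>%
  mutate(Start = as.Date(cut(Date, breaks = seq(min(Date), max(Date) + 4, by = 4)))) %>%
  group_by(Start) %>%
  mutate(meanColor = mean(Color, na.rm = TRUE),
         sdColor   = sd(Color, na.rm = TRUE)) %>%
  ungroup()

# One row per bracket for the summary layers
brackets <- distinct(obs, Start, meanColor, sdColor)

p <- ggplot() +
  # Faded raw observations in the background
  geom_point(data = obs, aes(Date, Color), alpha = 0.3) +
  # Bracket mean +/- 1 SD, drawn at the bracket start date
  geom_errorbar(data = brackets,
                aes(x = Start, ymin = meanColor - sdColor, ymax = meanColor + sdColor),
                width = 1) +
  geom_line(data = brackets, aes(Start, meanColor)) +
  geom_point(data = brackets, aes(Start, meanColor), size = 3)
p
```

The means are drawn at the bracket start date here. Using `Start + 1.5` as the x position would centre them on the bracket instead.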
Karthik Arumugham
  • Great, thanks very much! Two more things please: 1. Can the 4-day mean calculations also account for the fact that on some occasions several animals of a particular 'Color' stage were recorded, as given in the 'Count' column (e.g., on 'Date' 4/20 there were 5 (=4+1) animals of 'Color' type 50)? 2. Can the new 'Start' and 'meanColor' variables be added to the same data frame as the original 'Color' observations, so I can plot both the 'meanColor' and the individual 'Color' observations (and if possible show all observations if 'Count' > 1) in the same plot? Thanks! – Kestrel1 Feb 19 '17 at 23:21
  • Add `Color` to the `group_by()` to get the additional dimension. If you want the data to be added to the original dataframe, then replace `summarise()` with `mutate()` – Karthik Arumugham Feb 20 '17 at 02:00
  • Great, the mutate works! But as for my additional question #1: rather than each row representing one individual animal observation, I have a count of animals that were observed in that color. Sometimes it was just one individual (Count=1), but sometimes it was more (e.g. on 4/22 there were 4 different individuals that had the color 50: Count=4). So I'm trying to include the 'Count' variable in the calculation of the mean (the command group_by(Start, End) is fine). Do I need to copy and paste three more identical rows of the 4/22 observation etc. in order to fix it, or is there another way? Thanks – Kestrel1 Feb 20 '17 at 02:54
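
One possible way to handle the last question without duplicating rows is a weighted mean, with 'Count' as the weight. This is a sketch on hypothetical toy rows, not something confirmed in the thread: base R's `weighted.mean()` takes the weights through its `w` argument, and `mutate()` keeps all original rows so the bracket mean sits next to each observation.

```r
library(dplyr)

# Hypothetical toy rows in the shape of the question's data
df <- data.frame(
  Date  = as.Date(c("2016-04-19", "2016-04-20", "2016-04-20",
                    "2016-04-23", "2016-04-24")),
  Count = c(1, 4, 1, 6, 2),
  Color = c(100, 50, 0, 0, 50)
)

df <- df %>%
  mutate(Start = as.Date(cut(Date, breaks = seq(min(Date), max(Date) + 4, by = 4)))) %>%
  group_by(Start) %>%
  # mutate() keeps every row; each Color value is weighted by its Count
  mutate(meanColor = weighted.mean(Color, w = Count, na.rm = TRUE)) %>%
  ungroup()
df
```

In the first bracket this gives (100*1 + 50*4 + 0*1) / 6 = 50, the same result as repeating the Count=4 row four times. Note that `na.rm = TRUE` only drops rows where 'Color' is missing; a missing 'Count' would still produce `NA`. If the rows really do need to be expanded, for example to plot every individual or to compute an SD per animal, `tidyr::uncount(df, Count)` should repeat each row 'Count' times.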