
I have a dataset that I first reshape with the gather() function. I am now trying to compute group averages in the gathered data, but I am having trouble understanding the best way to do it. My goal is one average per group; here I am averaging scores for 'observers'.

EDIT: I need an average for each observer over all dates of observation.

EDIT-2: Each observer has any number of individuals they will be assessing. If I use group_by(observer) the average will be over all observations total, not an average for the observer.

EDIT-3: I am hoping to see the average of each observer's 'fidelity score' across observation dates. If I have 3 scores (90, 100, 120), I would like to see the average of these values attributed to the observer, but still be able to display the scores over time. The output I am hoping for would be:

(Image of the desired output; not included.)

Important Note: My fidelity scores are all out of 129 possible points

EDIT-4: I would like to average each observer's scores over the number of observations (date_of_observation).

Here is the code I am using to create my averages.

LPLC_Group %>%
  group_by(observer,date_of_observation)%>%
  summarize(fidelity_score = sum(value,na.rm=TRUE),
        average_fidelity = round(mean(fidelity_score,na.rm=TRUE),2))

The following dput is the output of the code above. I cannot post my full dataset, but this output should be enough to work with.

dput output:

structure(list(observer = c("Cristianne", "Cristianne", "Cristianne", 
"Deb", "Deb", "Deb", "Lori", "Lori", "Lori", "Pauline", "Pauline", 
"Pauline"), date_of_observation = c("6/24/19", "7/24/19", "8/24/19", 
"6/24/19", "7/24/19", "8/24/19", "6/24/19", "7/24/19", "8/24/19", 
"6/24/19", "7/24/19", "8/24/19"), fidelity_score = c(100L, 87L, 
95L, 89L, 106L, 98L, 85L, 104L, 102L, 94L, 85L, 113L), average_fidelity = c(100, 
87, 95, 89, 106, 98, 85, 104, 102, 94, 85, 113)), row.names = c(NA, 
-12L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
    observer = c("Cristianne", "Deb", "Lori", "Pauline"), .rows = list(
        1:3, 4:6, 7:9, 10:12)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))
Slyme
  • What is your expected – akrun Jul 01 '19 at 16:27
  • I need an average for each observer over all dates of observation. – Slyme Jul 01 '19 at 16:33
  • You want an average for each observer, so use `group_by(observer)` – Gregor Thomas Jul 01 '19 at 16:35
  • The date of observation is key for the visualization I am making. group_by works, but then the time value is gone. Thoughts? @Gregor – Slyme Jul 01 '19 at 16:40
  • Please show the output you want corresponding to your sample inputs. The statement *"If I use `group_by(observer)` the average will be over all observations total, not an average for the observer"* seems completely wrong. If you use `group_by(observer)`, you will get one row per observer, with the average value for all observations for each observer---aka "an average for the observer". This seems to be what you describe, but not what you want. So please show what you want as the description is failing. – Gregor Thomas Jul 01 '19 at 16:47
  • Added some clarification @Gregor – Slyme Jul 01 '19 at 17:03
  • "*but still be able to display the scores over time*", ah, new key information. Use `group_by(observer)`, but use `mutate()` instead of `summarize()`. `mutate` adds columns to existing data, `summarize` summarizes the data to 1-row-per-group. – Gregor Thomas Jul 01 '19 at 17:05
  • @Gregor the alteration you have recommended, does not alter the output I am seeing. – Slyme Jul 01 '19 at 17:09

1 Answer

library(dplyr)
LPLC_Group %>%
  group_by(observer) %>%
  mutate(average_fidelity = mean(fidelity_score))
# A tibble: 12 x 4
# Groups:   observer [4]
   observer   date_of_observation fidelity_score average_fidelity
   <chr>      <chr>                        <int>            <dbl>
 1 Cristianne 6/24/19                        100             94  
 2 Cristianne 7/24/19                         87             94  
 3 Cristianne 8/24/19                         95             94  
 4 Deb        6/24/19                         89             97.7
 5 Deb        7/24/19                        106             97.7
 6 Deb        8/24/19                         98             97.7
 7 Lori       6/24/19                         85             97  
 8 Lori       7/24/19                        104             97  
 9 Lori       8/24/19                        102             97  
10 Pauline    6/24/19                         94             97.3
11 Pauline    7/24/19                         85             97.3
12 Pauline    8/24/19                        113             97.3

If the output you get does not match mine for this input, then you have probably made the mistake of loading plyr after dplyr and ignoring the warning. In that case plyr's mutate() and summarize() mask the dplyr versions, and the plyr versions ignore the grouping. I would suggest restarting R and being careful to load plyr before dplyr (if at all).
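If restarting R is not convenient, one possible workaround is to call the dplyr verbs with an explicit `dplyr::` prefix, so that masking by another package cannot change which function runs. The sketch below is only an illustration: it rebuilds the sample data from the question's dput as a plain data frame (an assumption, since the original object is a grouped tibble), and the names `result` and `per_observer` are made up for this example. It also contrasts `mutate()`, which keeps every date row, with `summarize()`, which collapses to one row per observer.

```r
library(dplyr)

# Sample data rebuilt from the question's dput output (ungrouped here).
LPLC_Group <- data.frame(
  observer = rep(c("Cristianne", "Deb", "Lori", "Pauline"), each = 3),
  date_of_observation = rep(c("6/24/19", "7/24/19", "8/24/19"), times = 4),
  fidelity_score = c(100L, 87L, 95L, 89L, 106L, 98L,
                     85L, 104L, 102L, 94L, 85L, 113L),
  stringsAsFactors = FALSE
)

# mutate() keeps all 12 rows and repeats the observer's mean on each one,
# so the scores can still be displayed over time.
result <- LPLC_Group %>%
  dplyr::group_by(observer) %>%
  dplyr::mutate(average_fidelity = round(mean(fidelity_score, na.rm = TRUE), 2)) %>%
  dplyr::ungroup()

# summarize() collapses to one row per observer; the dates are dropped.
per_observer <- LPLC_Group %>%
  dplyr::group_by(observer) %>%
  dplyr::summarize(average_fidelity = round(mean(fidelity_score, na.rm = TRUE), 2))

print(result)
print(per_observer)
```

The `dplyr::` prefix is redundant when dplyr is the last package loaded, but it makes the intent explicit if plyr happens to be attached as well.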

Gregor Thomas