
I'm currently using the "weightloss" dataset from the datarium package to start running a repeated-measures ANOVA (RM ANOVA). Here is the dput:

dput(head(weightloss))
structure(list(id = structure(1:6, .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "factor"), 
    diet = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("no", 
    "yes"), class = "factor"), exercises = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L), .Label = c("no", "yes"), class = "factor"), 
    t1 = c(10.43, 11.59, 11.35, 11.12, 9.5, 9.5), t2 = c(13.21, 
    10.66, 11.12, 9.5, 9.73, 12.74), t3 = c(11.59, 13.21, 11.35, 
    11.12, 12.28, 10.43)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

So this is the script I have come up with so far:

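# Load Packages and Create Data Frame for Dataset:

library(tidyverse) # dplyr, tidyr, ggplot2
library(datarium)  # weightloss data
library(rstatix)   # identify_outliers(), shapiro_test()
library(ggpubr)    # ggqqplot()

weight <- weightloss
weight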

# Pivot Longer Data to Create Factors and Scores:

weight <- weight %>% 
  pivot_longer(cols = t1:t3,         # wide columns to stack
               names_to = 'trial',   # old column names become the trial variable (x)
               values_to = 'value')  # cell contents become the value column (y)
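For comparison, here is a base-R sketch of the same wide-to-long reshape using stats::reshape(), built on the six rows from the dput rather than the full dataset. It names the measurement column score instead of value, which is the name the asker later switched to (see the comment at the end); the object names wide and long are illustrative only.

```r
# Base-R sketch of the wide-to-long reshape. The data are the six rows
# from the dput above, used as a stand-in for the full weightloss data.
wide <- data.frame(
  id = factor(1:6),
  t1 = c(10.43, 11.59, 11.35, 11.12, 9.5, 9.5),
  t2 = c(13.21, 10.66, 11.12, 9.5, 9.73, 12.74),
  t3 = c(11.59, 13.21, 11.35, 11.12, 12.28, 10.43)
)

# One row per id x trial. The measurement column is called "score"
# rather than "value" to stay clear of names rstatix uses internally.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("t1", "t2", "t3"),
  v.names   = "score",
  timevar   = "trial",
  times     = c("t1", "t2", "t3"),
  idvar     = "id"
)
rownames(long) <- NULL
head(long)
```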

# Plot Means in Boxplot:

ggplot(weight,
       aes(x=trial,y=value))+
  geom_boxplot()+
  labs(title = "Trial Means") # as expected, values increase over time

I get this fairly normal-looking boxplot:

[Image: boxplot of value by trial]

Now it's time to look for outliers and test for normality.

# Identify Outliers (Should Be None Given the Boxplot):

outlier <- weight %>% 
  group_by(trial) %>% 
  identify_outliers(value)
outlier_frame <- data.frame(outlier) 
outlier_frame # none found :)
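rstatix documents identify_outliers() as flagging points more than 1.5 x IQR beyond the quartiles (and as "extreme" beyond 3 x IQR). A base-R sketch of that rule follows, assuming R's default quantile() type; flag_outliers() is a hypothetical helper, and the data are again the six dput rows.

```r
# Six dput rows per trial, used as a stand-in for the full data.
scores <- list(
  t1 = c(10.43, 11.59, 11.35, 11.12, 9.5, 9.5),
  t2 = c(13.21, 10.66, 11.12, 9.5, 9.73, 12.74),
  t3 = c(11.59, 13.21, 11.35, 11.12, 12.28, 10.43)
)

# Hypothetical helper: TRUE for points outside [Q1 - k*IQR, Q3 + k*IQR].
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

outlier_flags <- lapply(scores, flag_outliers)         # 1.5 x IQR rule
extreme_flags <- lapply(scores, flag_outliers, k = 3)  # 3 x IQR rule
sapply(outlier_flags, sum)  # number of flagged points per trial
```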

# Normality (Shapiro-Wilk and QQPlot):

model <- lm(value~trial,
            data = weight) # creates model
shapiro_test(residuals(model)) # measures Shapiro
ggqqplot(residuals(model))+
  labs(title = "QQ Plot of Residuals") # creates QQ
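With a single factor, the residuals of lm(value ~ trial) are simply each observation minus its own trial mean, so the Shapiro-Wilk test above is a test on within-trial deviations. A base-R sketch checking this on the six dput rows (manual_resid is an illustrative name):

```r
# Six dput rows in long format, used as a stand-in for the full data.
long <- data.frame(
  trial = rep(c("t1", "t2", "t3"), each = 6),
  value = c(10.43, 11.59, 11.35, 11.12, 9.5, 9.5,
            13.21, 10.66, 11.12, 9.5, 9.73, 12.74,
            11.59, 13.21, 11.35, 11.12, 12.28, 10.43)
)

model <- lm(value ~ trial, data = long)

# ave() returns each row's trial mean, so this is value minus trial mean.
manual_resid <- long$value - ave(long$value, long$trial)
all.equal(unname(residuals(model)), manual_resid)

shapiro.test(residuals(model))
```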

This again gives me a fairly normal-looking QQ plot:

[Image: QQ plot of model residuals]

I then faceted the QQ plot by trial:

ggqqplot(weight, "value", ggtheme = theme_bw())+
  facet_wrap(~trial)+
  labs(title = "QQPlot of Each Trial") # looks normal

And it comes out right from what I can tell:

[Image: QQ plots of value, faceted by trial]

However, when I try to run a Shapiro-Wilk test by group, I keep running into problems with this code:

shapiro_group <- weight %>%
  group_by(trial) %>%
  shapiro_test(value)

It gives me this error:

Error: Problem with mutate() column data.
i data = map(.data$data, .f, ...).
x Must group by variables found in .data.
• Column variable is not found.

I also tried this:

shapiro_test(weight, trial$value)

And get this error instead:

Error: Can't subset columns that don't exist. x Column trial$value doesn't exist.

If anybody has some insight as to why, I would greatly appreciate it!

Shawn Hemelstrand

1 Answer

You were getting an error from shapiro_test because its implementation contains this line:

shapiro_test
function (data, ..., vars = NULL) 
{
....
....
 data <- data %>% gather(key = "variable", value = "value") %>% 
        filter(!is.na(value))
....
....
}

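Here it reshapes the data to long format with gather(). When no columns are named, gather() stacks every column except those already named like its key or value outputs ("variable" and "value"). Since your data already has a column named value, nothing gets stacked, no variable column is created, and a later step that groups by variable fails with "Column variable is not found".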
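A minimal tidyr-only reproduction of the clash, assuming (from the excerpt above) that rstatix ends up calling gather() on a data frame holding just the selected column:

```r
library(tidyr)

# A one-column data frame whose only column is already called "value".
df <- data.frame(value = c(10.43, 11.59, 11.35))

# No columns named in `...`: gather() skips columns named like its key
# or value outputs, so nothing is gathered and df comes back unchanged.
res_clash <- gather(df, key = "variable", value = "value")
names(res_clash)

# With any other column name the reshape happens and `variable` exists.
df2 <- data.frame(value1 = c(10.43, 11.59, 11.35))
res_ok <- gather(df2, key = "variable", value = "value")
names(res_ok)
```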

Renaming the value column to anything else fixes it:

library(dplyr)
library(rstatix)

weight %>%
  rename(value1 = value) %>%
  group_by(trial) %>%
  shapiro_test(value1)

#  trial variable statistic     p
#  <chr> <chr>        <dbl> <dbl>
#1 t1    value1       0.869 0.222
#2 t2    value1       0.910 0.440
#3 t3    value1       0.971 0.897
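As a cross-check that does not depend on column names at all, the same per-trial test can be sketched in base R with stats::shapiro.test(), which rstatix::shapiro_test() wraps. Assuming the output above was produced from the six dput rows, this should give the same W and p values.

```r
# Six dput rows in long format, used as a stand-in for the full data.
long <- data.frame(
  trial = rep(c("t1", "t2", "t3"), each = 6),
  value = c(10.43, 11.59, 11.35, 11.12, 9.5, 9.5,
            13.21, 10.66, 11.12, 9.5, 9.73, 12.74,
            11.59, 13.21, 11.35, 11.12, 12.28, 10.43)
)

# split() + lapply() operate on the vector directly, so a column
# named "value" causes no clash here.
tests <- lapply(split(long$value, long$trial), shapiro.test)

results <- data.frame(
  trial     = names(tests),
  statistic = sapply(tests, function(t) unname(t$statistic)),
  p         = sapply(tests, function(t) t$p.value),
  row.names = NULL
)
results
```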
Ronak Shah
  • Thank you so much. That was exactly it. I ended up just changing the original name in my original pivot_longer command and then changing everything to "score" to keep everything consistent in my script. Appreciate it Ronak! – Shawn Hemelstrand Sep 11 '21 at 04:03