Losing R factor organization when summarizing the data (dplyr)

Question

I'm trying to summarize a numeric response variable (above ground biomass [AGB]) by several categorical factors, as well as by date, as part of a larger project. The date is being read as a character and sorts as 4/10/2020, 4/8/2020, 4/9/2020. Additionally, there is a column, Shoot.Plot, numbered 1-11, that sorts as 1, 10, 11, 2... and so on, since it's also being read as a character string (which is fine for the most part, aside from the strange order). I've releveled the factors to the order I want, but when I summarize the data using either get_summary_stats() from the rstatix package or summarize(), the level order is lost.

Here's what I've tried:

df %>% 
  mutate(Date.Coll, factor(Date.Coll, levels = c("4/8/2020","4/9/2020","4/10/2020")), 
         Shoot.Plot, factor(Shoot.Plot, levels = 
                              c("1","2","3","4","5","6","7","8","9","10","11"))) %>%
  group_by(Date.Coll, Site, Eelgrass, Oyster, Shoot.Plot) %>%
  filter(is.na(BGB),
         Date.Coll=="4/8/2020" | Date.Coll=="4/9/2020" | Date.Coll=="4/10/2020") %>% 
  select(AGB) %>% 
  get_summary_stats(type="mean_se")

When I check the data frame right before the get_summary_stats() line, the data is ordered as I specified in the mutate() call. Only after summarizing do both orderings go out the window.

Any suggestions? Thank you!

Please provide enough code so others can better understand or reproduce the problem. – Community Sep 26 '21 at 13:14

score 0 · Answer 1 · answered Sep 22 '21 at 23:23

When you call select(AGB), you drop every column except AGB and the grouping variables (dplyr keeps those automatically on grouped data). If you want to specify which variable to summarize, pass it to get_summary_stats() instead. Without the data, I can't check whether it works. Try this:

df %>% 
  mutate(Date.Coll, factor(Date.Coll, levels = c("4/8/2020","4/9/2020","4/10/2020")), 
         Shoot.Plot, factor(Shoot.Plot, levels = c("1","2","3","4","5","6","7","8","9","10","11"))) %>%
  group_by(Date.Coll, Site, Eelgrass, Oyster, Shoot.Plot) %>%
  filter(is.na(BGB),
         Date.Coll=="4/8/2020" | Date.Coll=="4/9/2020" | Date.Coll=="4/10/2020") %>% 
  get_summary_stats(AGB, type="mean_se")
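One possible explanation, which can't be confirmed without the real data: the question's mutate() call passes Date.Coll and factor(...) as two separate unnamed arguments, separated by a comma rather than joined by `=`. In that form dplyr leaves Date.Coll unchanged (still character) and puts the factor in a new column with an auto-generated name, so any later grouping on Date.Coll sorts alphabetically. A minimal sketch on made-up toy data (the values below are hypothetical, not the asker's), using summarise() in place of get_summary_stats():

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's df (not the real data).
df <- tibble(
  Date.Coll  = c("4/10/2020", "4/8/2020", "4/9/2020", "4/8/2020"),
  Shoot.Plot = c("10", "2", "1", "11"),
  AGB        = c(1.2, 0.8, 1.5, 1.1)
)

date_levels <- c("4/8/2020", "4/9/2020", "4/10/2020")

# Unnamed arguments: Date.Coll is left as character, and the factor
# ends up in an extra column with an auto-generated name.
unnamed <- df %>%
  mutate(Date.Coll, factor(Date.Coll, levels = date_levels))
class(unnamed$Date.Coll)  # "character"

# Named argument: Date.Coll is overwritten with the releveled factor.
named <- df %>%
  mutate(Date.Coll = factor(Date.Coll, levels = date_levels))

# Grouping on the factor returns the groups in level order.
out <- named %>%
  group_by(Date.Coll) %>%
  summarise(mean_AGB = mean(AGB), .groups = "drop")
out
```

If this is the cause, writing `Date.Coll = factor(...)` and `Shoot.Plot = factor(...)` in the original pipeline should carry the level order through the summary.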

John Franchak

I tried this along with including all the factors in the `group_by()` function in the `select()` function, and I got the same result. I ended up working around this by merging this to another df (which was the ultimate goal) by the listed factors, essentially making row order unimportant, but still. I'd love to know why this happens. – Kyra McClelland Sep 24 '21 at 15:13
I'd be happy to try it out if you can post some of the data to test on. Otherwise it's hard to diagnose what's going on. – John Franchak Sep 25 '21 at 16:03