Please forgive me because I'm extremely new to RStudio, so I'd appreciate it if you could help by pointing me to documentation or something of the sort.

I have a data frame called GSS that has a very large number of unlabeled rows and two columns labeled COLOR and STAGE, among other irrelevant columns. STAGE consists of random values between 1 and 100, while COLOR can only be between 1 and 4. I have also created a factor FCOL, which categorizes and enumerates the frequency of values in COLOR where 1 = Red, 2 = Blue, 3 = Green, and 4 = Yellow.

I'd like to create a table that organizes the means and medians of the values in STAGE sharing the same color. I attempted to do so with this:

stats <- GSS %>%
  group_by(COLOR) %>%
  summarize(mean_stage = mean(STAGE),
            median_stage = median(STAGE))

This successfully calculates the mean and median stage per color, as shown by stats$mean_stage and stats$median_stage producing the right values in the expected color order. However, running table(stats) produces a very hard-to-read series of tables full of 1s and 0s that give no indication of which color corresponds to which number. Ideally, I'd like to group them by their level in my factor FCOL, so that I have a column with "Red, Blue, Green, Yellow" next to a column with the corresponding means and a column with the corresponding medians. However, writing group_by(FCOL) gives me an error telling me that Column 'FCOL' is unknown.

How can I create this table the way I want? I've been doing lots of searching, but I can't seem to find anything that explains how to connect my data frame back to the factor I've already created. I'm using the libraries tidyverse and dplyr.

Phil
  • Please show a small reproducible example with `dput` and the expected output. Please check the column names of `GSS` with `colnames(GSS)`. The `stats` dataset only has the `COLOR`, `mean_stage`, and `median_stage` columns, not `FCOL`. If you want to create these new columns, use `mutate` instead of `summarise`. – akrun Feb 07 '20 at 23:58
  • You should look at `stats` directly, not `table(stats)`. It will have four rows with the stats for the four colors. To use `FCOL` instead of `COLOR`, add it as a column in `GSS` and `group_by(FCOL)`. – Kent Johnson Feb 08 '20 at 00:24
  • @KentJohnson You're right, `print.data.frame(stats)` makes it display how I expect it. How do I add `FCOL` to `GSS`? That sounds like it'll solve my problem. I'd prefer to have it replace `COLOR` if possible. – user11500789 Feb 08 '20 at 00:35
  • @KentJohnson never mind, I figured it out. Thank you for your help. – user11500789 Feb 08 '20 at 00:41

1 Answer

The formatting issue was fixed by replacing table(stats) with print.data.frame(stats). I then replaced COLOR with FCOL in the data frame, before the group_by step, using GSS <- GSS %>% mutate(COLOR = FCOL).
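For anyone landing here later, the whole fix can be sketched end to end roughly like this. The real GSS data and the way FCOL was built are not shown in the question, so the small data frame and the factor() call below are assumptions made purely for illustration:

```r
library(dplyr)

# Hypothetical stand-in for GSS (assumption: the real data is not shown).
set.seed(1)
GSS <- data.frame(
  COLOR = rep(1:4, times = 50),
  STAGE = sample(1:100, 200, replace = TRUE)
)

# Store the labels inside the data frame, overwriting the numeric codes.
# Assumption: FCOL was created with something like this factor() call.
GSS <- GSS %>%
  mutate(COLOR = factor(COLOR, levels = 1:4,
                        labels = c("Red", "Blue", "Green", "Yellow")))

# group_by() can now see the labelled column because it lives in GSS.
stats <- GSS %>%
  group_by(COLOR) %>%
  summarize(mean_stage = mean(STAGE),
            median_stage = median(STAGE))

# Print the summary itself: one row per color, with its mean and median.
print(stats)
```

The original group_by(FCOL) most likely failed because FCOL was a separate object in the workspace rather than a column of GSS, and dplyr was looking for a column by that name. As for the 1s and 0s: table() applied to a data frame cross-tabulates its columns, so table(stats) counts how often each combination of color, mean, and median occurs instead of displaying the summary.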