Please forgive me; I'm extremely new to RStudio, so I'd appreciate it if you could point me to documentation or something similar.
I have a data frame called GSS that has very, very many unlabeled rows and, among other irrelevant columns, two columns labeled COLOR and STAGE. STAGE consists of random values between 1 and 100, while COLOR can only be between 1 and 4. I have also created a factor FCOL, which categorizes and counts the frequency of the values in COLOR, where 1 = Red, 2 = Blue, 3 = Green, and 4 = Yellow.
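A minimal reproducible stand-in for this setup might look like the following. Everything here is hypothetical: the row count, the random values, and especially the assumption that FCOL was built with factor() as a separate object rather than as a column of GSS.

```r
set.seed(1)  # make the random toy data reproducible
n <- 200     # hypothetical number of rows

# Hypothetical stand-in for GSS: COLOR is 1-4, STAGE is 1-100
GSS <- data.frame(
  COLOR = sample(1:4, n, replace = TRUE),
  STAGE = sample(1:100, n, replace = TRUE)
)

# Assumed construction of FCOL: a standalone factor, NOT a column of GSS
FCOL <- factor(GSS$COLOR,
               levels = 1:4,
               labels = c("Red", "Blue", "Green", "Yellow"))

table(FCOL)  # frequency of each color
```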
I'd like to create a table of the mean and median of STAGE for each color. I attempted to do so with this:
stats <- GSS %>%
  group_by(COLOR) %>%
  summarize(mean_stage = mean(STAGE),
            median_stage = median(STAGE))
This successfully calculates the mean and median stage per color: stats$mean_stage and stats$median_stage contain the right values in the expected color order. However, running table(stats) produces a very hard-to-read series of tables full of 1s and 0s that gives no indication of which color corresponds to which number. Ideally, I'd like to group by the levels of my factor FCOL, so that the table has a column with "Red, Blue, Green, Yellow" next to a column with the corresponding means and a column with the corresponding medians. However, writing group_by(FCOL) gives me an error telling me that Column 'FCOL' is unknown.
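The grid of 1s and 0s can be reproduced with a small sketch. It uses hypothetical toy data in place of the real GSS. The point it illustrates is that stats is already a one-row-per-color data frame, while table() cross-tabulates its columns against each other.

```r
library(dplyr)

# Hypothetical toy data standing in for GSS (COLOR in 1-4, STAGE in 1-100)
set.seed(1)
GSS <- data.frame(
  COLOR = sample(1:4, 200, replace = TRUE),
  STAGE = sample(1:100, 200, replace = TRUE)
)

stats <- GSS %>%
  group_by(COLOR) %>%
  summarize(mean_stage = mean(STAGE),
            median_stage = median(STAGE))

# stats is already a small data frame (a tibble) with one row per COLOR,
# so printing it shows the summary directly.
print(stats)

# table() treats each column as a factor and counts every combination of
# COLOR, mean_stage and median_stage. Each row is one combination, so the
# result is a 3-way array of counts that are all 0 or 1.
counts <- table(stats)
```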
How can I create this table the way I want? I've done a lot of searching, but I can't find anything that explains how to connect my data frame back to the factor I've already created. I'm using the tidyverse and dplyr libraries.
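The error suggests that FCOL exists only as a separate object, not as a column of GSS. When group_by() is given a bare name, it looks that name up only among the data frame's own columns. One possible approach is to build the factor as a column with mutate() before grouping. This is sketched here with hypothetical toy data and assumes the 1 = Red, 2 = Blue, 3 = Green, 4 = Yellow coding described above.

```r
library(dplyr)

# Hypothetical toy data standing in for GSS (COLOR in 1-4, STAGE in 1-100)
set.seed(1)
GSS <- data.frame(
  COLOR = sample(1:4, 200, replace = TRUE),
  STAGE = sample(1:100, 200, replace = TRUE)
)

stats <- GSS %>%
  # Store the labelled factor as a column so group_by() can find it
  mutate(FCOL = factor(COLOR,
                       levels = 1:4,
                       labels = c("Red", "Blue", "Green", "Yellow"))) %>%
  # Groups come out in factor-level order: Red, Blue, Green, Yellow
  group_by(FCOL) %>%
  summarize(mean_stage = mean(STAGE),
            median_stage = median(STAGE))

print(stats)
```

If FCOL already exists as a factor with the same length and row order as GSS, assigning it with GSS$FCOL <- FCOL before the pipeline should have the same effect.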