
The R4DS book has the following code block:

library(tidyverse)
by_age2 <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

Is there a simple equivalent to this code in base R? The filter step can be replaced with gss_cat[!is.na(gss_cat$age), ], but after that I run into trouble. It's clearly a job for by, tapply, or aggregate, but I haven't been able to find the right way. by(gss_2, with(gss_2, list(age, marital)), length), where gss_2 is the filtered data, is a step in the right direction, but the output is awful.

J. Mini
Try `proportions(table(subset(gss_cat, complete.cases(age), select = c(age, marital))))` – akrun Jun 10 '21 at 17:54

1 Answer


We could use proportions on the table output, after subsetting to remove the rows with NA age (complete.cases) and selecting the columns, with margin 1 so that the proportions are computed within each age.

The data comes from the forcats package, so load the package and get the data:

library(forcats)
data(gss_cat)

Use table/proportions as mentioned above; the second argument, 1, is the margin, so each age (row) sums to 1:

by_age2_base <- proportions(table(subset(gss_cat, complete.cases(age), 
       select = c(age, marital))), 1)

Output:

head(by_age2_base, 3)
    marital
age    No answer Never married   Separated    Divorced     Widowed     Married
  18 0.000000000   0.978021978 0.000000000 0.000000000 0.000000000 0.021978022
  19 0.000000000   0.939759036 0.000000000 0.012048193 0.004016064 0.044176707
  20 0.000000000   0.904382470 0.003984064 0.007968127 0.000000000 0.083665339

Compare with the OP's output:

head(by_age2, 3)
# A tibble: 3 x 4
# Groups:   age [2]
    age marital           n   prop
  <int> <fct>         <int>  <dbl>
1    18 Never married    89 0.978 
2    18 Married           2 0.0220
3    19 Never married   234 0.940 
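To make the margin argument concrete, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not taken from gss_cat). With margin 1, each row of the result is one age and sums to 1, which is what prop = n / sum(n) computes within group_by(age); the zero cells are combinations that count() simply leaves out.

```r
# Hypothetical toy data standing in for gss_cat (not real survey data)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = c("Never married", "Never married", "Married",
              "Never married", "Divorced", "Married")
)

# Drop rows with a missing age and keep only the two grouping columns
toy_sub <- subset(toy, complete.cases(age), select = c(age, marital))

# Cross-tabulate counts: rows are ages, columns are marital statuses
counts <- table(toy_sub)

# Divide each cell by its row total (margin = 1), i.e. proportions within age
props <- proportions(counts, 1)

print(counts)
print(props)
```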

If we need the output in 'long' format, convert the table to a data.frame with as.data.frame and drop the zero entries (the Freq column then holds proportions, not counts):

by_age2_base_long <- subset(as.data.frame(by_age2_base), Freq > 0)
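A minimal sketch of what that conversion returns, again on hypothetical toy data: as.data.frame on a table gives one row per age/marital combination, with the value in a column named Freq by default, and subsetting on Freq > 0 keeps only the combinations that actually occur, like the rows count() returns.

```r
# Hypothetical toy data (not from gss_cat), already free of missing ages
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19),
  marital = c("Never married", "Never married", "Married",
              "Never married", "Divorced")
)

# Row-wise (within-age) proportions as a two-way table
props <- proportions(table(toy), 1)

# One row per age/marital combination; 'Freq' holds the proportion here
long <- as.data.frame(props)

# Keep only the combinations that actually occur in the data
long_nonzero <- subset(long, Freq > 0)
print(long_nonzero)
```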

Another option is aggregate/ave (this requires R 4.1.0 or later for the native pipe |> and the \(x) lambda shorthand):

subset(gss_cat, complete.cases(age), select = c(age, marital)) |> 
    {\(dat) aggregate(cbind(n = age) ~ age + marital, 
      data = dat, FUN = length)}() |> 
   transform(prop = ave(n, age, FUN = \(x) x/sum(x)))
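The ave step is what plays the role of group_by(age) %>% mutate(prop = n / sum(n)): it applies the function within each age group and returns a vector as long as n. Below is a minimal sketch of the same idea without the pipe or the \(x) shorthand, so it should also run on older R versions. It uses hypothetical toy data and counts with a helper column of ones summed by aggregate, a choice of this sketch rather than the cbind(n = age) / length form used above.

```r
# Hypothetical toy data (not from gss_cat)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = c("Never married", "Never married", "Married",
              "Never married", "Divorced", "Married")
)

# Drop missing ages and keep the grouping columns
dat <- subset(toy, complete.cases(age), select = c(age, marital))

# Helper column of ones (assumption of this sketch); summing it counts rows
dat$n <- 1

# Count rows per age/marital combination, like count(age, marital)
counts <- aggregate(n ~ age + marital, data = dat, FUN = sum)

# Within each age, divide each count by the age total, like mutate(prop = n / sum(n))
counts$prop <- ave(counts$n, counts$age, FUN = function(x) x / sum(x))

print(counts)
```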
akrun