I'm trying to compute the averages and a correlation for some variables, grouped by gender. For some reason, my group_by call doesn't seem to be working.

data(PSID1982, package = "AER")

PSID1982 %>% 
  group_by(gender) %>% 
  summarise(avgeduc = mean(PSID1982$education),
            avgexper = mean(PSID1982$experience),
            avgwage = mean(PSID1982$wage),
            cor_wagvseduc = cor(x = PSID1982$wage, y = PSID1982$education))

The result is just the summary statistics for the entire data set, not broken out by gender.

Trenton McKinney
  • Don't include `PSID1982$` in any calculation within `dplyr` verbs, just the column name. (When you do that, it always uses all of the original data, ignoring any calculations or groupings you might have done earlier in the pipeline.) – r2evans Oct 05 '19 at 00:35
  • `... %>% summarise(avgeduc = mean(education), ...)` – r2evans Oct 05 '19 at 00:36
  • @camille, there's got to be a dupe for the use of `$` within `dplyr` functions, do you know? – r2evans Oct 05 '19 at 00:41
  • Honestly I'd recommend taking a step back to learn a bit about how `dplyr` functions work more broadly, which is generally by taking bare column names as @r2evans shows. A good starting point is in [R for Data Science](https://r4ds.had.co.nz/transform.html) – camille Oct 05 '19 at 01:14
  • Thanks for the comment. I made the changes, but they did not affect the outcome. The data frame generated is still 1x4, with no male/female rows. – user12166883 Oct 05 '19 at 01:15
  • I ran your code (without `PSID1982$`) and it provided summary info for both genders. Please edit your question to show what you are doing, including your latest changes and the output. – Ben Oct 05 '19 at 02:20
  • @user12166883 in that case you probably also have the problem of loading `plyr` after `dplyr`, ignoring the big warning that prints, and thus you are using `plyr::summarise()` instead of `dplyr::summarise`. You can specify `dplyr::summarise` to override this behavior. – Gregor Thomas Jan 26 '22 at 19:39
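
To illustrate the masking issue described in the last comment, here is a minimal sketch. It uses the built-in `mtcars` data as a stand-in (an assumption made so the example runs without the AER package) and calls `dplyr::summarise()` with an explicit namespace, so it behaves the same whether or not `plyr` was attached later.

```r
library(dplyr)

# mtcars is a built-in stand-in here, not the PSID1982 data from the question
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  dplyr::summarise(avg_mpg = mean(mpg))  # explicit namespace avoids plyr::summarise()

# Shows which package the unqualified name currently resolves to
# ("dplyr" is expected unless another package masks it)
environmentName(environment(summarise))
```

If the last line reports `plyr` rather than `dplyr`, the unqualified `summarise()` is the masked version, which would be consistent with getting a single ungrouped row.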

1 Answer

Your pipeline is structured correctly, but inside dplyr functions you should not refer to columns as PSID1982$Column_Name. That form always pulls the full, ungrouped column from the original data frame, so the grouping is ignored. Use the bare column name instead:

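library(dplyr)
data(PSID1982, package = "AER")

# Bare column names are evaluated within each gender group
PSID1982 %>% 
  group_by(gender) %>% 
  summarise(avgeduc = mean(education), 
            avgexper = mean(experience), 
            avgwage = mean(wage),
            cor_wagvseduc = cor(x = wage, y = education))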
lovalery
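
To make the difference concrete, here is a small sketch comparing the two forms side by side. The data frame `df` and its values are made up for illustration and are not taken from PSID1982.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative only
df <- data.frame(
  gender = c("male", "male", "female", "female"),
  wage   = c(10, 20, 30, 50)
)

# df$wage is the full column, so every group receives the overall mean
wrong <- df %>%
  group_by(gender) %>%
  summarise(avgwage = mean(df$wage))

# The bare name wage is evaluated separately within each group
right <- df %>%
  group_by(gender) %>%
  summarise(avgwage = mean(wage))
```

In `wrong`, both rows carry the same value, the mean of all four wages, while `right` holds a separate mean for each gender.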