I'm working through the exercises at the end of each chapter of McNulty's Handbook of Regression Modeling in People Analytics. I'm currently stuck on problem nine at the end of chapter two, which reads:

Using the pipe operator, write code to find the mean of the Yr1 test scores for all those who achieved Yr3 test scores greater than 100. Round this mean to the nearest integer.

I've attempted several different approaches and scoured Stack Overflow for new strategies, but I keep coming up short. Each attempt is shown with the error or warning it produces:

filter(ugtests, Yr3 > 100) %>%
  colMeans(ugtests[1], na.rm = TRUE) %>%
  round(digits = 0)
# Error in colMeans(., ugtests[1], na.rm = TRUE) : invalid 'dims'

filter(ugtests, Yr3 > 100) %>%
  mean(ugtests$Yr1) %>%
  round(digits = 0)
# Warning message: In mean.default(., ugtests$Yr1) : argument is not numeric or logical: returning NA

filter(ugtests, Yr3 > 100) %>%
  mean() %>%
  round(digits = 0)
# Warning message: In mean.default(.) : argument is not numeric or logical: returning NA

2 Answers

This book makes the unusual choice of using %>% without using dplyr. If you use subset() as in the book's examples, extracting a single column as well as the rows you want, your code will work.

## example from the book 
## adapt this to your solution
## using subset(), not filter()!
subset(salespeople$sales, subset = salespeople$sales < 500) %>% # get the subsetted data
  mean() %>% # take the mean value
  round() # round to the nearest integer

Alternatively, if you would like to use dplyr functions like filter(), you would probably do it like this:

## exactly analogous to above - extract the column, then average and round it
filter(ugtests, Yr3 > 100) %>%
  pull(Yr1) %>% ## extract the column from the data frame
  mean() %>%
  round(digits = 0)

## more common with dplyr - keep things in the data frame:
filter(ugtests, Yr3 > 100) %>%
  summarize(mean_yr1 = round(mean(Yr1)))
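
As for why the attempts in the question fail: x %>% f(y) is evaluated as f(x, y), so the whole filtered data frame ends up as the first argument of mean(), which expects a numeric vector. Here is a minimal sketch of that, assuming a made-up data frame called toy in place of ugtests (the name and values are hypothetical, chosen only for illustration):

```r
library(dplyr)  # provides %>%, filter() and pull()

# Hypothetical stand-in for ugtests; the values are made up for illustration.
toy <- data.frame(
  Yr1 = c(40, 55, 60, 70),
  Yr3 = c(90, 120, 150, 130)
)

# x %>% f(y) is evaluated as f(x, y), so piping this result into mean()
# would hand mean() a whole data frame rather than a numeric vector.
filtered <- toy %>% filter(Yr3 > 100)
print(is.data.frame(filtered))  # filter() returns a data frame

# pull() extracts one column as a plain vector, which mean() can handle.
yr1_mean <- filtered %>%
  pull(Yr1) %>%
  mean() %>%
  round()
print(yr1_mean)
```

Passing ugtests$Yr1 as an extra argument does not help either, because the piped data frame still occupies the first argument and the extra one is matched to a later parameter.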
Gregor Thomas
  • He does instruct the use of dplyr; the issue is that my code produces a one-column data frame, and mean() only works on vectors. – Seth Saavedra Dec 14 '20 at 21:48
  • Does he instruct on dplyr? He mentions it in the text once and then assigns questions on it, but in that section at least there seem to be no code examples of `dplyr`. – Gregor Thomas Dec 14 '20 at 21:55
  • Ah, good point. He instructs installation of dplyr but not much else beyond that. Based on this question he added language about pull() into the question yesterday. – Seth Saavedra Dec 15 '20 at 15:53
  • Well... in the next question there's mention of `pull`. I can understand not wanting to create a new `dplyr` tutorial when several very good ones exist, but I'd expect them to actually link to something, like the [Introduction to dplyr](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) vignette, and recommend reading that before getting frustrated on an unexplained task. The core concepts of `dplyr`, such as working on data frames with `dplyr` verbs, really should be explained before asking students to work with it. – Gregor Thomas Dec 15 '20 at 16:00
Try this:

ugtests %>%
  filter(Yr3 > 100) %>%
  summarise(M = round(mean(Yr1)))
Marcos Pérez