Here is some sample data, data2:

lvl      x    y
  0 20.099 21.2
100 21.133 21.4
250 20.866 21.6
500 22.679 21.8
750 22.737 22.1
  0 30.396 32.0
100 31.373 32.1
250 31.303 32.2
500 33.984 32.8
750 44.563 38.0
  0 22.755 18.5
100 23.194 18.8
250 23.263 20.5
500 23.061 27.9
750 25.678 36.4

I tried to get the RMSE and R2 for each level (lvl) with the following two lines of code, respectively:

data2 %>% group_by(lvl) %>% summarise_each(funs(rmse(data2$x~data2$y)))
summary(lm(data2$x,data2$y))$r.squared

Calculating the RMSE gave this error message:

Error: argument "obs" is missing, with no default

and calculating R2 gave:

# A tibble: 5 x 3
    lvl         x         y
  <int>     <dbl>     <dbl>
1     0 0.6639888 0.6639888
2   100 0.6639888 0.6639888
3   250 0.6639888 0.6639888
4   500 0.6639888 0.6639888
5   750 0.6639888 0.6639888

I want to aggregate the RMSE and R2 for each level. I have only 5 levels here, so the result should have 5 rows and 3 columns, with column names `"lvl"`, `"rmse"`, `"r2"`. Thank you in advance.

G1124E

2 Answers

You don't need summarise_each; summarise will do what you want. If you prefer using dplyr, here is a solution:

data2 <-
data.frame(
  lvl = c(  0, 100, 250, 500, 750, 0, 100, 250, 500, 750, 0, 100, 250, 500, 750)
  ,x = c(
    20.099, 21.133, 20.866, 22.679, 22.737, 30.396, 31.373, 31.303, 33.984, 44.563, 22.755, 23.194, 23.263, 23.061, 25.678
  )
  ,y = c(21.2, 21.4, 21.6, 21.8, 22.1, 32.0, 32.1, 32.2, 32.8, 38.0, 18.5, 18.8, 20.5, 27.9, 36.4)
)

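#install.packages(c("dplyr", "ModelMetrics"))
library(dplyr)         # provides %>%, group_by() and summarise()
library(ModelMetrics)  # provides rmse(actual, predicted)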

data2 %>%
  group_by(lvl) %>%
  summarise(
    RMSE = rmse(x, y)
    ,R2 = cor(x, y)^2
  )

## A tibble: 5 × 3
#    lvl     RMSE        R2
#  <dbl>    <dbl>     <dbl>
#1     0 2.701237 0.8176712
#2   100 2.575982 0.8645350
#3   250 1.729888 0.9091029
#4   500 2.920640 0.7207692
#5   750 7.267279 0.4542507
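
As a side note on why the attempt in the question printed the same number on every row: writing data2$x inside the pipeline reaches back to the full column and bypasses the grouping, whereas bare column names are evaluated within each group. The sketch below illustrates this; rmse_manual is a hypothetical hand-written helper (not from any package), used only so the example does not depend on ModelMetrics.

```r
library(dplyr)

data2 <- data.frame(
  lvl = rep(c(0, 100, 250, 500, 750), times = 3),
  x = c(20.099, 21.133, 20.866, 22.679, 22.737,
        30.396, 31.373, 31.303, 33.984, 44.563,
        22.755, 23.194, 23.263, 23.061, 25.678),
  y = c(21.2, 21.4, 21.6, 21.8, 22.1,
        32.0, 32.1, 32.2, 32.8, 38.0,
        18.5, 18.8, 20.5, 27.9, 36.4)
)

# Hypothetical helper: root of the mean squared difference
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# data2$x / data2$y ignore the grouping, so every level gets the
# whole-data value
ungrouped <- data2 %>%
  group_by(lvl) %>%
  summarise(R2 = cor(data2$x, data2$y)^2)

# Bare column names are evaluated separately within each level
grouped <- data2 %>%
  group_by(lvl) %>%
  summarise(
    RMSE = rmse_manual(x, y),
    R2   = cor(x, y)^2
  )

ungrouped
grouped
```

Here ungrouped should show one value repeated for all five levels, while grouped gives a different RMSE and R2 per level.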
JackStat
## split your data2 into a list by the levels of the factor and then use lapply
list_of_rsquared <- lapply(split(data2, data2$lvl), function (z) {
  summary(lm(x ~ y, data = z))$r.squared
}
)

## You will get a list of r.squared values, one per level. Now you can simply rbind the list.
rsquared_vals <- do.call("rbind", list_of_rsquared)

You can use the same approach for RMSE. (I am assuming rmse is a function you wrote yourself, so I am just using the formula directly.)

list_of_rmse <- lapply(split(data2, data2$lvl), function (z) { sqrt(mean((z$x - z$y)^2)) } )

rmse_vals <- do.call("rbind", list_of_rmse)
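
For reference, the expression inside the function above is the usual root-mean-square difference between the two columns within one level, where $n$ is the number of rows at that level. Treating x as the observed and y as the predicted values is an assumption here; the question does not say which is which.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - y_i)^2}
```

The squaring happens inside the mean. Squaring the mean of the differences instead lets positive and negative differences cancel, which gives a much smaller and misleading number.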

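You can now cbind the three columns you need (using one lvl value per group rather than the full 15-row column):

cbind(lvl = sort(unique(data2$lvl)), r2 = rsquared_vals[, 1], rmse = rmse_vals[, 1])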
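
If a data frame with the column names asked for in the question is preferred, the two lapply passes could be folded into one. This is only a sketch in base R under the same assumptions as above (x and y compared directly); the names per_level and result are made up for illustration.

```r
data2 <- data.frame(
  lvl = rep(c(0, 100, 250, 500, 750), times = 3),
  x = c(20.099, 21.133, 20.866, 22.679, 22.737,
        30.396, 31.373, 31.303, 33.984, 44.563,
        22.755, 23.194, 23.263, 23.061, 25.678),
  y = c(21.2, 21.4, 21.6, 21.8, 22.1,
        32.0, 32.1, 32.2, 32.8, 38.0,
        18.5, 18.8, 20.5, 27.9, 36.4)
)

# One small data frame per level, holding both statistics
per_level <- lapply(split(data2, data2$lvl), function(z) {
  data.frame(
    lvl  = z$lvl[1],
    rmse = sqrt(mean((z$x - z$y)^2)),
    r2   = summary(lm(x ~ y, data = z))$r.squared
  )
})

# Stack the per-level rows into a single 5 x 3 data frame
result <- do.call(rbind, per_level)
rownames(result) <- NULL
result
```

Building each row inside the function keeps lvl, rmse and r2 aligned by construction, so nothing depends on the row order of data2.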
vagabond
  • Thank you very much. The rmse one didn't work. It gave me error message: `Error in match(class(obs), c("integer", "numeric", "ts", "zoo")) : argument "obs" is missing, with no default`. Can you please tweak it. – G1124E Nov 16 '16 at 22:47
  • can you share your rmse function please? – vagabond Nov 16 '16 at 22:48
  • I tried `data2 %>% group_by(lvl) %>% summarise_each(funs(rmse))` but it didn't work. – G1124E Nov 16 '16 at 22:53
  • There is no `rmse` function predefined in base R. Read this: https://www.rforge.net/doc/packages/hydroGOF/rmse.html to learn how to calculate RMSE, and write a function for it first! – vagabond Nov 16 '16 at 23:05
  • Yes, I saw that. It had the predefined `rmse` with the hydroGOF package. That is why I tried to use the `rmse` function. I also tried `list_of_rmse <- lapply(split(data2, data2$lvl), function (z) { z<-sqrt(mean(data2$x - data2$y)^2) } )` but I got a list of answers: `$`0` [1] 0.0144 $`100` [1] 0.0144 $`250` [1] 0.0144 $`500` [1] 0.0144 $`750` [1] 0.0144` Can you please tweak it to get the final result? Thank you – G1124E Nov 16 '16 at 23:26
  • Well, you can just do: `list_of_rmse <- lapply(split(data2, data2$lvl), function (z) { sqrt(mean((z$x - z$y)^2)) } )`, but I warn you: you are misinterpreting what RMSE is. The error in RMSE is Actual - Predicted. You are just subtracting variable y from variable x and squaring that. It makes no sense. Please read and understand what RMSE means: https://heuristically.wordpress.com/2013/07/12/calculate-rmse-and-mae-in-r-and-sas/ – vagabond Nov 16 '16 at 23:35
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/128308/discussion-between-g1124e-and-vagabond). – G1124E Nov 16 '16 at 23:43
  • You can edit the answer by adding a closing bracket in the first line of the codes for split function just before starting the function. – G1124E Nov 17 '16 at 00:54
  • done ! and upvote the answer if it has worked out for you – vagabond Nov 17 '16 at 02:19