I have a data frame (datos) with eight columns and 2006 observations.

From these columns I want to calculate the MSE between Pcp_Estacion and Pcp_Chirps, using the MSE function of the MLmetrics library. I want to calculate it per station and per month, to obtain a data frame with one value for each month and each weather station. In the example below I calculate the MSE for the month of July for the five weather stations I have:

# load libraries
library(tidyverse)  # also attaches dplyr
library(Metrics)    # provides mse()
library(MLmetrics)  # provides MSE()

# Show the first 10 rows
head(datos, 10)

    X Mes Year Estacion variable  n Pcp_Chirps Pcp_Estacion
1   1   1 1982    11024      Pcp 30      0.262        0.000
2   2   1 1982    11033      Pcp 31      0.190        0.045
3   3   1 1982    11141      Pcp 31      0.265        0.000
4   4   2 1982    11024      Pcp 28      0.317        0.286
5   5   2 1982    11033      Pcp 28      0.242        0.629
6   6   2 1982    11141      Pcp 28      0.351        0.500
7   7   3 1982    11024      Pcp 31      0.000        2.903
8   8   3 1982    11033      Pcp 31      0.148        0.000
9   9   3 1982    11141      Pcp 31      0.000        0.000
10 10   4 1982    11024      Pcp 30      0.543        0.800

# Subset the July rows for each weather station
mse_11024_7 <- filter(datos, Mes == 7, Estacion %in% c("11024"))
mse_11033_7 <- filter(datos, Mes == 7, Estacion %in% c("11033"))
mse_11060_7 <- filter(datos, Mes == 7, Estacion %in% c("11060"))
mse_11096_7 <- filter(datos, Mes == 7, Estacion %in% c("11096"))
mse_11141_7 <- filter(datos, Mes == 7, Estacion %in% c("11141"))

# Compute the July MSE for each weather station
mse(mse_11024_7$Pcp_Estacion, mse_11024_7$Pcp_Chirps)
mse(mse_11033_7$Pcp_Estacion, mse_11033_7$Pcp_Chirps)
mse(mse_11060_7$Pcp_Estacion, mse_11060_7$Pcp_Chirps)
mse(mse_11096_7$Pcp_Estacion, mse_11096_7$Pcp_Chirps)
mse(mse_11141_7$Pcp_Estacion, mse_11141_7$Pcp_Chirps)

Is there a faster way to do all this at once, for all months and weather stations?

Here is the example data: https://drive.google.com/drive/folders/19h7u0GzGO1okjhO3RLREY0QKY8DOoTy-?usp=sharing

  • Can you add your data in a reproducible form? Try editing your question to put the output of `dput(head(datos, 100))` to give us your first 100 lines – jpsmith Aug 01 '22 at 20:17

1 Answer

You’ll have a better chance of getting a helpful answer if you provide a minimal reproducible example. However, the following should give you what you need:

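library(dplyr)
library(Metrics)  # provides mse(actual, predicted)

datos %>%
  filter(Mes == 7) %>%
  group_by(Estacion) %>%
  # mse() needs both columns: observed station values and CHIRPS estimates
  summarise(Mse = mse(Pcp_Estacion, Pcp_Chirps))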

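The approach is to partition (i.e. ‘group’) the data by values of Estacion before using summarise() to compute the MSE for each group. To cover every month instead of only July, drop the filter() step and group by Mes as well as Estacion.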
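
As a minimal sketch of the same grouping idea applied to every month at once, the following uses only the first ten rows printed in the question, retyped by hand. The names `datos_demo` and `mse_por_mes` are illustrative and do not come from the original post, and the MSE is written out as a mean of squared differences rather than calling mse(), so that dplyr is the only package needed.

```r
library(dplyr)

# First ten rows shown in the question, retyped by hand (not the full data set)
datos_demo <- data.frame(
  Mes          = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
  Year         = 1982,
  Estacion     = c(11024, 11033, 11141, 11024, 11033, 11141,
                   11024, 11033, 11141, 11024),
  Pcp_Chirps   = c(0.262, 0.190, 0.265, 0.317, 0.242, 0.351,
                   0.000, 0.148, 0.000, 0.543),
  Pcp_Estacion = c(0.000, 0.045, 0.000, 0.286, 0.629, 0.500,
                   2.903, 0.000, 0.000, 0.800)
)

# One output row per (station, month) pair.
# mean((a - b)^2) is the mean squared error, written out by hand here.
mse_por_mes <- datos_demo %>%
  group_by(Estacion, Mes) %>%
  summarise(
    n_obs = n(),                                   # rows that went into each MSE
    Mse   = mean((Pcp_Estacion - Pcp_Chirps)^2),
    .groups = "drop"                               # return an ungrouped data frame
  )

print(mse_por_mes)
```

Because this sample covers a single year, every group holds exactly one row; on the full data each (Estacion, Mes) group would presumably hold one row per year. The `n_obs` column is a quick way to check that each group contains the number of observations you expect.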

wurli
  • Thank you very much for the advice and help, dear wurli. Your example helped me a lot; I modified it and it gave me the MSE by month, station and year. Here is the code: `p_mensual_mse <- datos %>% group_by(Year, Mes, Estacion) %>% summarise(mse = MSE(Pcp_Estacion, Pcp_Chirps))` – El Memo de Mileto Aug 01 '22 at 22:10
  • This is my first question on Stack Overflow. I will follow your advice; thank you very much, dear wurli. – El Memo de Mileto Aug 01 '22 at 22:11