0

Exactly like this question but how do you also get the R squared value for each model? link

Sample data

test <- data.frame(row=c(1:16),
plot = c(1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,3),
                 logT = c(1.092,1.091,1.0915,1.09,1.08,1.319,1.316,1.301,1.2134,1.213,1.21,1.22,1.23,1.20,1.19,1.19),
                 utc_datetime = c(2020-03-05T00:00:00Z,2020-03-05T00:30:00Z,2020-03-05T01:00:00Z,2020-03-05T01:30:00Z,2020-03-05T02:00:00Z, 2020-03-06T01:00:00Z,2020-03-06T01:30:00Z,2020-03-06T02:00:00Z,
2020-03-10T02:00:00Z,2020-03-10T02:30:00Z,2020-03-10T03:00:00Z,2020-03-10T03:30:00Z,2020-03-10T04:00:00Z,2020-03-10T04:30:00Z,2020-03-10T05:00:00Z,2020-03-10T05:30:00Z,), 
hrs_since = 1,2,3,4,5,1,2,3,1,2,3,4,5,6,7,8))

A deeper explanation of the data I am dealing with is here but I believe the sample data provided above would suffice data. Ideally, I would want to use the utc_datetime as the x axis/IV value but no code I've tried works with using that so I created the hrs_since variable which works.

I am looking for an output datframe that looks something like this:

plot slope(coeff) r2 value rsd
1 2.1 .96 .01
2 1.3 .85 .01
3 .8 .99 .02

When I run the code below...

output <- ddply(test, "plot", function(x) {
  model <- lm(logT ~ hrs_since, data = x)
  coef(model)
})

I create a dataframe that looks like this:

plot (Intercept) hrs_since
1 2.1 .96
2 1.3 .85
3 .8 .99

But when I add summary(model)$r.squared to it, such as below...

output <- ddply(test, "plot", function(x) {
  model <- lm(logT ~ hrs_since, data = x)
  coef(model)
  summary(model)$r.squared
})

I create a dataframe that looks like this:

plot V1
1 0.98
2 0.97
3 0.89

Where the correct R squared value has been added as column V1 to the df "output", but I have for some reason lost the coeff column? Ideally, I want to also add rsd and maybe st.dev columns but have not attempted yet because getting the R squared and coeff columns correct are the most important parameters I need. Also, originally I tried using r.squared(model) instead of summary(model)$r.squared in the line below coef(model), but this resulted in getting the error "Error in UseMethod("pmodel.response") : no applicable method for 'pmodel.response' applied to an object of class "lm""

Also, I tried a method using this code as well and it worked but the coeff was not returned in the parameters returned for each plot

output <- test %>%
  group_by(plot) %>%
  do(glance(lm(lnT~hrs_since, data=.)))

Thank you in advance!

achtee
  • 11
  • 3

1 Answers1

0

Here's an approach that nests each of the models in a dataframe and captures the results in the dataframe as well. Then uses the broom package to extract the statistics. There are two different broom functions that extract the intercept and r2, so I run them separately and combine into one dataframe.

library(dplyr)
library(modelr)
library(tidyverse)

dat_all <- data.frame()

#nest the datasets as separate dataframes
for (p in unique(test$plot)){
    data <- data.frame(x = test$logT[test$plot == p], y = test$hrs_since[test$plot == p])
    names(data) <- c("logT", "hrs_since")
    dd <- data.frame(plot = p, data = data) %>%
        group_by(plot) %>%
        nest()
    dat_all <- rbind(dat_all, dd)
}



myModel <- function(x){
    lm(data.logT ~ data.hrs_since, data = x)
}

#use map to run the model and each of the nested dataframes
dat_all <- dat_all %>%
    mutate(model = map(data, myModel))

#extract the intercepts
i <- dat_all %>% 
    mutate(tidy = map(model, broom::tidy)) %>% 
    unnest(tidy) %>%
    filter(term == "(Intercept)") %>%
    select(plot, intercept = estimate)

#extract r2
r <- dat_all %>% 
    mutate(glance = map(model, broom::glance)) %>% 
    unnest(glance) %>%
    select(plot, r.squared)

#combine statistics by plot
results <- i %>%
    left_join(r, by = "plot")

   plot intercept r.squared
  <dbl>     <dbl>     <dbl>
1     1      1.10     0.618
2     2      1.33     0.871
3     3      1.22     0.380
stomper
  • 1,252
  • 1
  • 7
  • 12
  • Hi thank you so much for providing this explanation and code it is extremely helpful! however I was hoping to get out the slope coefficient of the linear model instead of the intercept - is there any simple switch in the code you provided to get the slope coefficient of the model rather than the intercept? – achtee Jan 31 '23 at 00:56
  • Run dat_all %>% mutate(tidy = map(model, broom::tidy)) %>% unnest(tidy) to see what broom::tidy() makes available and dat_all %>% mutate(glance = map(model, broom::glance)) %>% unnest(glance) to see what broom::glance() makes available. There is also broom::augment(). Then use select() to isolate the stats you are looking for. – stomper Jan 31 '23 at 01:34