
Excuse my naiveté. I'm not sure what this type of model is called -- perhaps panel regression.

Imagine I have the following data:

n <- 100
x1 <- rnorm(n)
y1 <- x1 * 0.5 + rnorm(n)/2

x2 <- rnorm(n)
y2 <- x2 * 0.5 + rnorm(n)/2

x3 <- rnorm(n)
y3 <- x3 * 0.25 + rnorm(n)/2

x4 <- rnorm(n)
y4 <- x4 * 0 + rnorm(n)/2

x5 <- rnorm(n)
y5 <- x5 * -0.25 + rnorm(n)/2

x6 <- rnorm(n)
y6 <- x6 * -0.5 + rnorm(n) + rnorm(n)/2

x7 <- rnorm(n)
y7 <- x7 * -0.75 + rnorm(n)/2

foo <- data.frame(s=rep(1:100,times=7),
                  y=c(y1,y2,y3,y4,y5,y6,y7),
                  x=c(x1,x2,x3,x4,x5,x6,x7),
                  i=rep(1:7,each=n))

Here y and x are individual AR(1) time series measured over 100 seconds (I use 's' instead of 't' for the time variable), divided equally into groups (i). I wish to model these as:

y_t= b_0 + b_1(y_{t-1}) + b_2(x_{t}) + e_t

but while taking the group (i) into account:

y_{it} = b_0 + b_1(y_{it-1}) + b_2(x_{it}) + e_{it}

I wish to know whether b_2 (the coefficient on x) is a good predictor of y and how that coefficient varies across groups. I also want the R2 and RMSE by group, and to predict y_i given x_i and i. The grouping variable can be discrete or continuous.

I gather that this type of problem is called panel regression, but the term is unfamiliar to me. Is using plm in R a good approach to investigate this problem?
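For concreteness, I imagine the plm call would look something like the following (an untested sketch on a stand-in for foo above; I am assuming the index order is individual then time, and that plm's lag() lags within each group):

```r
library(plm)

# Stand-in for `foo` above: 7 groups of n = 100 observations each
set.seed(1)
n <- 100
foo <- data.frame(s = rep(1:n, times = 7),
                  x = rnorm(7 * n),
                  i = rep(1:7, each = n))
foo$y <- foo$x * rep(c(0.5, 0.5, 0.25, 0, -0.25, -0.5, -0.75), each = n) +
  rnorm(7 * n) / 2

# Panel data frame: index = c(individual, time)
pfoo <- pdata.frame(foo, index = c("i", "s"))

# Pooled OLS with a common AR(1) term; lag() here is the panel-aware lag
m <- plm(y ~ lag(y) + x, data = pfoo, model = "pooling")
summary(m)
```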

Based on the comment below, I guess this is a simple start:

require(dplyr)
require(broom)
fitted_models <- foo %>% group_by(i) %>% do(model = lm(y ~ x, data = .))
fitted_models %>% tidy(model)
fitted_models %>% glance(model)
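The per-group slope, R2, and RMSE can also be pulled out with plain base R, which avoids depending on a particular dplyr/broom version (a self-contained sketch on a stand-in for foo; note the grouping column in foo is i):

```r
# Stand-in for `foo` above: 7 groups of n = 100 observations each
set.seed(1)
n <- 100
foo <- data.frame(s = rep(1:n, times = 7),
                  x = rnorm(7 * n),
                  i = rep(1:7, each = n))
foo$y <- foo$x * rep(c(0.5, 0.5, 0.25, 0, -0.25, -0.5, -0.75), each = n) +
  rnorm(7 * n) / 2

# One lm per group, via split()
fits <- lapply(split(foo, foo$i), function(d) lm(y ~ x, data = d))

# Slope on x, R^2, and RMSE for every group
res <- data.frame(i    = names(fits),
                  b2   = sapply(fits, function(m) unname(coef(m)["x"])),
                  r2   = sapply(fits, function(m) summary(m)$r.squared),
                  rmse = sapply(fits, function(m) sqrt(mean(residuals(m)^2))))
res
```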
user111024
  • Your notation is ambiguous. You don't write b_{i1}, but talk about b_1 variability across groups. I believe you want to estimate a different b_1 for each group. Then the question is what is common across groups in your model? For now it looks that nothing is. In that case there is no need for `plm` and `lm` would suffice. – Julius Vainora Mar 12 '18 at 03:47
  • Apologies. I want to know about the coef on x (b_2). E.g., whether that variable changes with group (i) and how. As far as I understand the issues, the series y_i and x_i are individual samples (not remeasured in time) but correlated. – user111024 Mar 12 '18 at 05:29
  • Then we have b_{i2}. And what about the autoregressive term? Do you think that the coefficient is the same across groups? Also, how many groups are there? – Julius Vainora Mar 12 '18 at 12:14
  • The AR1 could be the same across groups. It's likely close. There are about 100 groups and the time series is about 100 seconds long. – user111024 Mar 12 '18 at 17:38

1 Answer


Since you don't include fixed or random effects in the model, this is pooled OLS (POLS), which can be estimated with either lm or plm.

Let's construct example data of 100 groups and 100 observations for each:

df <- data.frame(x = rnorm(100 * 100), y = rnorm(100 * 100), 
                 group = factor(rep(1:100, each = 100)))
df$ly <- unlist(tapply(df$y, df$group, function(x) c(NA, head(x, -1))))
head(df, 2)
#            x          y group         ly
# 1  1.7893855  1.2694873     1         NA
# 2  0.8671304 -0.9538848     1  1.2694873

Then

m1 <- lm(y ~ ly + x:group, data = df)

is a model with a common autoregressive coefficient and a group-dependent effect of x:

head(coef(m1)[-1:-2], 5)
#    x:group1    x:group2    x:group3    x:group4    x:group5 
# -0.02057244  0.06779381  0.04628942 -0.11384630  0.06377069 
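Plotting the group-specific slopes might look like this (rebuilding df and m1 so the snippet runs standalone; the seed is arbitrary):

```r
set.seed(1)
df <- data.frame(x = rnorm(100 * 100), y = rnorm(100 * 100),
                 group = factor(rep(1:100, each = 100)))
df$ly <- unlist(tapply(df$y, df$group, function(x) c(NA, head(x, -1))))
m1 <- lm(y ~ ly + x:group, data = df)

# Extract the 100 group-specific slopes on x and plot them
slopes <- coef(m1)[grep("^x:group", names(coef(m1)))]
plot(slopes, xlab = "group", ylab = "coefficient on x")
abline(h = 0, lty = 2)
```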

This allows you to plot them, etc. I suppose one thing that you will want to do is to test whether those coefficients are equal. That can be done as follows:

m2 <- lm(y ~ ly + x, data = df)
library(lmtest)
lrtest(m1, m2)
# Likelihood ratio test
#
# Model 1: y ~ ly + x:group
# Model 2: y ~ ly + x
#   #Df LogLik  Df  Chisq Pr(>Chisq)
# 1 103 -14093                      
# 2   4 -14148 -99 110.48     0.2024

Hence, we cannot reject that the effects of x are the same across groups, which is expected here, since df was generated with no dependence between y and x.
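Prediction given x, the lagged y, and the group then goes through predict() on new data carrying the group factor (again rebuilding df and m1 so this stands alone; the values in newdat are arbitrary):

```r
set.seed(1)
df <- data.frame(x = rnorm(100 * 100), y = rnorm(100 * 100),
                 group = factor(rep(1:100, each = 100)))
df$ly <- unlist(tapply(df$y, df$group, function(x) c(NA, head(x, -1))))
m1 <- lm(y ~ ly + x:group, data = df)

# Predict y for a new observation in group 3, given its lagged y and current x;
# the group factor must carry the same levels as in the fitted data
newdat <- data.frame(ly = 0.1, x = 0.5,
                     group = factor(3, levels = levels(df$group)))
predict(m1, newdata = newdat)
```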

Julius Vainora