I am using a panel data set:

  • y is my dependent variable, equal to 0 or 1 --> numeric
  • x1 are my individuals --> numeric
  • x2 are my time indicators --> numeric
  • x3,x4,...,x65 are my independent variables --> character

In the code below I convert all variables to character and then tell R that I am using panel data with the pdata.frame command on the last line. The problem is that pdata.frame converts the variables x1 and x2 (the individual and time indicators) to factors, even with stringsAsFactors=FALSE.

#Regressions
library(readxl)  # read_excel()
library(plm)     # pdata.frame(), plm()
df=read_excel("C:/Users/Luuk/Desktop/Master Thesis EME/Data/indep_dep_indlevel.xlsx")
df_dep=data.frame(df[,79])
count=as.data.frame(rep(1:3669, times=1, each=3))
df=cbind(count,df[,3:79])
df_indep=data.frame(df[,c(1:5,8,10:15,17:25,27:44,45,53:77)])
dflm=cbind(df_dep,df_indep)
dflm1 <- data.frame(lapply(dflm, as.character), stringsAsFactors=FALSE)

names(dflm1)[c(2:66)] <- c(paste("x", 1:65, sep=""))
names(dflm1)[1] <- "y"
dflm2=pdata.frame(dflm1,index=c("x1","x2"),stringsAsFactors=FALSE)

Consequently, the pooled OLS estimation shown after the message fails with this error and warning:

Error in class(x) <- setdiff(class(x), "pseries") :
  adding class "factor" to an invalid object
In addition: Warning message:
In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored

xnam <- paste("x", 3:65, sep="")
Formula <- formula(paste("y ~ ", paste(xnam, collapse=" + ")))
fit=plm(Formula, data=dflm2,model="pooling")

How can I make my pooled OLS estimation procedure work?

Helix123
  • Making the index variables factors is a "feature" of `pdata.frame` - you cannot get around it. If you really want a pooled model on the numerical values of the variables which can be seen as the index variables of the panel data, just take `lm`. – Helix123 Jan 03 '20 at 16:32

1 Answer

Running a panel model with the pooling option is equivalent to running a simple OLS regression with lm(y~x). I don't see why you would need anything more than that if you only want pooled estimates. I could not reproduce your error by following steps similar to yours (see my code) with an xlsx file I created for this purpose. Please include a minimal working example that reproduces your error.

library(readxl)  # read_excel()
library(plm)     # pdata.frame(), plm()
df=read_excel("~/Downloads/strtest.xlsx")
df_dep=data.frame(df[,1])
df2=data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
names(df2)[c(1:3)] <- c(paste("x", 1:3, sep=""))
df2=cbind.data.frame(c(1,2,3,80),df2)
names(df2)[1]='y'
df3=pdata.frame(df2,index=c("x1","x2"),stringsAsFactors=FALSE)
plm(y~x1+x2+x3, data=df3,model="pooling")
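
For the plain lm route, a minimal sketch could look like the following. It assumes a small made-up panel (the data frame `toy` and all of its values are hypothetical, not the data from the question) and uses only base R. The response y stays numeric, while the character regressors are left as they are, since lm turns them into dummy variables on its own.

```r
# Hypothetical toy panel: 4 individuals observed in 3 periods each.
# All values are invented for illustration only.
toy <- data.frame(
  y  = c(0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1),  # 0/1 response, kept numeric
  x1 = rep(1:4, each = 3),                      # individual id
  x2 = rep(1:3, times = 4),                     # time indicator
  x3 = rep(c("a", "b"), times = 6),             # character regressor
  x4 = rep(c("u", "v", "w"), times = 4),        # character regressor
  stringsAsFactors = FALSE
)

# Build the formula from the regressor names, as in the question.
xnam <- paste("x", 3:4, sep = "")
Formula <- formula(paste("y ~ ", paste(xnam, collapse = " + ")))

# Pooled OLS: lm treats the character columns as factors (dummy coding).
fit <- lm(Formula, data = toy)
coef(fit)
```

Note that y is not passed through as.character here; a character (and hence factor) response is what the warning about model.response in the question points to.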
Diegolog