
I have a problem with my panel data regression. The dataset is reported as a balanced panel with n = 10, T = 26, N = 260. However, once I run my regression, the panel is reported as unbalanced and n drops to 7. I assume this is because there are NAs in my dataset. Does anybody have an idea how I can solve this problem?

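library(plm)       # pdata.frame(), pdim(), is.pbalanced(), plm()
library(xtable)    # xtable()
library(knitr)     # kable()
library(magrittr)  # %>% pipe

# Declare the panel structure (individual index and time index)
africapd <- pdata.frame(africa, index = c("Country", "Years"))
smpl <- africapd[africapd$country]
tbl <- xtable(smpl)
kable(tbl, digits = 4, align = "c", caption = "Sample")

# Panel dimensions and balance check on the data itself
pdim(africapd)
africapd %>%
  is.pbalanced()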



Balanced Panel: n = 10, T = 26, N = 260
[1] TRUE


ols_gdp <- plm(
  GDP_panel ~ TA_panel + LE_panel + Infl_panel + Fert_panel + LED_panel +
    GC_panel + ROL_panel + TOT_panel + PG_panel + HC_panel + LDE_panel,
  data = africapd, index = c("Country", "Years"),
  effect = "individual", model = "pooling"
)
ols_gdp %>% is.pbalanced()
summary(ols_gdp)

Unbalanced Panel: n = 7, T = 1-10, N = 41

Please let me know if you have an idea how to fix this. Also, please tell me how to do it in R since I am relatively new to this.

Thanks in advance

Mimi
  • I believe this is due to your NA values; it is better to find a way to deal with the missing data. If possible, please provide a small reproducible example of your dataset. – LucaCoding May 25 '21 at 12:33
  • @LucaCoding https://1drv.ms/x/s!AloV-jC6qfKna-xj6CeAs2m5CF0 here is a link to my excel file, do you have any idea how to deal with missing data? Should I get rid of it? – Mimi May 25 '21 at 13:57
  • Dear @Mimi, you cannot expect people to download your Excel file and go through it. Read this to find a useful way to make reproducible examples: https://stackoverflow.com/questions/49994249/example-of-using-dput?noredirect=1&lq=1. Furthermore, try checking web sources on dealing with missing data (https://www.youtube.com/watch?v=An7nPLJ0fsg). – LucaCoding May 25 '21 at 14:31
  • Btw: this is documented in `?plm::pdim` (same explanation applies for `is.pbalanced`): Calling pdim on an estimated panelmodel object and on the corresponding (p)data.frame used for this estimation does not necessarily yield the same result. When called on an estimated panelmodel, the number of observations (individual, time) actually used for model estimation are taken into account. When called on a (p)data.frame, the rows in the (p)data.frame are considered, disregarding any NA values in the dependent or independent variable(s) which would be dropped during model estimation. – Helix123 May 25 '21 at 21:18
  • @LucaCoding thanks for the help, I really appreciate it! Also sorry, but I am new to this and I do not know how it is usually done so I thought it would be easiest if I would give you access to my excel file, I did not expect you to actually go through it. I just thought it would give you an idea about what I am dealing with – Mimi May 26 '21 at 07:50
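
The behaviour described in the `?plm::pdim` note quoted in the comments above can be reproduced on a small made-up panel. This is only a sketch: the `toy` data frame, its values, and the variables `y` and `x` are assumptions for illustration and are not the asker's data. It shows the same calls giving different answers on the data and on the fitted model, and one way to count how many complete rows each country contributes.

```r
library(plm)

# Hypothetical toy panel: 3 countries x 4 years (not the asker's data)
set.seed(1)
toy <- data.frame(
  Country = rep(c("A", "B", "C"), each = 4),
  Years   = rep(2001:2004, times = 3),
  y       = rnorm(12),
  x       = rnorm(12)
)

# Introduce missing values: one row of country A, every row of country B
toy$x[c(2, 5, 6, 7, 8)] <- NA

toypd <- pdata.frame(toy, index = c("Country", "Years"))

# On the (p)data.frame, rows are counted regardless of NAs
pdim(toypd)
is.pbalanced(toypd)

# On the fitted model, only rows actually used in estimation are counted,
# because rows with NA in any model variable are dropped
fit <- plm(y ~ x, data = toypd, model = "pooling")
pdim(fit)
is.pbalanced(fit)

# Diagnose where the observations are lost
used <- complete.cases(toy[, c("y", "x")])  # TRUE for rows without NA in y or x
table(toy$Country, used)                    # complete rows per country
colSums(is.na(toy))                         # number of NAs per variable
```

Applied to the real data, the same `complete.cases()` check over all variables in the formula would show which countries and years drop out, which helps decide whether to remove a sparsely reported variable or to handle the missing values some other way.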

0 Answers