
I have a panel data set with an individual index (`dyad_id`), which is an integer, and a time index (`year_month`), which is a Date variable. I am trying to run the following code:

library(plm)

df.fe <- plm(deaths_civilians ~ deaths_a_lag + deaths_b_lag,
             data = rebel,
             index = c("dyad_id", "year_month"),
             model = "within",
             effect = "individual")

but I keep getting the following error message:

Error in pdim.default(index[[1]], index[[2]]) : 
  duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)

3: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)

All previous answers to this question say that the error occurs because I have more than one observation with the same ID in the same time period, but I have checked and this is not the case. I have also tried converting both the ID and year_month to different types (factors, integers, etc.), but nothing works.
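One way to run that duplicate check on the raw data, before `pdata.frame()` is involved, is sketched below. It is a minimal base-R sketch: the small `rebel` data frame is hypothetical toy data standing in for the merged data set, with one duplicate (id, time) pair planted deliberately. It also counts missing index values, since the warning's own suggestion tabulates with `useNA = "ifany"`.

```r
# Hypothetical toy data standing in for the merged `rebel` data set.
# Rows 4 and 5 deliberately share the same (dyad_id, year_month) pair.
rebel <- data.frame(
  dyad_id          = c(1L, 1L, 2L, 2L, 2L),
  year_month       = as.Date(c("1989-10-01", "1989-11-01",
                               "1989-10-01", "1989-11-01", "1989-11-01")),
  deaths_civilians = c(0, 3, 1, 2, 2)
)

# Flag every row whose (dyad_id, year_month) pair occurs more than once,
# marking both the first occurrence and the later repeats.
key  <- rebel[, c("dyad_id", "year_month")]
dups <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(rebel[dups, ])

# Count missing values in each index column.
print(colSums(is.na(key)))
```

If `rebel[dups, ]` comes back empty and both NA counts are zero, the raw index columns themselves are clean.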

I cannot provide reproducible data that would help diagnose the problem, because my final data set is the result of merging about six separate data sets with about 300 lines of code. Could anybody suggest a potential reason for this problem and any remedies?

Lee Tagziria
  • Did you carefully look at `table(index(your_pdataframe), useNA = "ifany")` as suggested (where you create the pdata.frame first)? – Helix123 Aug 15 '17 at 14:00
  • Yep. I scanned the output manually first and the counts are all 1s, and then I used View(table(index(your_pdataframe), useNA = "ifany")) and sorted by size, which confirmed that they are all 1s. I do not have a single ID with more than one observation for the same time period. – Lee Tagziria Aug 15 '17 at 15:21
  • Have you also checked after converting the index variables to, e.g., integers? If you don't trust your sorting procedure and your manual inspection, you can use `any(table(index(Produc), useNA = "ifany") > 1)` and see whether it returns `FALSE`. – Helix123 Aug 17 '17 at 20:23
  • My index variables are both factors. I could convert the ID index to an integer, but my time index is an as.Date(as.yearmon) variable, so wouldn't I have difficulty converting it whilst keeping the correct values of the variable? – Lee Tagziria Aug 18 '17 at 10:25
  • Is your time index a factor or a Date variable (before running `pdata.frame()` on your data)? You make both statements above. If it is a Date, try converting it to integer. pdata.frames use factors internally for the index variables, and I suspect the casting from Date to factor is the cause of your problem. – Helix123 Aug 18 '17 at 10:54
  • Sorry, I confused myself. The time index was a factor and that produced the error, so I converted it with as.yearmon, and that still produced the error. So I transformed it back into a factor and tried to convert it to integer using as.numeric and as.integer, and both resulted in seemingly random values, e.g. 1989-10-01 turned into 7, the following month turned into 24, etc. – Lee Tagziria Aug 18 '17 at 11:02
  • So, use `table()` on your original two index variables to find out if there are duplicates. – Helix123 Aug 18 '17 at 13:25
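The seemingly random integers in the exchange above are the factor's internal level codes: `as.integer()` and `as.numeric()` on a factor return each value's position in `levels()`, not the value itself. Below is a minimal base-R sketch, assuming the time index is a factor of hypothetical ISO-formatted date strings. It goes through character first and then builds an integer month index that keeps chronological order.

```r
# Hypothetical factor time index, as when year_month was stored as a factor.
year_month <- factor(c("1989-11-01", "1989-10-01", "1990-01-01"))

# On a factor this returns level codes (positions in levels()), not dates.
codes <- as.integer(year_month)

# Convert via character first to recover the actual dates.
d <- as.Date(as.character(year_month))

# Integer month count (year * 12 + month - 1): consecutive months
# map to consecutive integers, so order and spacing are preserved.
month_index <- as.integer(format(d, "%Y")) * 12L +
  as.integer(format(d, "%m")) - 1L

print(codes)
print(month_index)
```

Here October 1989 and January 1990 end up exactly 3 apart, so the integer index carries the same information as the original dates.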

1 Answer


I had the same error. Make sure you pass your panel data (the pdata.frame) to plm when running the regression. If you pass the name of your old data set (the one that has not been converted to panel data), it will give you this error.
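Below is a minimal sketch of that workflow, assuming the plm package is installed. The small `rebel` data frame is hypothetical toy data that reuses the column names from the question. The sketch builds the pdata.frame once and runs the duplicate check from the comments on it. It then passes that object, not the original data frame, to `plm()`.

```r
library(plm)

# Hypothetical toy panel: 3 dyads observed over 4 consecutive months.
rebel <- data.frame(
  dyad_id          = rep(1:3, each = 4),
  year_month       = rep(seq(as.Date("1989-10-01"), by = "month",
                             length.out = 4), times = 3),
  deaths_civilians = c(0, 3, 1, 4, 2, 2, 5, 1, 0, 1, 3, 2),
  deaths_a_lag     = c(1, 0, 2, 3, 1, 4, 2, 0, 3, 1, 0, 2),
  deaths_b_lag     = c(0, 2, 1, 1, 3, 0, 1, 2, 1, 0, 2, 4)
)

# Build the panel data frame once and keep it under its own name.
prebel <- pdata.frame(rebel, index = c("dyad_id", "year_month"))

# TRUE would mean some (id, time) couple appears more than once.
has_dups <- any(table(index(prebel), useNA = "ifany") > 1)
print(has_dups)

# Pass the pdata.frame (not the original data frame) to plm().
fe <- plm(deaths_civilians ~ deaths_a_lag + deaths_b_lag,
          data = prebel, model = "within", effect = "individual")
print(summary(fe))
```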