Questions tagged [panel-data]

A multidimensional dataset usually describing measurements over time for a specific cohort.

Panel data is multivariate longitudinal data: repeated observations over time for a set of cross-sectional units, such as families or individuals. Many statistical analysis libraries require the data to be arranged in a specific layout, often a long format with one row per unit and time period.
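As an illustration of the layout point, here is a minimal sketch of reshaping a wide table (one row per unit, one column per period) into long format (one row per unit and period). The sample values, the column names `id`, `year`, and `value`, and the helper `wide_to_long` are all hypothetical and use only the Python standard library; real libraries have their own reshaping functions.

```python
# Hypothetical wide-format table: one row per unit, one column per year.
# None marks a missing observation, which makes the panel unbalanced.
wide = [
    {"id": "A", "2001": 1.0, "2002": 1.5, "2003": 2.0},
    {"id": "B", "2001": 3.0, "2002": None, "2003": 3.5},
]


def wide_to_long(rows, id_key="id"):
    """Reshape wide rows into long (id, year, value) records, skipping missing cells."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key or value is None:
                continue
            long_rows.append({"id": row[id_key], "year": int(key), "value": value})
    # Sort by unit, then by period (a common ordering for panel routines).
    return sorted(long_rows, key=lambda r: (r["id"], r["year"]))


long_panel = wide_to_long(wide)
for record in long_panel:
    print(record)
```

Unit B has no 2002 row in the result, so the long table is an unbalanced panel, the situation several of the questions below deal with.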

854 questions
4
votes
3 answers

Subsetting an unbalanced panel dataset to have at least 2 consecutive observations in R

I have an unbalanced panel dataset in R. The following will serve as an example: dt <- data.frame(name= rep(c("A", "B", "C"), c(3,2,3)), year=c(2001:2003,2000,2002,2000:2001,2003)) > dt name year 1 A 2001 2 A 2002 3 A…
Mace
  • 1,259
  • 4
  • 16
  • 35
4
votes
2 answers

Generating a lagged time series cross sectional variable in R

I am a new R user. I have a time series cross-sectional dataset and, although I have found ways to lag time series data in R, I have not found a way to create lagged time-series cross-sectional variables so that I can use them in my analysis.
Julie
  • 43
  • 1
  • 3
3
votes
2 answers

Create a cumulative count of events and retain first year before and after every event

I have a longitudinal dataset containing individuals along with information about where they are currently residing. The code below creates an example df: set.seed(123) df <- tibble( id = c(1, 2, 3, 4, 5, 1, 2, 3, 5, 6, 7, …
viktorp
  • 89
  • 7
3
votes
1 answer

Every participant has the same intercept and slope?

I'm having trouble understanding why my coef() call is returning the same intercept and slope for every participant in my data. For context, I am comparing two models (built in lmer) using the anova function. Model 1 is as follows model1 <- lmer(Pen…
jbrimm2004
  • 57
  • 3
3
votes
2 answers

Calculating the percentage of matching observations from one period to another in panel data

I have a time-series panel dataset that is structured in the following way: There are multiple funds that each own multiple stocks and we have a value column for the stock. As you can see the panel is not balanced. My actual dataset is very large…
Erwin Rhine
  • 303
  • 2
  • 11
3
votes
2 answers

Panel data: Calculate group means while omitting first period from calculation

I have an issue regarding a certain kind of mean() calculation. I use a panel data set with two identifiers "ID" and "year" (using the plm pkg). I want to calculate the groupwise mean of a variable "y", but omit the first year's entry of the…
tony13s
  • 141
  • 1
  • 1
  • 6
3
votes
1 answer

R plm vs fixest package - different results?

I'm trying to understand why R packages "plm" and "fixest" give me different standard errors when I'm estimating a panel model using heteroscedasticity-robust standard errors ("HC1") and state fixed effects. Does anyone have a hint for me? Here is…
minimouse
  • 131
  • 10
3
votes
1 answer

Specifying random effects for repeated measures in logistic mixed model in R: lme4::glmer

I am looking for feedback to determine how to correctly specify random effects to account for correlation in a repeated measures design, but with multiple levels of correlation (including the data being longitudinal for each combination of…
Meg
  • 696
  • 1
  • 7
  • 20
3
votes
3 answers

How do I create this variable in R?

Consider the following test data set using R: testdat<-data.frame("id"=c(rep(1,5),rep(2,5),rep(3,5)), "period"=rep(seq(1:5),3), "treat"=c(c(0,1,1,1,0),c(0,0,1,1,1),c(0,0,1,1,1)), …
dkro
  • 193
  • 1
  • 8
3
votes
2 answers

How to balance an unbalanced panel dataset?

Suppose I have the following unbalanced panel data: unbalanced.panel = structure(list(firm = c("A", "A", "A", "A", "B", "B", "A", "A", "B", "C", "C"), ind = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1), year = c(2010, 2011, 2012, 2013, 2011, 2013, 2011,…
Cristhian
  • 361
  • 3
  • 12
3
votes
0 answers

In plm, unexpected difference between model = "within" on pre-differenced variable and model = "fd"?

I am fitting a diff-in-diff model on panel data with multiple treatment windows using the plm package. In the plm package, there are options to set: - model = "within" vs model = "fd" (first difference). Why don't the following produce equivalent…
3
votes
1 answer

What is the between variance formula in panel data?

I want to calculate panel descriptive statistics for my variables analogously to how Stata provides them using the "xtsum" function. I am able to compute almost everything (overall/within sd, mean, min, max) but I cannot seem to find a reliable…
user11005502
3
votes
2 answers

How to format data for panel data analysis in python?

I need to conduct time-series analysis on panel data. The data is currently formatted like the table below: +------+---------+---------+---------+---------+---------+---------+---------+---------+ | | Q1 | Q2 | Q3 | Q4 | …
3
votes
0 answers

Calculate correlation matrix for panel data in R

I have panel data set of firms: df <- structure(list(id = c("00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264", "00127264",…
Mislav
  • 1,533
  • 16
  • 37
3
votes
3 answers

panel data: How to remove IDs with missing yearly information

I have a dataset with Id-year observations. I want to compare the change before and after/in 2015. Therefore I need all firms to have observations before and after/in 2015 so that I could compare. ID year diesese 1 2012 3 1 2016 4 3 2013 3 3…
Yufang
  • 103
  • 9