Questions tagged [panel-data]

A multidimensional dataset usually describing measurements over time for a specific cohort.

Panel data is multivariate longitudinal data: repeated observations over time for the same set of cross-sectional units, such as families or individuals. Many statistical analysis libraries require the data to be arranged in a specific layout.
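As a rough illustration of what such a layout requirement can look like, the sketch below reshapes a "wide" panel (one row per unit, one column per period) into a "long" one (one row per unit-period pair). The column names (`country`, `x2006`, ...), the sample values, and the helper `wide_to_long` are hypothetical and chosen only for this example; which layout a given library expects depends on that library.

```python
# Hypothetical wide-format panel: one row per unit, one column per year.
wide = [
    {"country": "A", "x2006": 10, "x2007": 12, "x2008": 15},
    {"country": "B", "x2006": 20, "x2007": 18, "x2008": 21},
]


def wide_to_long(rows, id_key, prefix):
    """Return one record per (unit, period) from wide-format rows.

    Assumes every period column is named prefix + <integer year>.
    """
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                long_rows.append({
                    id_key: row[id_key],
                    "year": int(key[len(prefix):]),
                    "value": value,
                })
    # Sort by unit, then period, a common ordering for panel data.
    return sorted(long_rows, key=lambda r: (r[id_key], r["year"]))


long_panel = wide_to_long(wide, id_key="country", prefix="x")
for record in long_panel:
    print(record)
```

In practice a data-frame library would usually handle this reshape; the plain-Python version is only meant to show the two layouts side by side.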

854 questions
3
votes
1 answer

Drop variable in panel data in R conditional based on a defined number of consecutive observations

I am quite new to R. My problem is as follows: I have a set of panel data organised as time series like this (only part is shown): Week_Starting Team A Team B Team C Team D 2010-01-02 1 2 …
Jimmy
  • 59
  • 7
3
votes
1 answer

apply function to rolling window in panel data in R

I'm trying to apply a function (say standard deviation) in a rolling window, by category: I have the following data: cat = c("A", "A", "A", "A", "B", "B", "B", "B") year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993) value = c(2, 3, 5, 6, 8,…
ec0n0micus
  • 1,075
  • 2
  • 12
  • 19
3
votes
2 answers

Simple moving average on an unbalanced panel in R

I am working with an unbalanced, irregularly spaced cross-sectional time series. My goal is to obtain a lagged moving average vector for the "Quantity" vector, segmented by "Subject". In other words, say the following Quantities have been…
user27636
  • 1,070
  • 1
  • 18
  • 26
3
votes
2 answers

Time Trend Variable in Balanced Panel Data, Stata

I have some balanced panel data and want to include a trend variable in my regression. However, I have 60 districts over a 7-year period and I am not sure how to include the trend variable. The year variable repeats, as expected, and covers 2005-2011. I…
user2624528
  • 31
  • 1
  • 1
  • 2
3
votes
4 answers

How to get the difference in value between subsequent observations (country-years)?

Let's say I have scores for 5 countries over a period of 10 years such as: mydata<-1:3 mydata<-expand.grid( country=c('A', 'B', 'C', 'D',…
TiF
  • 615
  • 2
  • 12
  • 24
2
votes
2 answers

Create a matrix with a random number of observations for each group-period

I want to create a matrix for N groups and T time periods. For each combination of T-N, I want to have a random number of lines. The random number of lines for each N-T is given by round(runif(1,2,4)). The goal is to have as input…
yacx
  • 167
  • 2
  • 13
2
votes
2 answers

Quickly split a dataframe by year in R

I have a panel that looks like this country <- c("A","B","C","A","B","C","A","B","C") industry<- c("X","Y","Z","X","Y","Z","X","Y","Z") x2006<- sample(1000:100000,9) x2007<- sample(1000:100000,9) x2008<- sample(1000:100000,9) dat <- data.frame…
Gilrob
  • 93
  • 7
2
votes
1 answer

How to merge multiple data frames into panel data frame?

I can't imagine this question wasn't asked before, but I spent 2 hours searching and didn't find anything. Let's suppose I have 5 separate data frames that contain the same four variables for different years. There is one common variable called…
Sam
  • 39
  • 5
2
votes
4 answers

Pandas: How to replace column values in panel dataset based on ID and condition

So I have a panel df that looks like this: ID year value 1 2002 8 1 2003 9 1 2004 10 2 2002 11 2 2003 11 2 2004 12 I want to set the value for every ID and for all years to the value in 2004. How do I do this? The df should…
jan.sfr
  • 33
  • 4
2
votes
1 answer

Interpreting results from linearmodels PanelOLS .predict() method

Suppose I have the following toy data: import pandas as pd from linearmodels.panel import PanelOLS y = pd.DataFrame( index=[[1, 1, 1, 2, 2, 2], [1, 2, 3, 1, 2, 3]], data=[70, 60, 50, 30, 33, 27], …
Chris
  • 199
  • 9
2
votes
1 answer

Pandas first difference panel data with multi index

I have two data frames with the same variables but from different years: df2016 = pd.DataFrame({"ID": [100,101,102,103], "A": [1,2,3,4], "B": [2,4,6,8], "year": [2016,2016,2016,2016]}) ID A B year 0 100 1 2 2016 1 101 2 4 …
Smithey
  • 327
  • 1
  • 9
2
votes
0 answers

Split issue with model_time and timetk in R

I'm using the modeltime package to forecast 20 time series (not balanced) at once. However, when I call the function modeltime_calibrate I get the following error: Error in glubort(): ! Missing 'new_data'. Try adding a test data…
2
votes
1 answer

Fixed effect instrumental variable (IV) regression with available diagnostic tests

May I please know an R package and code to run fixed effect instrumental variable (IV) regression with available diagnostic tests (e.g., weak instrument test, exogeneity test (using Wu-Hausman), Sargan test)? I know plm code provides the fixed…
Eric
  • 528
  • 1
  • 8
  • 26
2
votes
1 answer

Predict on test data, using plm package in R, and calculate RMSE for test data

I built a model using the plm package. The sample dataset is here. I am trying to predict on test data and calculate metrics. # Import package library(plm) library(tidyverse) library(prediction) library(nlme) # Import data df <- read_csv('Panel data…
Anakin Skywalker
  • 2,400
  • 5
  • 35
  • 63
2
votes
2 answers

Estimating the percentage of common set members over time in a panel

I have a time-series panel dataset that is structured in the following way: There are 2 funds that each own different stocks at each time period. df <- data.frame( fund_id = c(1,1,1,1,1,1,1,1, 1, 2,2,2,2), time_Q = c(1,1,1,2,2,2,2,3, 3,…
Erwin Rhine
  • 303
  • 2
  • 11