I have a panel dataset in which hospitals are observed every two years from 2004 to 2010. The data come from Stata, but I work with them in R. Initially the variables year (2004, 2006, 2008, 2010) and t (1 = 2004, 2 = 2006, and so on) are integers, but I later convert them to factors as follows:

data$year <- factor(data$year)

and similarly for the time variable t.

My question is whether, for panel data, year or t should be kept as an integer/numeric variable or converted to a factor, and whether the command above is the right way to convert a variable to a factor.

rcs
user3571389
  • In general, it should be a factor if it's a categorical variable. – Rich Scriven Oct 27 '14 at 03:57
  • If this is panel (longitudinal) data, then `year` and `t` are both numeric variables representing the passage of time, so I would have thought they should remain numeric, rather than factor, particularly if you're running the data through a repeated measures regression model. – eipi10 Oct 27 '14 at 05:54
  • A panel can be defined with the function `pdata.frame` from the package `plm`. This sets the time variable as one of the panel's two indexes (the other being the observed subject) and treats it as a factor. Details here: https://cran.r-project.org/web/packages/plm/plm.pdf – Nemesi Sep 26 '18 at 16:05

1 Answer

Treating year as a categorical variable estimates a separate effect for each individual year, i.e. what the average impact on the target variable was in a given year. Including t as a numeric variable instead estimates a single slope: what happens on average from one wave to the next, two years later. Given that there are just 4 time periods, the first approach seems more reasonable, but it really depends on the goal of the analysis.
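To make the difference concrete, here is a minimal sketch on simulated data. The data frame, the outcome `y`, and the use of a plain `lm` fit are assumptions for illustration only; they are not the asker's data or model.

```r
# Hypothetical simulated panel: 5 hospitals observed in 4 waves.
set.seed(1)
waves <- c(2004L, 2006L, 2008L, 2010L)
data <- expand.grid(hospital = 1:5, year = waves)
data$t <- match(data$year, waves)            # 1 = 2004, ..., 4 = 2010
data$y <- 10 + 2 * data$t + rnorm(nrow(data))  # made-up outcome

# Year as a factor: an intercept (the 2004 baseline) plus one
# coefficient for each later year.
fit_factor <- lm(y ~ factor(year), data = data)

# t as numeric: an intercept plus a single slope, i.e. the average
# change per two-year wave.
fit_numeric <- lm(y ~ t, data = data)

names(coef(fit_factor))
names(coef(fit_numeric))
```

The factor model spends three coefficients on time and the numeric model spends one, which is the trade-off between flexibility and parsimony described above.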

The command in the question works as written. An equivalent alternative is

data$year <- as.factor(data$year)
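As a quick check, the sketch below compares `factor()` and `as.factor()` on a small made-up vector of years, and shows one pitfall when converting a factor back to numbers. The vector `year` here is a hypothetical stand-in for `data$year`.

```r
# Hypothetical stand-in for data$year.
year <- c(2004L, 2006L, 2008L, 2010L, 2004L)

f1 <- factor(year)
f2 <- as.factor(year)

# Both conversions give the same levels and the same internal codes.
levels(f1)
identical(levels(f1), levels(f2))
identical(as.integer(f1), as.integer(f2))

# Pitfall: as.integer() on a factor returns the internal codes
# (1, 2, 3, 4, 1), not the original years.
as.integer(f1)

# To recover the years, go through the level labels first.
as.integer(as.character(f1))
```

Note that the internal codes of `factor(year)` happen to match `t` in this setup, which is why `year` as a factor and `t` as a factor carry the same information.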

Also, make sure that you include only one of year or t, since including both could distort the interpretation.
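One way this can go wrong, sketched on simulated data (the data frame and outcome `y` are hypothetical, and `lm` is assumed only for illustration): with year as a factor, numeric t is an exact linear combination of the intercept and the year dummies, so its coefficient cannot be estimated.

```r
# Hypothetical simulated panel: 5 hospitals observed in 4 waves.
set.seed(1)
waves <- c(2004L, 2006L, 2008L, 2010L)
data <- expand.grid(hospital = 1:5, year = waves)
data$t <- match(data$year, waves)
data$y <- 10 + 2 * data$t + rnorm(nrow(data))  # made-up outcome

# Both time variables in one model: t is perfectly collinear with
# the year dummies, so lm() reports NA for the aliased coefficient.
fit_both <- lm(y ~ factor(year) + t, data = data)
coef(fit_both)
```

The same redundancy arises if both variables are entered as factors, since they then encode identical groupings.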

Love-R
  • Sorry for the late response, but thank you. In the end, since it's a short panel, we treated year as a categorical variable. – user3571389 Sep 18 '15 at 17:28