I have time series data with each year's data stored in different columns. This data is known to have some data errors which are causing minor holes in the time series data that I'd like to correct.
For reference, the data looks like this
df = pd.DataFrame({'Firm': ['A', 'B', 'C', 'D', 'E', 'F'],
'2003': [1, 1, 1, 0, 0, 1],
'2004': [1, 0, 1, 0, 0, 1],
'2005': [1, 1, 0, 0, 0, 0],
'2006': [0, 1, 1, 0, 0, 1],
'2007': [1, 0, 1, 1, 0, 1],
'2008': [1, 0, 1, 1, 0, 0]})
I'd like to write a programmatic way to fill the random zeroes that occasionally pop up. Essentially, if a firm has a 1 for 2 years, with a 0 in between in them, I'd like to assume that is a data error and fill in the data. The desired output for this sample data would look like this:
df = pd.DataFrame({'Firm': ['A', 'B', 'C', 'D', 'E', 'F'],
'2003': [1, 1, 1, 0, 0, 1],
'2004': [1, 1, 1, 0, 0, 1],
'2005': [1, 1, 1, 0, 0, 1],
'2006': [1, 1, 1, 0, 0, 1],
'2007': [1, 0, 1, 1, 0, 1],
'2008': [1, 0, 1, 1, 0, 0]})
If I need to reshape this longways and format it as a time series, I will. First I wanted to see if there was an easy to do this in this wide format though.
Thanks!