I am hoping someone out there can help. I am trying to figure out how to solve gaps-and-islands problems with pandas, but have been unsuccessful. For a sample data set I am using:
```python
import numpy as np
import pandas as pd

np.random.seed(1066)
dates = pd.date_range(start='2010-01-01', end='2010-12-31', freq='D')
# One year of daily random values for each of three groups.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df = pd.concat([
    pd.DataFrame({'date': dates,
                  'group': g,
                  'value': np.random.randint(0, 100, size=len(dates))})
    for g in ['A', 'B', 'C']
]).reset_index(drop=True)

# Randomly drop up to 100 rows so that some dates go missing (np.unique also sorts)
length = df.shape[0]
droplist = np.unique(np.random.randint(0, length, size=100)).tolist()
df = df.drop(droplist).reset_index(drop=True)
df
```
```
           date group  value
0    2010-01-01     A     57
1    2010-01-02     A     11
2    2010-01-03     A     83
3    2010-01-04     A     83
4    2010-01-05     A     93
..          ...   ...    ...
992  2010-12-27     C     50
993  2010-12-28     C     59
994  2010-12-29     C     85
995  2010-12-30     C     32
996  2010-12-31     C      3
```
I would like to identify islands of consecutive dates, where a break of more than 1 day between dates starts a new island.
Any help is appreciated!
Not sure where to start with this one.
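A minimal sketch of the usual gaps-and-islands approach in pandas, assuming islands should be numbered separately within each `group` (the dates restart for every group, so islands spanning groups would not make sense). Sort by group and date, take the per-group difference between consecutive dates, flag rows where that step exceeds one day (or where a group starts), and take a running sum of the flags. The helper names `label_islands` and `summarize_islands` and the small demo frame are hypothetical, made up for illustration.

```python
import pandas as pd


def label_islands(df: pd.DataFrame,
                  max_gap: pd.Timedelta = pd.Timedelta(days=1)) -> pd.DataFrame:
    """Return a sorted copy of df with an 'island' column (1-based, per group).

    Assumption: islands are computed separately for each value of 'group';
    a new island starts whenever the step from the previous date exceeds max_gap.
    """
    out = df.sort_values(['group', 'date']).copy()
    # Time since the previous row in the same group (NaT for each group's first row)
    step = out.groupby('group')['date'].diff()
    # True where an island begins: first row of a group, or a gap larger than max_gap
    new_island = step.isna() | (step > max_gap)
    # Running count of island starts within each group gives the island number
    out['island'] = new_island.astype(int).groupby(out['group']).cumsum()
    return out


def summarize_islands(labeled: pd.DataFrame) -> pd.DataFrame:
    """One row per (group, island) with its first date, last date and row count."""
    return (labeled.groupby(['group', 'island'])
            .agg(start=('date', 'min'), end=('date', 'max'), days=('date', 'count'))
            .reset_index())


# Hypothetical toy data: A is missing 2010-01-03, B is missing 2010-01-02
demo = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-04',
                            '2010-01-01', '2010-01-03', '2010-01-04']),
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
})
labeled = label_islands(demo)
print(labeled['island'].tolist())  # [1, 1, 2, 1, 2, 2]
print(summarize_islands(labeled))
```

Applied to the sample frame above, this would be `summarize_islands(label_islands(df))`. Because the island number comes from a cumulative sum of "gap" flags, it stays vectorized and needs no Python-level loop over rows; the `max_gap` parameter is only there in case a different break threshold is wanted.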