Stuck on an assignment in understanding how to resample the data into an hourly rate instead of the 4 hour blocks they are now. Here is what has been asked to do.
Differences The ENTRIES and EXITS fields hold raw counts that do not reset to zero each week. We would like to know how many entries and exits there are in the 4-hour periods. To calculate this, we need to calculate the difference between neighboring rows that have the same (UNIT, C/A, SCP) key. Create NUM_ENTRIES and NUM_EXITS columns that store these numbers.
Hints:
The shift method will be useful. It will be easier to use groupby when doing the shift as it will respect boundaries between subunits. The level argument will help define the subunits. Most of the counters count up, but there are some that count down. How should you handle those cases? Fix this for extra credit.
One problem with the numbers from the previous question is that they are sampled at different times. Resample the ENTRIES and EXITS columns to an hourly rate and interpolate it to fill in the missing values. Use the "pchip" interpolation method as it will preserve monotonicity. Again, this should be done in groups using groupby, but the apply function will allow the use of arbitrary interpolate methods. Now, recompute the NUM_ENTRIES and NUM_EXITS columns from Part 2. Hints: Use reset_index to clear the UNIT, C/A, and SCP levels of the index as this makes the resample and interpolate methods used in the apply function more straightforward. Add the index back after performing the interpolation via set_index.
here is code I have up to the part it is asked to resample
df = pd.read_csv("turnstile_161126.txt")
timestamp =pd.to_datetime(df['DATE'] + ' ' + df['TIME'])
df.insert(3, 'TIMESTAMP', timestamp)
df.columns = df.columns.str.strip()
df = df.set_index(['UNIT','C/A','SCP','TIMESTAMP'])
df['NUM_ENTRIES'] = df.ENTRIES - df.ENTRIES.shift(1)
df['NUM_EXITS'] = df.EXITS - df.EXITS.shift(1)
STATION LINENAME DIVISION DATE TIME DESC ENTRIES EXITS NUM_ENTRIES NUM_EXITS
UNIT C/A SCP TIMESTAMP
R051 A002 02-00-00 2016-11-19 03:00:00 59 ST NQR456W BMT 11/19/2016 03:00:00 REGULAR 5924658 2007780 NaN NaN
2016-11-19 07:00:00 59 ST NQR456W BMT 11/19/2016 07:00:00 REGULAR 5924672 2007802 14.0 22.0
2016-11-19 11:00:00 59 ST NQR456W BMT 11/19/2016 11:00:00 REGULAR 5924738 2007908 66.0 106.0
2016-11-19 15:00:00 59 ST NQR456W BMT 11/19/2016 15:00:00 REGULAR 5924979 2007980 241.0 72.0
2016-11-19 19:00:00 59 ST NQR456W BMT 11/19/2016 19:00:00 REGULAR 5925389 2008056 410.0 76.0
2016-11-19 23:00:00 59 ST NQR456W BMT 11/19/2016 23:00:00 REGULAR 5925614 2008081 225.0 25.0
2016-11-20 03:00:00 59 ST NQR456W BMT 11/20/2016 03:00:00 REGULAR 5925684 2008096 70.0 15.0
2016-11-20 07:00:00 59 ST NQR456W BMT 11/20/2016 07:00:00 REGULAR 5925688 2008113 4.0 17.0
2016-11-20 11:00:00 59 ST NQR456W BMT 11/20/2016 11:00:00 REGULAR 5925755 2008191 67.0 78.0
2016-11-20 15:00:00 59 ST NQR456W BMT 11/20/2016 15:00:00 REGULAR 5925937 2008260 182.0 69.0
2016-11-20 19:00:00 59 ST NQR456W BMT 11/20/2016 19:00:00 REGULAR 5926232 2008332 295.0 72.0
2016-11-20 23:00:00 59 ST NQR456W BMT 11/20/2016 23:00:00 REGULAR 5926394 2008367 162.0 35.0
2016-11-21 03:00:00 59 ST NQR456W BMT 11/21/2016 03:00:00 REGULAR 5926425 2008378 31.0 11.0
2016-11-21 07:00:00 59 ST NQR456W BMT 11/21/2016 07:00:00 REGULAR 5926440 2008420 15.0 42.0
2016-11-21 11:00:00 59 ST NQR456W BMT 11/21/2016 11:00:00 REGULAR 5926622 2008741 182.0 321.0
2016-11-21 15:00:00 59 ST NQR456W BMT 11/21/2016 15:00:00 REGULAR 5926872 2008851 250.0 110.0
2016-11-21 19:00:00 59 ST NQR456W BMT 11/21/2016 19:00:00 REGULAR 5927775 2008927 903.0 76.0