How to resample a csv so it can perfectly align with my other csv

Question

I am working on another aspect of my project but using a csv I downloaded from Mesowest for my experiment. The new code is

df_pirates_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",usecols=['date','datetime','SPOSTMIN'],
    parse_dates=['date', 'datetime'], 
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))

df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
c = df_pirates.groupby(level=0).transform("count")
c[c["date"]>1].index.tolist()
df_pirates = df_pirates[~df_pirates.index.isin(c[c["date"]>1].index.tolist())].resample('10Min').fillna("nearest",limit=1)

wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"

wx=pd.read_csv(wxdataadd)
wxdata=wx.resample('10Min')


temp=wxdata['air_temp_set_1']
time=wxdata['Date_Time']

wxtest=pd.concat([df_pirates, temp,time])
wxtest=wxtest.set_index([df_pirates,temp]).unstack()


print (wxtest)

However I am getting a

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

with respect to the resampling of the weather data as the wxdata has values in columns:

Index(['Date_Time', 'air_temp_set_1', 'relative_humidity_set_1',
       'wind_speed_set_1', 'wind_direction_set_1', 'wind_gust_set_1',
       'precip_accum_since_local_midnight_set_1'],
      dtype='object')

which are every 5 minutes when I want to set it to every 10 or even 15 to line up nicely with the ride data.

Here is the first few lines of wx: However, I know that the times in the wx does not line up with the one for pirates.

Trenton McKinney · Accepted Answer · 2020-07-28T20:34:46.343

The issue occurs because .resample requires a datetime index
- .resample also needs some type of aggregation function at the end, such as .sum() or .mean().
- For this data, since you are loosing resolution by resampling from 5 minute to 10 minute intervals, it would make more sense to use .mean() as the aggregation function
There may be other issues, but this is causing the TypeError.

import pandas as pd

wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"

# convert Date_Time to datetime when reading file
wx = pd.read_csv(wxdataadd, parse_dates=['Date_Time'])

# set Date_Time to the index
wx.set_index('Date_Time', inplace=True)

# resample the weather data from 5 minutes to 10 minutes, using mean as the agg function
wx = wx.resample('10Min').mean()  # there needs to be an agg function

# join the air temp data to df_pirates on the datetime index
wxtest = df_pirates.join(wx.air_temp_set_1)

# this is an example of the wxtest dataframe
# display(wxtest.head())

                     SPOSTMIN     ride  open  air_temp_set_1
2017-01-01 09:10:00       5.0  pirates  True       51.539172
2017-01-01 09:20:00       5.0  pirates  True       42.690527
2017-01-01 09:30:00       5.0  pirates  True       46.879581
2017-01-01 09:40:00       5.0  pirates  True       41.588848
2017-01-01 09:50:00       5.0  pirates  True       45.575068

Sample Data

import pandas as pd
import numpy as np
from datetime import datetime

# sample weather data in 5 minutes intervals
np.random.seed(45)
wxdata = {'Date_Time': pd.bdate_range(datetime(2017, 1, 1), freq='5Min', periods=10), 'air_temp_set_1': np.random.uniform(40, 55, size=(10,))}

wx = pd.DataFrame(wxdata)
wx.Date_Time = wx.Date_Time + pd.Timedelta(hours=9, minutes=10)

# display(wx.head())
                     air_temp_set_1
Date_Time                          
2017-01-01 09:10:00       54.835173
2017-01-01 09:15:00       48.243171
2017-01-01 09:20:00       44.221710
2017-01-01 09:25:00       41.159343
2017-01-01 09:30:00       46.667042

# pirates data in 10 minutes intervals
pirates = {pd.Timestamp('2017-01-01 09:10:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:20:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:30:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:40:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:50:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True}}

df_pirates = pd.DataFrame.from_dict(pirates, orient='index')

# display(df_pirates)
                     SPOSTMIN     ride  open
2017-01-01 09:10:00       5.0  pirates  True
2017-01-01 09:20:00       5.0  pirates  True
2017-01-01 09:30:00       5.0  pirates  True
2017-01-01 09:40:00       5.0  pirates  True
2017-01-01 09:50:00       5.0  pirates  True

How to resample a csv so it can perfectly align with my other csv

1 Answers1

Sample Data