-1

I am working on another aspect of my project but using a csv I downloaded from Mesowest for my experiment. The new code is

df_pirates_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",usecols=['date','datetime','SPOSTMIN'],
    parse_dates=['date', 'datetime'], 
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))

df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
c = df_pirates.groupby(level=0).transform("count")
c[c["date"]>1].index.tolist()
df_pirates = df_pirates[~df_pirates.index.isin(c[c["date"]>1].index.tolist())].resample('10Min').fillna("nearest",limit=1)

wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"

wx=pd.read_csv(wxdataadd)
wxdata=wx.resample('10Min')


temp=wxdata['air_temp_set_1']
time=wxdata['Date_Time']

wxtest=pd.concat([df_pirates, temp,time])
wxtest=wxtest.set_index([df_pirates,temp]).unstack()


print (wxtest)

However I am getting a

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

with respect to the resampling of the weather data as the wxdata has values in columns:

Index(['Date_Time', 'air_temp_set_1', 'relative_humidity_set_1',
       'wind_speed_set_1', 'wind_direction_set_1', 'wind_gust_set_1',
       'precip_accum_since_local_midnight_set_1'],
      dtype='object')

which are every 5 minutes when I want to set it to every 10 or even 15 to line up nicely with the ride data.

Here is the first few lines of wx: Excel Sample However, I know that the times in the wx does not line up with the one for pirates.

1 Answers1

3
  • The issue occurs because .resample requires a datetime index
    • .resample also needs some type of aggregation function at the end, such as .sum() or .mean().
    • For this data, since you are loosing resolution by resampling from 5 minute to 10 minute intervals, it would make more sense to use .mean() as the aggregation function
  • There may be other issues, but this is causing the TypeError.
import pandas as pd

wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"

# convert Date_Time to datetime when reading file
wx = pd.read_csv(wxdataadd, parse_dates=['Date_Time'])

# set Date_Time to the index
wx.set_index('Date_Time', inplace=True)

# resample the weather data from 5 minutes to 10 minutes, using mean as the agg function
wx = wx.resample('10Min').mean()  # there needs to be an agg function

# join the air temp data to df_pirates on the datetime index
wxtest = df_pirates.join(wx.air_temp_set_1)

# this is an example of the wxtest dataframe
# display(wxtest.head())

                     SPOSTMIN     ride  open  air_temp_set_1
2017-01-01 09:10:00       5.0  pirates  True       51.539172
2017-01-01 09:20:00       5.0  pirates  True       42.690527
2017-01-01 09:30:00       5.0  pirates  True       46.879581
2017-01-01 09:40:00       5.0  pirates  True       41.588848
2017-01-01 09:50:00       5.0  pirates  True       45.575068

Sample Data

import pandas as pd
import numpy as np
from datetime import datetime

# sample weather data in 5 minutes intervals
np.random.seed(45)
wxdata = {'Date_Time': pd.bdate_range(datetime(2017, 1, 1), freq='5Min', periods=10), 'air_temp_set_1': np.random.uniform(40, 55, size=(10,))}

wx = pd.DataFrame(wxdata)
wx.Date_Time = wx.Date_Time + pd.Timedelta(hours=9, minutes=10)

# display(wx.head())
                     air_temp_set_1
Date_Time                          
2017-01-01 09:10:00       54.835173
2017-01-01 09:15:00       48.243171
2017-01-01 09:20:00       44.221710
2017-01-01 09:25:00       41.159343
2017-01-01 09:30:00       46.667042

# pirates data in 10 minutes intervals
pirates = {pd.Timestamp('2017-01-01 09:10:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:20:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:30:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:40:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
           pd.Timestamp('2017-01-01 09:50:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True}}

df_pirates = pd.DataFrame.from_dict(pirates, orient='index')

# display(df_pirates)
                     SPOSTMIN     ride  open
2017-01-01 09:10:00       5.0  pirates  True
2017-01-01 09:20:00       5.0  pirates  True
2017-01-01 09:30:00       5.0  pirates  True
2017-01-01 09:40:00       5.0  pirates  True
2017-01-01 09:50:00       5.0  pirates  True
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158