0

I have a DataFrame that features two time columns. I want to reindex or create a new DataFrame with an index that covers the total time of those columns and I want the times in those columns to be used to label rows in the reindexed or new DataFrame. How might this be done efficiently?

df = pd.read_csv("NBER_chronology.csv")# 
http://www.nber.org/cycles/NBER%20chronology.xlsx
df["Peak month"] = pd.to_datetime(df["Peak month"])
df["Trough month"] = pd.to_datetime(df["Trough month"])
df.tail()

The index could be defined in the following way:

pd.date_range(start = df["Trough month"].values[0], end = df["Trough month"].values[-1])

However, I feel that taking this route is a bit too manual.

Brian
  • 2,163
  • 1
  • 14
  • 26
BlandCorporation
  • 1,324
  • 1
  • 15
  • 33

1 Answers1

0

This sets the index as the difference between Trough dates as 'timespan'. Oddly does it in days, I bet you could change that if you wanted.

df = pd.read_csv("NBER_chronology.csv")#
#http://www.nber.org/cycles/NBER%20chronology.xlsx
df["Peak month"] = pd.to_datetime(df["Peak month"])
df["Trough month"] = pd.to_datetime(df["Trough month"])

# date of next row
after_df = pd.DataFrame(df['Trough month'].iloc[1:])
after_df.reset_index(inplace=True, drop=True)
df['after'] = after_df

# column that contains difference 
df['timespan'] = df['after'] - df["Trough month"]
df['timespan'] = df['timespan'].shift(1)

# set index
df.set_index('timespan', inplace=True)
Tyler K
  • 328
  • 2
  • 7