-2

I have rows in a dataset which look like this:

State StartDate EndDate Delta (hours) Day
Correct 2022-07-12 19:00:00 2022-07-13 11:00:00 16 2022-07-12

I would like to separate the Delta for each day through Start and End dates, like this:

State StartDate EndDate Delta (hours) Day
Correct 2022-07-12 19:00:00 2022-07-12 23:59:59 5 2022-07-12
Correct 2022-07-13 00:00:00 2022-07-13 11:00:00 11 2022-07-13

What is the best way to do this in a pandas dataframe?

Jason Aller
  • 3,541
  • 28
  • 38
  • 38
  • 1
    what have you tried so far? – CarlosSR Jul 20 '22 at 18:09
  • I don't get it. Why is 16 split into 5 and 11? Is Delta alway originally 16 and always split into 5 and 11? – eduffy Jul 20 '22 at 19:37
  • 1
    @eduffy He wants to split the row "start=a end=b" into two rows "start=a end=midnight" and "start=midnight end=b". The new deltas are just (midnight - a) and (b - midnight), which would sum to (b - a) as expected. – chepner Jul 20 '22 at 20:51
  • Stack Overflow isn't for "best way to do X" questions; it's for "how do I fix my attempt to do X". – chepner Jul 20 '22 at 20:52
  • @CarlosSR I thought about creating a new column with `df['flag'] = np.where(df['StartDate'].dt.day < df['EndDate'].dt.day, 1, 0)` Then, creating a new dataset: `df2 = df[df.flag == 1]` And somehow iterate over the rows, but I'm stuck on this part – Fer.vieira122 Jul 21 '22 at 11:24

1 Answers1

-2

I thought this would be an interesting exercise, so I attempted it. Let me know if you have any questions.

import pandas as pd
import datetime

# Initial Data
list = ['Correct','2022-07-12 19:00:00','2022-07-15 11:00:00']
# Create Dataframe
df = pd.DataFrame(data = [list],columns=['State','StartDate','EndDate'])
# Convert Dates to Datetime
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# Create Dataframe for New Results
df2 = pd.DataFrame(data=None,columns=['State','StartDate','EndDate'])
# Iterate through inital dataframe rows

for n,m in df.iterrows():
    # For each row, determine the number of days between the start and end date
    num_days = (df['EndDate'].dt.day - df['StartDate'].dt.day)
    
#  Iterating number of days between the start and end date
    for a in range(0,num_days[0]+1):

        # For the first day
        if a ==0:
            # Create a list
            list = []
            # Populate the list with the values of interest
            list.append('Correct') # State
            list.append(df['StartDate'].iloc[n]) # Startdate
            list.append((df['StartDate'].iloc[n] + datetime.timedelta(days=1 )).normalize()) # EndDate
            # Append values to new dataframe
            df2 = pd.concat([df2,pd.DataFrame([list],columns=['State','StartDate','EndDate'])])
            # Get the next date
            nextdate = (df['StartDate'].iloc[n] + datetime.timedelta(days=1 )).normalize()
      
  # For the last day
        elif a == num_days[0]:
            list = []
            # Populate the list with the values of interest
            list.append('Correct') # State
            list.append(nextdate) # Startdate
            list.append(df['EndDate'][n]) # EndDate
            # Append values to new dataframe
            df2 = pd.concat([df2,pd.DataFrame([list],columns=['State','StartDate','EndDate'])])
      
  # For all the dates inbetween
        else:
            list = []
            # Populate the list with the values of interest
            list.append('Correct') # State
            list.append(nextdate) # Startdate
            list.append((nextdate + datetime.timedelta(days=1 )).normalize()) # EndDate
            # Append values to new dataframe
            df2 = pd.concat([df2, pd.DataFrame([list], columns=['State', 'StartDate', 'EndDate'])])
            # Get the next date
            nextdate = (nextdate + datetime.timedelta(days=1 )).normalize()

# Convert Dates in New Dataframe to Datetime
df2['StartDate'] = pd.to_datetime(df2['StartDate'])
df2['EndDate'] = pd.to_datetime(df2['EndDate'])
# Compute the Delta Column
df2['Delta(Hours)'] = (df2['EndDate']-df2['StartDate']).dt.total_seconds()/60/60
# Extract the Date Column
df2['Day'] = (df2['StartDate']).dt.date
gphull
  • 78
  • 5