
I am struggling to get my code to work. It was written in pandas, and I am now refactoring it to use vaex; however, loc does not exist in vaex. Could anyone please help me with this?

Idea: I want to fill in the missing values in the start_time column by subtracting conversation_time (an integer that needs to be converted to seconds) from end_time.

start_time   ,conversation_time,           end_time;
2023-06-01 19:14:42,        112,2023-06-01 19:16:34;
2023-06-01 19:16:33,          0,2023-06-01 19:16:33;
2023-06-01 19:11:44,        290,2023-06-01 19:16:34;
                   ,          0,2023-06-01 19:16:32;
2023-06-01 19:16:33,          0,2023-06-01 19:16:33;
2023-06-01 19:16:07,         26,2023-06-01 19:16:33;
                   ,        116,2023-06-01 19:16:33;
2023-06-01 19:16:33,          0,2023-06-01 19:16:33;
2023-06-01 19:16:32,          0,2023-06-01 19:16:32;
                   ,        217,2023-06-01 19:00:01
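To make the intended calculation concrete, here is a minimal row-by-row sketch in plain Python (standard library only, so neither pandas nor vaex). The helper name fill_start_time and the FMT constant are hypothetical names used for illustration, and the rows are a subset of the sample data above with the trailing semicolons already stripped.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp layout used in the sample data


def fill_start_time(start_time, conversation_time, end_time):
    """Return start_time, computing it from end_time when it is blank."""
    if start_time.strip():
        # A start_time is already present: keep it unchanged.
        return start_time
    end = datetime.strptime(end_time, FMT)
    # conversation_time is treated as a whole number of seconds.
    return (end - timedelta(seconds=int(conversation_time))).strftime(FMT)


# A few rows taken from the sample data above.
rows = [
    ("2023-06-01 19:14:42", 112, "2023-06-01 19:16:34"),
    ("                   ", 0, "2023-06-01 19:16:32"),
    ("                   ", 116, "2023-06-01 19:16:33"),
    ("                   ", 217, "2023-06-01 19:00:01"),
]

filled = [fill_start_time(*row) for row in rows]
for value in filled:
    print(value)
```

Rows that already have a start_time pass through untouched; only the blank ones are replaced by end_time minus conversation_time seconds.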

The old pandas code works fine:

import pandas as pd

# Convert conversation_time to numeric
DF['conversation_time'] = pd.to_numeric(DF['conversation_time'])
# Convert end_time to datetime 
DF['end_time'] = pd.to_datetime(DF['end_time'], format='%Y-%m-%d %H:%M:%S')
# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.loc[DF.start_time == '                   ', ['end_time', 'conversation_time']]
# For empty start_times, calculate start_time by subtracting conversation_time from end_time 
DF.loc[DF.start_time == '                   ', 'start_time'] = [str(data[0] - pd.Timedelta(seconds=data[1])) for data in end_conv.values]

My attempt using vaex:

from datetime import datetime

import numpy as np
import vaex as vx

DF = vx.read_csv('data.csv', sep=',', header=None)

# Function to convert to datetime
def convert_to_datetime(date_string):
    return np.datetime64(datetime.strptime(str(date_string), '%Y-%m-%d %H:%M:%S'))

# Convert end_time to datetime 
DF['end_time'] = DF['end_time'].astype(str).apply(convert_to_datetime)

# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.filter(DF['start_time'] == '                   ')['end_time', 'conversation_time']

# For empty start_times, calculate start_time by subtracting conversation_time from end_time 
DF['start_time'] = DF['start_time'].apply(
    lambda x: [row['end_time'] - np.timedelta64(1, 's') * row['conversation_time']
               for index, row in end_conv.iterrows()]
    if x == '                   ' else x
)

I have tried many approaches and finally ended up with the lines of code above.
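For reference, the loc-style assignment can also be phrased as a single element-wise selection: compute end_time minus conversation_time for every row, then keep the existing start_time wherever one is present. The sketch below shows that pattern with NumPy only. It assumes blank start times have already been parsed as NaT, and the array names are hypothetical. Whether vaex offers an equivalent should be checked against its documentation.

```python
import numpy as np

# Three rows from the sample data; blank start times are represented as NaT.
start = np.array(
    ["2023-06-01T19:14:42", "NaT", "NaT"], dtype="datetime64[s]"
)
conversation = np.array([112, 0, 116])  # durations in whole seconds
end = np.array(
    ["2023-06-01T19:16:34", "2023-06-01T19:16:32", "2023-06-01T19:16:33"],
    dtype="datetime64[s]",
)

# Candidate start time for every row: end_time minus the duration.
computed = end - conversation.astype("timedelta64[s]")

# Take the computed value only where the original start time is missing.
filled_start = np.where(np.isnat(start), computed, start)
print(filled_start)
```

Because the selection is applied to whole columns at once, no per-row loop or index-based assignment is needed.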
