I am struggling to get my code working. It was written in pandas and I am now refactoring it to use vaex, but .loc does not exist in vaex. Could anyone please help me with this?
Goal: fill the missing values in the start_time column with end_time minus conversation_time, where conversation_time is an integer number of seconds that has to be converted to a time delta before subtracting.
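For a single row, the calculation is just a datetime minus a timedelta. The sketch below uses only the standard library; the helper name start_from_end is hypothetical, and the timestamp format is taken from the sample data.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def start_from_end(end_time: str, conversation_time: int) -> str:
    """Hypothetical helper: return end_time minus conversation_time seconds, in the same format."""
    end = datetime.strptime(end_time, FMT)
    return (end - timedelta(seconds=conversation_time)).strftime(FMT)

# Row from the sample with an empty start_time: 116 seconds before 19:16:33
print(start_from_end("2023-06-01 19:16:33", 116))  # 2023-06-01 19:14:37
```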
start_time ,conversation_time, end_time;
2023-06-01 19:14:42, 112,2023-06-01 19:16:34;
2023-06-01 19:16:33, 0,2023-06-01 19:16:33;
2023-06-01 19:11:44, 290,2023-06-01 19:16:34;
, 0,2023-06-01 19:16:32;
2023-06-01 19:16:33, 0,2023-06-01 19:16:33;
2023-06-01 19:16:07, 26,2023-06-01 19:16:33;
, 116,2023-06-01 19:16:33;
2023-06-01 19:16:33, 0,2023-06-01 19:16:33;
2023-06-01 19:16:32, 0,2023-06-01 19:16:32;
, 217,2023-06-01 19:00:01
The old pandas code works fine:
import pandas as pd

# Convert conversation_time to numeric
DF['conversation_time'] = pd.to_numeric(DF['conversation_time'])
# Convert end_time to datetime
DF['end_time'] = pd.to_datetime(DF['end_time'], format='%Y-%m-%d %H:%M:%S')
# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.loc[DF.start_time == ' ', ['end_time', 'conversation_time']]
# For empty start_times, calculate start_time by subtracting conversation_time from end_time
DF.loc[DF.start_time == ' ', 'start_time'] = [str(data[0] - pd.Timedelta(seconds=data[1])) for data in end_conv.values]
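The same pandas logic can also be written without the list comprehension, by subtracting a whole timedelta column at once. This is a sketch on a small frame built from a few of the sample rows; treating both '' and ' ' as "missing" via str.strip() is an assumption about how the empty cells are read.

```python
import pandas as pd

# Small frame built from rows of the sample data (empty start_time cells as '')
df = pd.DataFrame({
    "start_time": ["2023-06-01 19:14:42", "", "2023-06-01 19:16:07", ""],
    "conversation_time": [112, 116, 26, 217],
    "end_time": ["2023-06-01 19:16:34", "2023-06-01 19:16:33",
                 "2023-06-01 19:16:33", "2023-06-01 19:00:01"],
})

df["conversation_time"] = pd.to_numeric(df["conversation_time"])
df["end_time"] = pd.to_datetime(df["end_time"], format="%Y-%m-%d %H:%M:%S")

# Blank or whitespace-only start_time counts as missing (assumption)
missing = df["start_time"].str.strip() == ""

# Vectorized: end_time minus conversation_time seconds for every row
computed = df["end_time"] - pd.to_timedelta(df["conversation_time"], unit="s")

# Only overwrite the missing rows, formatted like the existing strings
df.loc[missing, "start_time"] = computed[missing].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df["start_time"].tolist())
```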
The vaex version:
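from datetime import datetime

import numpy as np
import vaex as vx

# Read the CSV; header=None is dropped so the column names come from the header row
DF = vx.read_csv('data.csv', sep=',')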
# Function to convert to datetime
def convert_to_datetime(date_string):
    return np.datetime64(datetime.strptime(str(date_string), '%Y-%m-%d %H:%M:%S'))
# Convert end_time to datetime
DF['end_time'] = DF['end_time'].astype(str).apply(convert_to_datetime)
# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.filter(DF['start_time'] == ' ')['end_time', 'conversation_time']
# For empty start_times, calculate start_time by subtracting conversation_time from end_time
DF['start_time'] = DF['start_time'].apply(lambda x: [row['end_time'] - np.timedelta64(1, 's') * row['conversation_time'] for index, row in end_conv.iterrows()] if x == ' ' else x)
I have tried many approaches, and the code above is where I ended up, but it still does not work.
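For comparison, the fill can be expressed without .loc or any row-wise loop by computing the candidate value for every row and then choosing per row with a condition. The sketch below uses plain NumPy arrays with values taken from the sample, and np.where in place of .loc. Whether this maps one-to-one onto vaex expressions (for example a where-style function on the DataFrame) is an assumption I have not verified.

```python
import numpy as np

# Columns as plain NumPy arrays, values taken from the sample rows
start_time = np.array(["2023-06-01 19:14:42", " ", "2023-06-01 19:16:07", " "])
conversation_time = np.array([112, 116, 26, 217])
end_time = np.array(["2023-06-01 19:16:34", "2023-06-01 19:16:33",
                     "2023-06-01 19:16:33", "2023-06-01 19:00:01"],
                    dtype="datetime64[s]")

# Condition evaluated for the whole column at once (blank or whitespace = missing)
missing = np.char.strip(start_time) == ""

# Candidate start for every row: end_time minus conversation_time seconds
computed = end_time - conversation_time.astype("timedelta64[s]")

# datetime_as_string uses 'T' as separator; switch to ' ' to match the data
computed_str = np.char.replace(np.datetime_as_string(computed, unit="s"), "T", " ")

# Keep existing values, fill only the missing ones
filled = np.where(missing, computed_str, start_time)
print(filled.tolist())
```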