I want to create a list/array with missing values as element then, I wanted to fill that with some conditions

Question

import pandas as pd

create a dataframe from the given data

#I wanted to create missing values for column-0 data_frame = pd.DataFrame({'column-0': ['Mumbai', '', 'Mumbai', 'Mumbai', 'Mumbai', 'Moscow', ''],'column-1': ['2022-10-26 07:07:41', '2022-10-26 07:08:02', '2022-10-26 07:09:10', '2022-10-26 17:31:10', '2022-10-26 17:35:18', '2022-10-27 17:38:04', '2022-10-27 17:41:15'],'column-2': ['id_111'] * 7})

convert the 'column-1' to datetime type

data_frame['column-1'] = pd.to_datetime(data_frame['column-1'])

extract day from datetime

data_frame['day'] = data_frame['column-1'].dt.date

create a new column 'frequency' and count the frequency of each location on each day

data_frame['frequency'] = data_frame.groupby(['column-0', 'day'], group_keys=False)['column-0'].transform('count')

fill the missing values in 'column-0' by the location which occurs maximum number of times on each day

data_frame['column-0'] = data_frame.groupby(['day'], group_keys=False)['column-0'].apply(lambda x: x.fillna(x.value_counts().index[0]))

drop the 'day' and 'frequency' columns

data_frame = data_frame.drop(['day', 'frequency'], axis=1)

print the final dataframe

print(data_frame)

#The output I get: column-0 column-1 column-2 0 Mumbai 2022-10-26 07:07:41 customer_id_127 1 2022-10-26 07:08:02 customer_id_127 2 Mumbai 2022-10-26 07:09:10 customer_id_127 3 Mumbai 2022-10-26 17:31:10 customer_id_127 4 Mumbai 2022-10-26 17:35:18 customer_id_127 5 Moscow 2022-10-27 17:38:04 customer_id_127 6 2022-10-27 17:41:15 customer_id_127

#I want this output column-0 column-1 column-2 0 Mumbai 2022-10-26 07:07:41 customer_id_127 1 Mumbai 2022-10-26 07:08:02 customer_id_127 2 Mumbai 2022-10-26 07:09:10 customer_id_127 3 Mumbai 2022-10-26 17:31:10 customer_id_127 4 Mumbai 2022-10-26 17:35:18 customer_id_127 5 Moscow 2022-10-27 17:38:04 customer_id_127 6 Moscow 2022-10-27 17:41:15 customer_id_127

Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Feb 10 '23 at 03:46

score 0 · Answer 1 · answered Feb 10 '23 at 02:27

import pandas as pd

data_frame = pd.DataFrame({'column-0': ['Mumbai', '', 'Mumbai', 'Mumbai', 'Mumbai', 'Moscow', ''],
                           'column-1': ['2022-10-26 07:07:41', '2022-10-26 07:08:02', '2022-10-26 07:09:10',
                                        '2022-10-26 17:31:10', '2022-10-26 17:35:18', '2022-10-27 17:38:04',
                                        '2022-10-27 17:41:15'],
                           'column-2': ['id_111'] * 7})

data_frame['column-1'] = pd.to_datetime(data_frame['column-1'])

data_frame['day'] = data_frame['column-1'].dt.date

mode_location = data_frame.groupby('day')['column-0'].apply(lambda x: x.value_counts().index[0])

data_frame['column-0'].fillna(data_frame['day'].map(mode_location), inplace=True)

data_frame.drop(['day'], axis=1, inplace=True)

print(data_frame)

output

  column-0            column-1 column-2
0   Mumbai 2022-10-26 07:07:41   id_111
1          2022-10-26 07:08:02   id_111
2   Mumbai 2022-10-26 07:09:10   id_111
3   Mumbai 2022-10-26 17:31:10   id_111
4   Mumbai 2022-10-26 17:35:18   id_111
5   Moscow 2022-10-27 17:38:04   id_111
6          2022-10-27 17:41:15   id_111