0

I have a data frame that contains product sales for each day starting from 2018 to 2021 year. Dataframe contains four columns (Date, Place, Product Category and Sales). From the first two columns (Date, Place) I want to use the available data to fill in the gaps. Once the data is added, I would like to delete rows that do not have data in ProductCategory. I would like to do in python pandas.

The sample of my data set looked like this:

My Data frame

I would like the dataframe to look like this:

output dataframe

Denmla
  • 135
  • 1
  • 8

3 Answers3

0

Use fillna with method 'ffill' that propagates last valid observation forward to next valid backfill. Then drop the rows that contain NAs.

df['Date'].fillna(method='ffill',inplace=True)
df['Place'].fillna(method='ffill',inplace=True)
df.dropna(inplace=True)
dkantor
  • 162
  • 6
0

You are going to use the forward-filling method to replace null values with the value of the nearest one above it df['Date', 'Place'] = df['Date', 'Place'].fillna(method='ffill'). Next, to drop rows with missing values df.dropna(subset='ProductCategory', inplace=True). Congrats, now you have your desired df

Documentation: Pandas fillna function, Pandas dropna function

Mark Saleh
  • 41
  • 9
0

compute the frequency of catagories in the column by plotting, from plot you can see bars reperesenting the most repeated values

df['column'].value_counts().plot.bar()

and get the most frequent value using index, index[0] gives most repeated and index[1] gives 2nd most repeated and you can choose as per your requirement.

most_frequent_attribute = df['column'].value_counts().index[0]

then fill missing values by above method

df['column'].fillna(df['column'].most_freqent_attribute,inplace=True)

to fill multiple columns with same method just define this as funtion, like this

def impute_nan(df,column):
    most_frequent_category=df[column].mode()[0]
    df[column].fillna(most_frequent_category,inplace=True)

for feature in ['column1','column2']:
    impute_nan(df,feature)