
I am trying to replace null values with 0 by applying certain conditions. Here is the code to generate the dataset.

import numpy as np
import pandas as pd

data = {'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01'], 'Date1': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-05-01', '2022-05-01', '2022-05-01', '2022-05-01'], 'Date2': ['2022-02-01', '2022-02-01', '2022-02-01', '2022-04-01', '2022-04-01', '2022-04-01', '2022-04-01', np.nan, np.nan, np.nan, np.nan], 'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'], 'num': [1234, 1234, 1234, 456, 456, 456, 456, np.nan, np.nan, np.nan, np.nan], 'sales': ['MN', 'MN', 'MN', 'CA', 'CA', 'CA', 'CA', 'FL', 'FL', 'FL', 'FL'], 'Num1': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 44.0, 44.0, 44.0, 44.0], 'Num2': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, 29.0, 29.0, 29.0]}

df = pd.DataFrame(data)
df

         month       Date1       Date2 Name   num sales  Num1  Num2
0   2022-01-01  2022-01-01  2022-02-01    A  1234    MN   NaN   NaN
1   2022-02-01  2022-01-01  2022-02-01    A  1234    MN   NaN   NaN
2   2022-03-01  2022-01-01  2022-02-01    A  1234    MN   NaN   NaN
3   2022-01-01  2022-01-01  2022-04-01    B   456    CA   NaN   NaN
4   2022-02-01  2022-01-01  2022-04-01    B   456    CA   NaN   NaN
5   2022-03-01  2022-01-01  2022-04-01    B   456    CA   NaN   NaN
6   2022-04-01  2022-01-01  2022-04-01    B   456    CA   NaN   NaN
7   2022-05-01  2022-05-01         NaN    C   NaN    FL  44.0  29.0
8   2022-06-01  2022-05-01         NaN    C   NaN    FL  44.0  29.0
9   2022-07-01  2022-05-01         NaN    C   NaN    FL  44.0  29.0
10  2022-08-01  2022-05-01         NaN    C   NaN    FL  44.0  29.0

So, I want to replace the null values in the num column with 0 in the rows where month - Date1 <= 2 (the difference in months), and replace the null values in the Num1 and Num2 columns with 0 in the rows where month - Date2 <= 2. If the difference is greater than 2, the null values should not be replaced.
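One way to read "month - Date1 <= 2" is as the number of whole calendar months between the two dates. A minimal sketch under that assumption (a negative difference also counts as <= 2, and a missing date never satisfies the condition); the helper name month_diff is hypothetical:

```python
import numpy as np
import pandas as pd

def month_diff(later: pd.Series, earlier: pd.Series) -> pd.Series:
    """Whole calendar months from `earlier` to `later`; NaN if either date is missing."""
    later = pd.to_datetime(later)
    earlier = pd.to_datetime(earlier)
    return (later.dt.year - earlier.dt.year) * 12 + (later.dt.month - earlier.dt.month)

# Rows of Name C from the question
month = pd.Series(['2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01'])
date1 = pd.Series(['2022-05-01'] * 4)
date2 = pd.Series([np.nan] * 4)

diff1 = month_diff(month, date1)
print(diff1.tolist())                          # [0, 1, 2, 3]
print((diff1 <= 2).tolist())                   # [True, True, True, False]
# NaT in Date2 gives NaN, and NaN <= 2 is False, so these rows are never filled
print((month_diff(month, date2) <= 2).tolist())
```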

How can I implement these steps in Python?

This is the expected outcome: [image not available]

Asked by Bad Coder (edited by BeRT2me)

1 Answer


You can use np.where with the conditions:

import numpy as np
import pandas as pd

data = {'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01'], 'Date1': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-05-01', '2022-05-01', '2022-05-01', '2022-05-01'], 'Date2': ['2022-02-01', '2022-02-01', '2022-02-01', '2022-04-01', '2022-04-01', '2022-04-01', '2022-04-01', np.nan, np.nan, np.nan, np.nan], 'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'], 'num': [1234, 1234, 1234, 456, 456, 456, 456, np.nan, np.nan, np.nan, np.nan], 'sales': ['MN', 'MN', 'MN', 'CA', 'CA', 'CA', 'CA', 'FL', 'FL', 'FL', 'FL'], 'Num1': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 44.0, 44.0, 44.0, 44.0], 'Num2': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, 29.0, 29.0, 29.0]}

df = pd.DataFrame(data)
df['Date1'] =  pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df['Num1'] = np.where(np.logical_and(df['Date2'].dt.month <=2, np.isnan(df['Num1'])),0, df['Num1'])
df['Num2'] = np.where(np.logical_and(df['Date2'].dt.month <=2, np.isnan(df['Num2'])),0, df['Num2'])
print(df)

In the example above, I only consider the month of the date columns, not the month column.
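For reference, np.where(cond, a, b) picks a where cond is True and b elsewhere, so filling NaN with 0 only under a condition means putting both tests (the date condition and the "is missing" check) into cond. A small sketch on a toy Series; the values and the allowed mask are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 44.0, np.nan])
# Hypothetical result of a date test, e.g. df['Date2'].dt.month <= 2
allowed = pd.Series([True, False, True, True])

# 0 only where the value is missing AND the date test passes; keep s elsewhere
filled = np.where(allowed & s.isna(), 0, s)
print(filled)
```

Row 1 stays NaN because its date test fails, and row 2 keeps 44.0 because it was not missing.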

If you also want to take the month column into account, you can use the code below:

import numpy as np
import pandas as pd

data = {'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01'], 'Date1': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01', '2022-05-01', '2022-05-01', '2022-05-01', '2022-05-01'], 'Date2': ['2022-02-01', '2022-02-01', '2022-02-01', '2022-04-01', '2022-04-01', '2022-04-01', '2022-04-01', np.nan, np.nan, np.nan, np.nan], 'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'], 'num': [1234, 1234, 1234, 456, 456, 456, 456, np.nan, np.nan, np.nan, np.nan], 'sales': ['MN', 'MN', 'MN', 'CA', 'CA', 'CA', 'CA', 'FL', 'FL', 'FL', 'FL'], 'Num1': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 44.0, 44.0, 44.0, 44.0], 'Num2': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 29.0, 29.0, 29.0, 29.0]}

df = pd.DataFrame(data)
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df['month'] = pd.to_datetime(df['month'])
# np.logical_and takes only two inputs, so combine the three conditions with &
# (Num1 and Num2 are both tested against Date2)
df['Num1'] = np.where((df['month'].dt.month <= 2) & (df['Date2'].dt.month <= 2) & df['Num1'].isna(), 0, df['Num1'])
df['Num2'] = np.where((df['month'].dt.month <= 2) & (df['Date2'].dt.month <= 2) & df['Num2'].isna(), 0, df['Num2'])
print(df)
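np.logical_and is a binary ufunc: a third positional argument is interpreted as its out parameter, not as a third condition. To combine three boolean masks, either chain them with & (parenthesizing each comparison) or reduce the ufunc over a list of masks. A minimal sketch with toy masks:

```python
import numpy as np
import pandas as pd

a = pd.Series([True, True, False, True])
b = pd.Series([True, False, True, True])
c = pd.Series([True, True, True, False])

# Option 1: chain with &; wrap comparisons like (s <= 2) in parentheses
mask_and = a & b & c

# Option 2: apply the binary ufunc across all masks with .reduce
mask_reduce = np.logical_and.reduce([a, b, c])

print(mask_and.tolist())     # [True, False, False, False]
print(mask_reduce.tolist())  # [True, False, False, False]
```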
  • I used the same code and got an error: **RecursionError: maximum recursion depth exceeded while calling a Python object**. Am I missing something? – Bad Coder Jul 23 '22 at 16:36
  • What is the length of your DataFrame (the size of the data)? – Kalyanakannan padivasu Jul 23 '22 at 17:34
  • I'm not sure. Honestly, I don't know how to check that. I just copied your code, pasted it into VS Code, and hit run, but it did not work. – Bad Coder Jul 23 '22 at 17:58
  • You can check the working code here: https://onecompiler.com/python/3yauyufp9 – Kalyanakannan padivasu Jul 23 '22 at 18:16
  • Yes, it works there, but I can't figure out why it doesn't work in VS Code. I checked the recursion depth on that page as well, and it uses the same depth as in VS Code. Is there another way to do it? – Bad Coder Jul 24 '22 at 10:09
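As another way to do it, here is a pandas-only sketch that avoids np.where and np.logical_and by using Series.mask. It assumes "month - Date1 <= 2" means the difference in whole calendar months (negative differences count as <= 2), that num is tested against Date1 and Num1/Num2 against Date2, and that a missing date never triggers a fill. The expected outcome is only available as an image, so this reading is an assumption; months_between is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

data = {
    'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01',
              '2022-04-01', '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01'],
    'Date1': ['2022-01-01'] * 7 + ['2022-05-01'] * 4,
    'Date2': ['2022-02-01'] * 3 + ['2022-04-01'] * 4 + [np.nan] * 4,
    'Name': ['A'] * 3 + ['B'] * 4 + ['C'] * 4,
    'num': [1234] * 3 + [456] * 4 + [np.nan] * 4,
    'sales': ['MN'] * 3 + ['CA'] * 4 + ['FL'] * 4,
    'Num1': [np.nan] * 7 + [44.0] * 4,
    'Num2': [np.nan] * 7 + [29.0] * 4,
}
df = pd.DataFrame(data)

for col in ['month', 'Date1', 'Date2']:
    df[col] = pd.to_datetime(df[col])  # NaN becomes NaT

def months_between(later, earlier):
    # Whole calendar months; NaN when either date is NaT
    return (later.dt.year - earlier.dt.year) * 12 + (later.dt.month - earlier.dt.month)

# NaN <= 2 evaluates to False, so rows with a missing Date2 are never filled
within1 = months_between(df['month'], df['Date1']) <= 2
within2 = months_between(df['month'], df['Date2']) <= 2

# mask(cond, 0) puts 0 where cond is True and keeps the original value elsewhere
df['num'] = df['num'].mask(within1 & df['num'].isna(), 0)
for col in ['Num1', 'Num2']:
    df[col] = df[col].mask(within2 & df[col].isna(), 0)

print(df)
```

Under these assumptions, rows 7 to 9 of num become 0 while row 10 (three months after Date1) stays NaN, and the missing Num1/Num2 values for A and B become 0.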