I am working with a large dataset of about a million rows and 1000 columns. I have already referred to this post here, so please don't mark this as a duplicate.
If sample data is required, you can use the below:
import numpy as np
import pandas as pd
m = pd.DataFrame(np.array([[1, 0],
                           [2, 3]]))
I have some continuous variables that contain 0 values. I would like to apply a logarithmic transformation to all of those continuous variables. However, I run into a divide-by-zero error, so I tried the suggestions below, based on the post linked above:
df['salary'] = np.log(df['salary'], where=0<df['salary'], out=np.nan*df['salary'])  # not working: "Python stopped working" crash
from numpy import ma
ma.log(df['app_reg_diff']) # error
My questions are as follows:
a) How can I avoid the divide-by-zero error when applying the transformation to 1000 columns? How can I do this for all continuous columns?
b) How can I exclude zeros from the log transformation and get the log values for the remaining non-zero observations?
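To make the desired result concrete, here is a minimal sketch of the behaviour I am after, using the sample frame with float values. It assumes every numeric column is a continuous variable, which may not hold for the real data, and `num_cols` and `logged` are just illustrative names. Zeros are masked to NaN before calling `np.log`, so the log is only computed for positive values:

```python
import numpy as np
import pandas as pd

# Sample frame from the question, stored as floats so NaN can be held
m = pd.DataFrame(np.array([[1.0, 0.0],
                           [2.0, 3.0]]))

# Assumption: all numeric columns are the continuous ones to transform
num_cols = m.select_dtypes(include="number").columns

# Mask non-positive values to NaN first, so np.log never sees a zero
logged = m.copy()
logged[num_cols] = np.log(m[num_cols].where(m[num_cols] > 0))

print(logged)
```

With this, the zero entry would come out as NaN and the other entries as their natural logs. I have not checked how well this scales to a million rows by 1000 columns.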