I am working on a binary classification problem, and I am struggling to remove outliers and to improve accuracy.
One of my features, ratings, looks like this:
0 0.027465
1 0.027465
2 0.027465
3 0.027465
4 0.027465
...
26043 0.027465
26044 0.027465
26045 0.102234
26046 0.027465
26047 0.027465
Mean of the data:
train.ratings.mean()
0.03871552285960927
Standard deviation of the data:
train.ratings.std()
0.07585168664836195
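Given a mean and standard deviation like these, one common way to flag outliers is a z-score cutoff. The following is only a minimal sketch, not necessarily the right approach for this data: the Series values, the 3.0 threshold, and the names z_scores, is_outlier, and filtered are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for train.ratings: 20 typical values plus one extreme one.
ratings = pd.Series([0.027465] * 20 + [0.9])

# z-score: distance from the mean in units of standard deviation.
# pandas' Series.std() uses the sample standard deviation (ddof=1) by default.
z_scores = (ratings - ratings.mean()) / ratings.std()

# Assumed cutoff: treat |z| > 3 as an outlier. The threshold is a judgment call.
threshold = 3.0
is_outlier = z_scores.abs() > threshold

# Keep only the rows that were not flagged.
filtered = ratings[~is_outlier]
```

When most values are identical, as in the printout above, any row that differs can look extreme under this rule, so it is worth checking how many rows get flagged before dropping them.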
I tried a log transformation, but the accuracy did not improve:
train['ratings'] = np.log(train.ratings + 1)
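For reference, here is a self-contained sketch of that transformation. It uses a small made-up Series in place of train.ratings (the values are copied from the printout above, and the names ratings, log_ratings, and same are illustrative). It also shows np.log1p, which computes log(1 + x) directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.ratings, using values from the printout above.
ratings = pd.Series([0.027465, 0.027465, 0.102234, 0.027465])

# np.log1p(x) computes log(1 + x) and is more accurate for x close to zero.
log_ratings = np.log1p(ratings)

# Both spellings agree up to floating-point error.
same = np.allclose(log_ratings, np.log(ratings + 1))
```

Because the logarithm is monotonic, it preserves the ordering of the values. Models that split on thresholds, such as decision trees, are generally unaffected by this kind of transformation, which may be one reason the accuracy stays the same.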
My goal is to classify each row as True or False:
train.netgain
0 False
1 False
2 False
3 False
4 True
...
26043 True
26044 False
26045 True
26046 False
26047 False