My data is imbalanced, as shown below. Whenever I try ADASYN it raises an error. Do I need to pass any particular parameters for it? Sometimes it instead runs for a long time and has still produced no result after 40 minutes.
Sales Stage            counts   percentage
Enquiry Assigned        91284    75.902382
Test Drive Provided     25274    21.015258
Test Drive Arranged      3434     2.855361
Booked                    266     0.221178
Test Ride Provided          7     0.005820
Please suggest how I can solve this problem in Python. Recommendations I have heard from others include:
- Resampling two classes at a time and iterating over the class pairs
- Downsampling the class that makes up 75% of the data. Would that help?
- Or is there a solution using skmultilearn?
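For reference, the down-sampling idea from the list above could look roughly like the sketch below. It uses only pandas and NumPy. The helper name `downsample_majority`, the `feature` column, and the cap of 25274 (the size of the second-largest class) are assumptions made for illustration; the toy frame only reproduces the class counts from the table, not the real data.

```python
import numpy as np
import pandas as pd

def downsample_majority(df, label_col, cap, seed=0):
    """Randomly keep at most `cap` rows per class; smaller classes are left untouched."""
    parts = []
    for _, group in df.groupby(label_col):
        if len(group) > cap:
            group = group.sample(n=cap, random_state=seed)
        parts.append(group)
    # Shuffle so the classes are not stored in contiguous blocks
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy data: class counts copied from the question, one random placeholder feature
counts = {
    "Enquiry Assigned": 91284,
    "Test Drive Provided": 25274,
    "Test Drive Arranged": 3434,
    "Booked": 266,
    "Test Ride Provided": 7,
}
rng = np.random.default_rng(0)
labels = np.repeat(list(counts), list(counts.values()))
toy = pd.DataFrame({"feature": rng.normal(size=labels.size), "Sales Stage": labels})

# Hypothetical cap: shrink the 75% class to the size of the second-largest class
balanced = downsample_majority(toy, "Sales Stage", cap=25274)
print(balanced["Sales Stage"].value_counts())
```

Down-sampling first also shrinks the target that any later over-sampling step has to reach, which may reduce its runtime.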
Code:
from imblearn.over_sampling import ADASYN

def makeOverSamplesADASYN(X, y):
    # X → independent variables as a pandas DataFrame
    # y → dependent variable as a pandas Series or single-column DataFrame
    # `ratio` is dropped here: it was deprecated in favour of `sampling_strategy`
    sm = ADASYN(sampling_strategy='all', random_state=None, n_neighbors=5)
    X_adassin, y_adassin = sm.fit_resample(X, y)
    return X_adassin, y_adassin

X_adassin, y_adassin = makeOverSamplesADASYN(X, data_dummyvar['Sales Stage'])
print(X_adassin.shape)
print(y_adassin.shape)
Output: this runs for a very long time and never returns a result. Please suggest what to change.
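To get a feel for the size of the job, here is a rough sketch of the arithmetic, assuming that `sampling_strategy='all'` asks the over-sampler to bring every class up to the size of the largest one (my understanding of imbalanced-learn's over-samplers; ADASYN may generate slightly different numbers in practice). It uses only the class counts from the table above.

```python
# Class counts copied from the table in the question
counts = {
    "Enquiry Assigned": 91284,
    "Test Drive Provided": 25274,
    "Test Drive Arranged": 3434,
    "Booked": 266,
    "Test Ride Provided": 7,
}

# Assumption: every class is raised to the size of the largest class
target = max(counts.values())
synthetic = {label: target - n for label, n in counts.items()}

total_synthetic = sum(synthetic.values())
total_rows = sum(counts.values()) + total_synthetic

for label, extra in synthetic.items():
    print(f"{label:<22}{extra:>8}")
print("synthetic rows requested:", total_synthetic)
print("rows after resampling:   ", total_rows)
```

Under that assumption, the smallest class would go from 7 real rows to about 91284, and the nearest-neighbour searches run over the full feature matrix, so a long runtime on wide one-hot encoded data would not be surprising.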