4

I am trying to build a regressor whose choice of model depends on the value of one feature. That is, I have several columns, one of which is more important than the others (say, gender); it is of course distinct from the target value Y.

I want to say:
- If the gender is Male then use the randomForest regressor
- Else use another regressor

Is this possible using sklearn or any other Python library?

serv-inc
dark

2 Answers

2

You might be able to implement your own regressor. Let us assume that gender is the first feature, encoded as 1 for male. Then you could do something like:

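import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

class MyRegressor():
    '''uses different regressors internally'''
    def __init__(self):
        # default hyperparameters; tune both to your data
        self.randomForest = RandomForestRegressor()
        self.kNN = KNeighborsRegressor()

    def fit(self, X, y):
        '''fits each regressor on its own subset of rows'''
        X, y = np.asarray(X), np.asarray(y)
        male = X[:, 0] == 1  # boolean mask, computed once
        self.randomForest.fit(X[male], y[male])
        self.kNN.fit(X[~male], y[~male])
        return self

    def predict(self, X):
        '''predicts each row with the regressor matching its gender'''
        X = np.asarray(X)
        male = X[:, 0] == 1
        results = np.zeros(X.shape[0])
        # sklearn rejects empty inputs, so skip a group that has no rows
        if male.any():
            results[male] = self.randomForest.predict(X[male])
        if (~male).any():
            results[~male] = self.kNN.predict(X[~male])
        return results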
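If the wrapper should also work with sklearn utilities such as `cross_val_score`, one option is to derive it from sklearn's base classes and pass the two regressors in as parameters. The following is only a minimal sketch under that assumption: the class name `SplitRegressor` and the parameters `match_estimator`, `other_estimator`, `column` and `value` are made up for illustration, and the synthetic data at the end exists only to show the calls.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


class SplitRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: routes each row to one of two regressors."""

    def __init__(self, match_estimator, other_estimator, column=0, value=1):
        self.match_estimator = match_estimator  # used where X[:, column] == value
        self.other_estimator = other_estimator  # used for all remaining rows
        self.column = column
        self.value = value

    def _mask(self, X):
        return X[:, self.column] == self.value

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        mask = self._mask(X)
        # clone so the estimators passed to __init__ stay unfitted
        self.match_ = clone(self.match_estimator).fit(X[mask], y[mask])
        self.other_ = clone(self.other_estimator).fit(X[~mask], y[~mask])
        return self

    def predict(self, X):
        X = np.asarray(X)
        mask = self._mask(X)
        out = np.empty(X.shape[0], dtype=float)
        # skip a group with no rows, since sklearn rejects empty inputs
        if mask.any():
            out[mask] = self.match_.predict(X[mask])
        if (~mask).any():
            out[~mask] = self.other_.predict(X[~mask])
        return out


# Synthetic data: column 0 is gender (1 = male), column 1 is a numeric feature.
rng = np.random.default_rng(0)
gender = rng.integers(0, 2, size=200)
x = rng.normal(size=200)
X = np.column_stack([gender, x])
y = np.where(gender == 1, 3.0 * x, -2.0 * x)

model = SplitRegressor(
    RandomForestRegressor(n_estimators=20, random_state=0),
    LinearRegression(),
)
pred = model.fit(X, y).predict(X)
```

Both groups must be present in the training data, otherwise one of the inner `fit` calls receives no rows and raises an error.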
serv-inc
  • Thanks a lot, this is what I was looking for. However, I edited your solution so that it does not iterate item by item, which is really slow. Thanks again – dark Feb 01 '16 at 20:30
0

I am new to Python, but I would use a list. I would put the relevant values in the list and then do a membership check against it: if the value is a member, use the randomForest regressor; if not, use another regressor.
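The membership check described above could look roughly like this. It is only a sketch: the list `FOREST_VALUES`, the helper `pick_regressor` and the sample `gender` array are hypothetical names chosen for illustration, and the regressors themselves are left out.

```python
import numpy as np

# Hypothetical list of feature values that should go to the random forest.
FOREST_VALUES = ["male"]


def pick_regressor(value):
    """Return the name of the regressor to use for a single value."""
    return "randomForest" if value in FOREST_VALUES else "other"


gender = np.array(["male", "female", "male", "other"])

# Vectorized form of the same membership check: one boolean per row.
use_forest = np.isin(gender, FOREST_VALUES)
```

The boolean array `use_forest` can then be used to select the rows that each regressor should handle, which avoids checking the rows one at a time.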