Importing lightgbm on my system somehow changes the results that sklearn produces:

import lightgbm  # commenting out this import restores the expected output
import numpy as np
from sklearn import datasets, linear_model

# Use a single feature (column 2) of the diabetes dataset
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_y_train = diabetes.target[:-20]

# Fit the same model three times; each fit should give the same prediction
for k in range(3):
    regr = linear_model.LinearRegression()
    regr.fit(diabetes_X_train, diabetes_y_train)
    print(regr.predict(diabetes_X_train)[0:1])

The result is

[ 173.31236882]
[ 208.65797673]
[ 208.68957407]

which is not at all what I expected: the three fits are identical, yet the predictions differ from run to run. Commenting out the import lightgbm on the first line produces the desired result:

[ 210.80457868]
[ 210.80457868]
[ 210.80457868]
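
The expectation of three identical lines comes from ordinary least squares having a closed-form solution with no random component. A minimal plain-Python sketch of that idea for a single feature follows; it is a stand-in for illustration, not scikit-learn's implementation, and the helper name fit_simple_ols is hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (made up for illustration): y = 2x + 1 exactly
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Refitting on the same data gives the same coefficients every time
fits = [fit_simple_ols(xs, ys) for _ in range(3)]
for slope, intercept in fits:
    print(slope * xs[0] + intercept)
```

Because nothing in the fit is random, setting a seed should not change the output; varying results on identical input point to the numerical environment rather than to the model.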

This is on macOS 10.12.6 with a recently installed Anaconda3 distribution, followed by pip install lightgbm. I also uninstalled lightgbm and built it from source, but that did not make a difference. I'm unable to replicate this on Ubuntu.
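
For anyone trying to reproduce this, a rough sketch for collecting the relevant version information is below. It reads installed package metadata instead of importing the packages, so importing lightgbm does not come into play. It assumes Python 3.8+ (for importlib.metadata) and that the packages are installed under their usual PyPI distribution names; the helper name report_versions is hypothetical.

```python
import platform
import sys
from importlib import metadata  # standard library from Python 3.8


def report_versions(packages=("lightgbm", "scikit-learn", "numpy", "scipy")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Interpreter and OS details first, then one line per package
    print("python", sys.version.split()[0], "on", platform.platform())
    for name, version in report_versions().items():
        print(name, version or "not installed")
```

Running this in both the Anaconda and the Homebrew + pip environments and comparing the output is one way to narrow down which package differs between them.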

Update: I completely uninstalled Anaconda and Homebrew and started over with only Homebrew + pip to manage everything. The error appears to have gone away. But I'm still curious whether this works for anyone on Mac + Anaconda, as I prefer using Anaconda.

zkurtz
  • Can you try to set a random seed (numpy.random.seed) and run again? I will try to replicate this asap – seralouk Oct 29 '17 at 22:18
  • I am not able to duplicate your results with scikit v 0.19.1 and python 2. With lightgbm imported or not, I am getting the second set of scores. Can you clarify a bit more? Is this the only code you are using, or is there anything else above it? What versions of scikit and lightgbm are you using? – Vivek Kumar Oct 30 '17 at 05:27

0 Answers