
I'm trying to use the scipy fmin function on a random forest regression model of an example dataset. The model works well, but when I try the fmin function with an initial guess np.zeros(8), I get this error:

ValueError: Expected 2D array, got 1D array instead:
array=[0. 0. 0. 0. 0. 0. 0. 0.].
Reshape your data either using array.reshape(-1, 1) if your data has a 
single feature or array.reshape(1, -1) if it contains a single sample.

So I reshaped the array, but I get exactly the same error message. Here's the code so far:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn as sk
import scipy as sp


data = pd.read_csv('Concrete_Data.csv')
data.describe(include='all')
Y = data.iloc[:,-1]
X = data.iloc[:,0:-1]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2, 
random_state = 0)

from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(random_state = 0)
regressor.fit(X_train,y_train)

def f(x):
    p=regressor.predict(x)
    return p

guess = np.zeros(8)
guess = guess.reshape(-1, 1) 
minimum = sp.optimize.fmin(f,guess)
print('min = ', minimum)

I've also tried passing a row from the training data as the initial guess, and it returns exactly the same error message. Can this be done? If it's possible, it would be very useful for my work. Thanks, James

JEngleback

1 Answer


Sadly, this code is not reproducible (it depends on external data), and the error/stack trace is somewhat incomplete.

From the given code, it looks like you just need to modify:

def f(x):
    p=regressor.predict(x)
    return p

->
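def f(x):
    # fmin passes x as a flat 1D array; predict expects (n_samples, n_features)
    p = regressor.predict(x.reshape(1, -1))
    # predict returns a 1-element array for a single sample; return the scalar
    return p[0]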

This assumes that your regressor predicts a single value per sample and that you are looking for the unknown input sample that minimizes this value.
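
To make the two reshape variants from the error message concrete, here is a minimal NumPy-only sketch. The 8 entries mirror the guess in the question; the variable names are purely illustrative.

```python
import numpy as np

# Flat vector of 8 values, like the guess in the question
x = np.zeros(8)

# One row, eight columns: a single sample with 8 features
one_sample = x.reshape(1, -1)

# Eight rows, one column: eight samples with a single feature each
eight_samples = x.reshape(-1, 1)

print(x.shape, one_sample.shape, eight_samples.shape)
```

A model trained on 8 feature columns needs the (1, 8) layout for a single prediction, which is why reshape(1, -1) is the variant to use inside f.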

The reason for this error is that scipy flattens x internally before calling f (as far as I know, this is done in basically all optimizers within scipy). Reshaping the initial guess beforehand is therefore not enough; the reshape has to happen inside f.
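
The flattening can be checked with a small sketch that needs only numpy and scipy. Note that predict_2d is a hypothetical stand-in for regressor.predict (a simple quadratic bowl that rejects 1D input), not the random forest from the question, and the 3-feature guess is chosen only to keep the example short.

```python
import numpy as np
from scipy.optimize import fmin

seen_shapes = set()  # shapes of x that fmin actually hands to f


def predict_2d(X):
    """Hypothetical stand-in for regressor.predict: insists on 2D input."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("Expected 2D array, got %dD array instead" % X.ndim)
    # Quadratic bowl with its minimum at 1.0 in every feature
    return np.sum((X - 1.0) ** 2, axis=1)


def f(x):
    seen_shapes.add(np.shape(x))             # record the shape fmin passed in
    return predict_2d(x.reshape(1, -1))[0]   # reshape inside f, return a scalar


# Reshaping the guess to 2D up front does not change what f receives
guess = np.full(3, 5.0).reshape(1, -1)
minimum = fmin(f, guess, disp=False)

print(seen_shapes)
print(minimum)
```

Even though the guess is 2D here, f should only ever record a 1D shape, and removing the reshape inside f brings the ValueError back.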

sascha
  • Thank you for your answer! It seems like sklearn only likes 2D data for some reason. I was working with the dataset from the UCI repository here: http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/ and thought it would be cool to be able to find predicted optimums using models like random forest, which could be really handy in biology. Anyway, thanks again for your answer, and sorry for not replying for such a long time – JEngleback Aug 15 '18 at 09:34