
I have already looked at these posts (here and here) and tried to plot the line of best fit for my linear regression problem.

The shape of my data is shown below:

[image: data shape]

My code to plot the line of best fit is below:

plt.scatter(X_test.values, Y_test.values, color="black") # throws error in this line
plt.plot(Y_test, y_pred, color="blue", linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

ValueError: x and y must be the same size

Update: plot output

[image: plot output]

The Great

2 Answers


Right now you are passing all 63 columns of X_test as the x values of a single scatter plot, which is not possible: x then has 63 times as many values as y, hence the error. The simplest fix is to pick one variable from your dataset and plot it against the target. This works best if there is one particular variable you want to evaluate.
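For example, here is a minimal sketch of plotting a line of best fit for a single feature. It uses synthetic data in place of your dataset, and `feature_idx`, `x_one` and `model_one` are names I made up for illustration. Note that it fits a separate one-feature model just for the plot, which is an assumption on my part and not the same thing as your 63-feature model.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real data: 200 rows, 63 feature columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 63))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

feature_idx = 0              # hypothetical choice of the feature to plot
x_one = X[:, [feature_idx]]  # keep it 2-D, shape (200, 1), for scikit-learn

# Fit a one-feature model purely for visualisation.
model_one = LinearRegression().fit(x_one, y)

# Sort by x so the line is drawn left to right instead of zigzagging.
order = np.argsort(x_one[:, 0])
x_sorted = x_one[order]
y_line = model_one.predict(x_sorted)

fig, ax = plt.subplots()
ax.scatter(x_one[:, 0], y, color="black")                   # x and y: 200 values each
ax.plot(x_sorted[:, 0], y_line, color="blue", linewidth=3)  # fitted line
fig.savefig("best_fit_one_feature.png")
```

Sorting the x values before calling `plot` matters because matplotlib joins the points in the order they are given.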

However, it seems like what you really want is to understand your model's performance. scikit-learn has a nice documentation page on the different metrics you can use to evaluate a regression model.
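As a rough sketch of what that could look like, the snippet below computes two common regression metrics, `r2_score` and `mean_squared_error` from `sklearn.metrics`, on a held-out test set. The data is synthetic and the variable names are only illustrative; substitute your own `Y_test` and `y_pred`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 63-feature regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 63))
true_coef = rng.normal(size=63)
y = X @ true_coef + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Both metrics compare the true targets with the predictions.
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R^2: {r2:.3f}  MSE: {mse:.3f}")
```

These work for any number of features, because they only compare `y_test` with `y_pred`.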

StonedTensor
  • My plot looks like the one above. Does it tell me anything? I don't know why there are multiple lines of fit. Is that expected? – The Great Nov 04 '22 at 10:27
  • I think it's trying to connect all the points from one to the next. My recommendation is to scatter plot one `x_pred` variable against your `y_pred`, then take your `x_pred`, plug it into your model, and plot the resulting line – StonedTensor Nov 04 '22 at 11:16

Did you mean to scatter y_pred and Y_test, instead of X_test and Y_test?

X_test.values would be an array of arrays containing n_rows * n_cols values, so its size cannot match that of Y_test.values.
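To illustrate the size mismatch, here is a small sketch using random arrays with the shapes mentioned in the comments (1711 rows, 63 columns). The arrays only stand in for `X_test.values`, `Y_test.values` and `y_pred`, and the diagonal reference line is my own addition.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

n_rows, n_cols = 1711, 63
rng = np.random.default_rng(0)
X_values = rng.normal(size=(n_rows, n_cols))          # stand-in for X_test.values
y_true = rng.normal(size=n_rows)                      # stand-in for Y_test.values
y_pred = y_true + rng.normal(scale=0.3, size=n_rows)  # stand-in for model output

# scatter() needs x and y to hold the same number of values.
sizes_match_for_X = X_values.size == y_true.size     # 107793 vs 1711
sizes_match_for_pred = y_pred.size == y_true.size    # 1711 vs 1711
print(sizes_match_for_X, sizes_match_for_pred)

# Predicted vs. actual: points near the diagonal are good predictions.
fig, ax = plt.subplots()
ax.scatter(y_true, y_pred, color="black", s=5)
lims = [y_true.min(), y_true.max()]
ax.plot(lims, lims, color="blue", linewidth=2)
ax.set_xlabel("actual")
ax.set_ylabel("predicted")
fig.savefig("predicted_vs_actual.png")
```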

antoine1111
  • X_test.values would contain 1711 * 63 values as an array of arrays – antoine1111 Nov 04 '22 at 10:03
  • This example says it should be X_test and Y_test - https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py – The Great Nov 04 '22 at 10:03
  • The page also says: "The example below uses only the first feature of the diabetes dataset" – antoine1111 Nov 04 '22 at 10:04
  • You could make your code work by also choosing a single feature of your dataset, one that you think is particularly relevant to plot against your target variable – antoine1111 Nov 04 '22 at 10:06
  • For model performance, choose a metric in sklearn.metrics and use it on y_pred and y_test – antoine1111 Nov 04 '22 at 10:07
  • Oh okay. So, a line of best fit is usually plotted (and can only be plotted) for one feature at a time? – The Great Nov 04 '22 at 10:07
  • Plotting is not a good way to evaluate model performance with more than 2 features. Try R^2 as a metric to understand how well the model fits the data – StonedTensor Nov 04 '22 at 10:08