
I want to use yellowbrick's ResidualsPlot to show the residuals of a linear regression model. From the docs, I can see that ResidualsPlot accepts a single color value for the training dataset.

train_color : color, default: 'b'
Residuals for training data are plotted with this color but also given an opacity of 0.5 to ensure the test data residuals are more visible. Can be any matplotlib color.

I would like the colors of the individual scatter points in the residual plot to match a comparable plot where I am plotting the regression line and data points.

import numpy as np
from scipy.stats import linregress
import matplotlib.pyplot as plt

from yellowbrick.regressor import ResidualsPlot
from sklearn.linear_model import LinearRegression


data = np.array([[5.71032104e-01, 2.33781600e+03],
       [6.28682565e-01, 2.25247200e+03],
       [1.23262572e+00, 2.82244800e+03],
       [7.44029502e-01, 2.49936000e+03],
       [4.01478749e-01, 2.04825600e+03],
       [3.46455997e-01, 2.32867200e+03],
       [5.15778747e-01, 2.39268000e+03],
       [4.16115498e-01, 2.20218000e+03],
       [3.24103999e-01, 2.07264000e+03],
       [4.29520513e-01, 1.97815200e+03],
       [7.72794999e-01, 2.46278400e+03]])

x = data[:,1]
y = data[:,0]
names = np.array(['COTTONWOOD CREEK', 'EMIGRANT SUMMIT', 'GRAND TARGHEE',
       'PHILLIPS BENCH', 'PINE CREEK PASS', 'SALT RIVER SUMMIT',
       'SEDGWICK PEAK', 'SLUG CREEK DIVIDE', 'SOMSEN RANCH',
       'WILDHORSE DIVIDE', 'WILLOW CREEK'], dtype=object)
colors = ['#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99']


slope, intercept, r_value, p_value, std_err = linregress(x ,y)
xHat = np.linspace(x.min() - 300, x.max() + 300, 100)  # evenly spaced x values spanning the data
yHat = xHat * slope + intercept  # fitted line evaluated at xHat

fig,(ax,ax1) = plt.subplots(nrows=2)
for name, x_, y_, color in zip(names, x, y, colors):
    ax.scatter(x_, y_, label = name, c = color)
ax.plot(xHat, xHat*slope + intercept, 'k--', marker=None)
ax.set_xlim(x.min()-200,x.max()+200)
leg = ax.legend(fontsize='x-small', loc='lower right')
ax.text(1934, 1.27, f'y = {slope:.6f}x {intercept:+.3f}')  # '+' keeps the sign of the intercept
ax.text(1934,1.1, 'R$^2$ =' + str(np.round(r_value**2,4)))


linreg = LinearRegression()
vizul = ResidualsPlot(linreg, hist=False, ax=ax1)  # the target axes is passed to the visualizer
vizul.fit(x.reshape(-1,1) ,y.reshape(-1,1))
vizul.poof()
plt.tight_layout()


Is it possible to achieve this without having to use base matplotlib for the residual plot?

Thanks.

dubbbdan
  • Have you tried passing an array of colours into the argument you mention? If that does not work, you may be able to grab the colour array from the generated scatter plot and overwrite it. – Patol75 Oct 11 '19 at 08:32
  • That's along the lines of what I was thinking. I already know the colors to assign from `colors`. I am not sure how to implement tho... – dubbbdan Oct 11 '19 at 13:13
  • Well, let's say you define a scatter plot this way: `scat = ax.scatter(listX, listY)`. You can then change the colour of each point using `scat.set_color(listC)`, where `listC` contains `len(listX)` items that can be RGBA tuples or color strings (https://matplotlib.org/3.1.1/api/collections_api.html#matplotlib.collections.PathCollection.set_color). Each item then defines the colour of the corresponding point. – Patol75 Oct 11 '19 at 14:53
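
The `set_color` suggestion in the comments can be sketched with plain matplotlib as below. This is only an illustration under stated assumptions: the data, the `np.polyfit` fit, and the variable names are made up for the example and are not the question's dataset or yellowbrick's internals.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

# Illustrative data (not the question's dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
colors = ['#a6cee3', '#1f78b4', '#b2df8a', '#33a02c', '#fb9a99']

# Least-squares line and residuals (observed minus predicted).
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept
residuals = y - y_pred

fig, ax = plt.subplots()
scat = ax.scatter(y_pred, residuals)  # drawn in a single default colour
scat.set_color(colors)                # one colour per point, in data order
ax.axhline(0, color='k', linewidth=1)

# The same PathCollection can also be retrieved from the axes afterwards.
same_scat = ax.collections[0]
```

With the question's visualizer, the same call could presumably be applied after `fit` to the collection found in `vizul.ax.collections`. That assumes yellowbrick draws the training residuals as a single scatter collection on that axes, in the same order as the input rows, which is not confirmed here.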
