Linear regression with Y values containing NAN using scipy

Question

I have two one-dimensional arrays and I would like to run a linear regression on them. I used:

from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

but the slope and intercept always come back as NaN, NaN. After some reading, I found that this is the expected result when either x or y contains NaN values. I tried a suggested solution, but it doesn't work in my case because only y contains NaNs, not x. With that solution I get the error: ValueError: all the input array dimensions except for the concatenation axis must match exactly.

How can I fix this issue?
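A minimal sketch reproducing the symptom, assuming made-up sample data (the arrays below are hypothetical, not the asker's): a single NaN in y is enough to make the fitted parameters NaN, because it propagates through the sums that linregress computes internally.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data: x is complete, y has one missing value
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, np.nan, 7.0, 9.0])

# The NaN in y contaminates the means and covariances,
# so slope and intercept are both NaN
result = stats.linregress(x, y)
print(result.slope, result.intercept)  # both print as nan
```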

Can't you simply exclude nan values? I don't see what information they contribute to your model. — cel, Nov 05 '15 at 16:38
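Excluding the NaN values, as the comment suggests, can be sketched with a plain boolean mask (same hypothetical data as above). The key point is to index x and y with the same mask so the remaining pairs stay aligned and both arrays keep the same length.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data: only y contains a NaN
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, np.nan, 7.0, 9.0])

# True wherever y holds an actual number
keep = ~np.isnan(y)

# Filter both arrays with the same mask so (x, y) pairs stay matched
result = stats.linregress(x[keep], y[keep])
print(result.slope, result.intercept)  # the kept points lie on y = 2x + 1
```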

score 5 · Accepted Answer · answered Nov 05 '15 at 16:54


Mask the values in both x and y wherever y is NaN, then keep only the unmasked entries:

import numpy as np
from scipy import stats

# Drop every (x, y) pair whose y value is NaN
xm = np.ma.masked_array(x, mask=np.isnan(y)).compressed()
ym = np.ma.masked_array(y, mask=np.isnan(y)).compressed()

slope, intercept, r_value, p_value, std_err = stats.linregress(xm, ym)
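The same idea extends to data where NaNs may appear in x as well as y. The helper below is hypothetical (it is not part of SciPy) and is only a sketch: it drops a pair if either coordinate is NaN, using the same masking principle as the answer.

```python
import numpy as np
from scipy import stats

def linregress_ignore_nan(x, y):
    """Hypothetical helper: run stats.linregress on pairs where neither x nor y is NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A pair is dropped if either coordinate is NaN (inf values are not filtered)
    keep = ~(np.isnan(x) | np.isnan(y))
    return stats.linregress(x[keep], y[keep])

# Hypothetical data with NaNs in both arrays
x = np.array([0.0, 1.0, np.nan, 3.0, 4.0, 5.0])
y = np.array([1.0, np.nan, 5.0, 7.0, 9.0, 11.0])
res = linregress_ignore_nan(x, y)
print(res.slope, res.intercept)  # remaining pairs lie on y = 2x + 1
```

Combining the two masks with `|` means a single missing coordinate is enough to discard the whole observation, which is usually what you want for a regression.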

tmdavison

(+1) are you interested in submitting this as an improvement to scipy documentation @tom? https://github.com/scipy/scipy/issues/629 – ev-br Nov 05 '15 at 22:01
sure, I'm happy to, but having never done so before, could you point me in the right direction please? :) – tmdavison Nov 06 '15 at 09:34
See, e.g., http://docs.scipy.org/doc/scipy/reference/hacking.html for a start. Let's move further discussion to the GitHub issue I linked above. – ev-br Nov 06 '15 at 10:54