I have two one-dimensional arrays and I would like to run a linear regression on them. I used:

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

but the slope and intercept are always NaN, NaN. I read a little and found out that this is the expected result if either x or y contains NaN values. I tried this solution, but it doesn't work because, in my case, only y contains NaNs; x does not. Using that solution, I get the error: ValueError: all the input array dimensions except for the concatenation axis must match exactly.

How can I fix this issue?

user3841581

1 Answer

Mask the values in both x and y wherever y is NaN, so that the two arrays stay the same length:

import numpy as np
from scipy import stats

# Drop every (x, y) pair whose y value is NaN
xm = np.ma.masked_array(x, mask=np.isnan(y)).compressed()
ym = np.ma.masked_array(y, mask=np.isnan(y)).compressed()

slope, intercept, r_value, p_value, std_err = stats.linregress(xm, ym)
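
The same filtering can also be done with a plain boolean index instead of a masked array. This is only a minimal sketch: the sample data and the name `valid` are made up for illustration and are not from the question.

```python
import numpy as np
from scipy import stats

# Made-up sample data lying on y = 2*x + 1, with two y values missing
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, np.nan, 7.0, np.nan, 11.0])

# True where y holds a real number; applying it to both arrays
# removes the same positions from x and y, so their lengths still match
valid = ~np.isnan(y)

result = stats.linregress(x[valid], y[valid])
slope, intercept = result.slope, result.intercept
print(slope, intercept)
```

If x could contain NaNs as well, the mask could be built from both arrays, for example `~(np.isnan(x) | np.isnan(y))`.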
tmdavison
  • (+1) are you interested in submitting this as an improvement to scipy documentation @tom? https://github.com/scipy/scipy/issues/629 – ev-br Nov 05 '15 at 22:01
  • sure, I'm happy to, but having never done so before, could you point me in the right direction please? :) – tmdavison Nov 06 '15 at 09:34
  • See, e.g., http://docs.scipy.org/doc/scipy/reference/hacking.html for a start. Let's move further discussion in the github issue I linked above. – ev-br Nov 06 '15 at 10:54