I have this function:

import numpy as np 
def unhot(vec):
    """ takes a one-hot vector and returns the corresponding integer """
    assert np.sum(vec) == 1    # this assertion shouldn't fail, but it did...
    return list(vec).index(1)

that I call on the output of a call to:

numpy.random.multinomial(1, coe)

and I got an assertion error at some point when I ran it. How is this possible? Isn't the output of numpy.random.multinomial guaranteed to be a one-hot vector?

Then I removed the assertion, and now I get:

ValueError: 1 is not in list

Is there some fine-print I am missing, or is this just broken?

capybaralet
  • what are you passing as vec? – Sharif Mamun Apr 24 '14 at 00:53
  • What is `coe`, is it one-dimensional? – ebarr Apr 24 '14 at 01:06
  • @S.M.AlMamun, `vec` is the output from `np.random.multinomial`, as stated in the question. – askewchan Apr 24 '14 at 01:20
  • I can't repeat this behavior (you have `coe.sum() == 1` I assume) for any value of `coe`. What version of numpy are you running? – askewchan Apr 24 '14 at 01:27
  • Also, I'd recommend replacing `list(vec).index(1)` with `vec.argmax()`, which is the same in the case that `1` is the max value of the array. For identical behavior but slower performance (but still much faster than converting to list) use `np.where(vec==1)[0][0]` – askewchan Apr 24 '14 at 02:06
  • askewchan - thanks. I am trying to ensure that the sum is leq 1, but I am having problems there as well, see my other question: (http://stackoverflow.com/questions/23257587/how-can-i-avoid-value-errors-when-using-numpy-random-multinomial) I guess I should try with python's sum rather than numpy's. – capybaralet Apr 24 '14 at 16:08
  • ebarr - yes, coe is one dimensional. It is a vector. – capybaralet Apr 24 '14 at 16:08
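The `argmax` suggestion from the comments can be combined with an explicit check, so that a malformed input raises a clear error instead of failing inside `list.index`. This is only a sketch of one way to do it, not the asker's final code; the validation rule (exactly one nonzero entry, equal to 1) and the error message are assumptions.

```python
import numpy as np

def unhot(vec):
    """Return the index of the single 1 in a one-hot vector, or raise ValueError."""
    vec = np.asarray(vec)
    # Require exactly one nonzero entry, and require that entry to equal 1.
    # A NaN entry fails the second test because NaN != 1 is True.
    if np.count_nonzero(vec) != 1 or vec.max() != 1:
        raise ValueError("not a one-hot vector: %r" % (vec,))
    return int(vec.argmax())
```

With this version, `unhot([0, 0, 1, 0])` returns `2`, while a vector containing a NaN or a stray non-1 value is rejected up front.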

1 Answer

Well, this is the problem, and I should've realized, because I've encountered it before:

np.random.multinomial(1, np.array([0., 0., np.nan, 0.]))

returns

array([0, 0, -9223372036854775807, 0])

I was using an unstable softmax implementation that produced NaNs. I was trying to ensure that the parameters I passed to multinomial had a sum <= 1, but I did it like this:

coe = softmax(coeffs)
while np.sum(coe) > 1-1e-9:
    coe /= (1+1e-5)

and with NaNs in there, the sum is NaN too. Any comparison with NaN evaluates to False, so the body of the while loop never runs and the NaNs go straight into multinomial.
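To illustrate how the NaNs arise and why the guard misses them, here is a minimal sketch. The original `softmax` is not shown, so `naive_softmax` below is only an assumed stand-in for an unstable version, and `stable_softmax` uses the common trick of subtracting the maximum before exponentiating; the input values are made up for the demonstration.

```python
import numpy as np

def naive_softmax(x):
    """Assumed unstable version: exp() overflows for large inputs."""
    exps = np.exp(np.asarray(x, dtype=np.float64))
    return exps / np.sum(exps)

def stable_softmax(x):
    """Subtract the max first so the largest exponent is exp(0) = 1."""
    x = np.asarray(x, dtype=np.float64)
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

coeffs = np.array([1000.0, 1001.0, 1002.0])  # hypothetical inputs

# exp(1000) overflows to inf, and inf / inf is nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(coeffs)

# Comparisons with NaN are always False, so this guard never fires.
guard_fires = bool(np.sum(naive) > 1 - 1e-9)

stable = stable_softmax(coeffs)

# Check for NaNs explicitly before sampling instead of relying on the sum.
if np.isnan(stable).any():
    raise ValueError("softmax produced NaN")
sample = np.random.multinomial(1, stable)
```

Here `naive` is all NaN and `guard_fires` is `False`, whereas `stable` is finite and sums to 1 up to rounding, so `sample` is a proper one-hot vector.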

capybaralet
  • That large negative number is what you get when `np.nan`, which exists only as a float, is cast to an integer. This still seems like a strange way to make `coe.sum() <= 1`. How do you want `np.nan` to be interpreted? Probably as `0`, or `1`? If `0`, you can use `np.nansum` to ignore the `nan`: `coe /= np.nansum(coe)`. If `1`, you'll have to just say `coe[np.isnan(coe)] = 1` before `coe /= coe.sum()`. – askewchan Apr 25 '14 at 14:56