I'm trying to vectorize the following one-hot feature function:

import numpy as np

def one_hot(x, k):
    # Output a 1 in index x (1-based) of a k by 1 column vector
    out = np.zeros((k,1))
    out[x - 1] = 1
    return out

Here's the data I'm trying to test np.vectorize on:

data = np.array([[2], [3], [4], [5]])

Here's where I call np.vectorize:

vec_one_hot = np.vectorize(one_hot)
new_data = np.zeros((7,5))

new_data = vec_one_hot(data, 7)

However, I keep getting the following error:

ValueError: setting an array element with a sequence.

I don't know what I'm doing wrong, please help!

BigBear
  • I've verified that this error occurs when calling this line: `new_data = vec_one_hot(data, 7)`, however, in the future please include the full Traceback. – Kraigolas Jul 26 '22 at 23:25
  • `new_data = np.zeros((7,5))` is useless, since you try to create a new `new_data` array with `vectorize`. You don't "initialize" a variable like that in Python. – hpaulj Jul 26 '22 at 23:37
  • Your function is returning an array. `vectorize` expects it to return a scalar - one value for each scalar value in the input. Read, and reread the `np.vectorize` docs. It isn't as easy to use correctly as you might guess. Also read its performance disclaimer. It might be easier to skip `vectorize` entirely. Don't keep banging your head against something that doesn't work and you don't understand. – hpaulj Jul 26 '22 at 23:40
  • Using vectorize signature argument as in [Using Numpy Vectorize on Functions that Return Vectors](https://stackoverflow.com/questions/3379301/using-numpy-vectorize-on-functions-that-return-vectors/46860269#46860269), with `vec_one_hot = np.vectorize(one_hot, signature = '(i),()->(j,1)')` and `data = np.array([2, 3, 4, 5])`, we get for new_data the value: `array([[0.],[1.],[1.],[1.],[1.],[0.],[0.]])` . We can then make a column vector with `new_data.ravel()`. – DarrylG Jul 27 '22 at 00:04

1 Answer

If you just iterate on the 'rows' of `data`:

In [321]: [one_hot(i, 7) for i in data]
Out[321]: 
[array([[0.],
        [1.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]]),
 array([[0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [0.],
        [0.]]),
 array([[0.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [0.]]),
 array([[0.],
        [0.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.]])]

Since you tried to initialize new_data to (7,5), I suspect you want something more like:

In [322]: np.hstack(_)
Out[322]: 
array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

If you specify `otypes`, you'd get:

In [326]: f = np.vectorize(one_hot,otypes=[object])
In [327]: f(data,7)
Out[327]: 
array([[array([[0.],
               [1.],
               [0.],
               [0.],
               [0.],
               [0.],
               [0.]])],
       [array([[0.],
               [0.],
               [1.],
               [0.],
               [0.],
               [0.],
               [0.]])],
       [array([[0.],
               [0.],
               [0.],
               [1.],
               [0.],
               [0.],
               [0.]])],
       [array([[0.],
               [0.],
               [0.],
               [0.],
               [1.],
               [0.],
               [0.]])]], dtype=object)

That's a (4,1) array, corresponding to the (4,1) shape of your data. It could be turned into a (7,4) array with np.hstack(_[:,0]).

vectorize does not promise speed; with signature, as suggested in a comment, performance is even worse. As long as your data is (n,1), I don't see the point in using vectorize.
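
For completeness, here is a minimal sketch of what the `signature` route could look like, assuming the `one_hot` function from the question and a flattened copy of `data`. The core-dimension names `k` and `m` are arbitrary labels picked for this illustration, and the final slice and transpose are just one way to get to a (7,4) layout:

```python
import numpy as np

def one_hot(x, k):
    # Same function as in the question: 1 at (1-based) index x of a (k,1) column
    out = np.zeros((k, 1))
    out[x - 1] = 1
    return out

data = np.array([[2], [3], [4], [5]])

# Both inputs are treated as scalars "()"; each call returns a 2-D block "(k,m)".
vec_one_hot = np.vectorize(one_hot, signature='(),()->(k,m)')

stacked = vec_one_hot(data.ravel(), 7)   # one (7,1) block per input value
new_data = stacked[:, :, 0].T            # drop the trailing axis, put labels in columns
```

Under these assumptions `stacked` has shape (4,7,1) and `new_data` has shape (7,4). This is still a Python-level loop over the inputs, so it is shown to illustrate the mechanics rather than as a recommendation.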

But why not populate a new_data array in one step?

In [337]: new_data = np.zeros((7,5),int)
In [338]: new_data[data[:,0]-1,np.arange(4)] =1
In [339]: new_data
Out[339]: 
array([[0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
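
The same indexing idea can be wrapped in a small helper that sizes the output from the input, so there is no unused column. This is only a sketch: the name `one_hot_columns` is hypothetical, and it assumes the labels are 1-based integers no larger than `k`, as in the question.

```python
import numpy as np

def one_hot_columns(labels, k):
    # Hypothetical helper: one column per label, with a 1 in row (label - 1).
    labels = np.asarray(labels).ravel()
    out = np.zeros((k, labels.size), dtype=int)
    # Pair each row index (label - 1) with its column index in a single assignment.
    out[labels - 1, np.arange(labels.size)] = 1
    return out

data = np.array([[2], [3], [4], [5]])
result = one_hot_columns(data, 7)   # shape (7, 4)
```

Each column of `result` contains exactly one 1, and no Python-level loop or `vectorize` call is involved.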
hpaulj