I was implementing the svm_loss_vectorized function from cs231n assignment1. The assignment asks for the gradient to be implemented along with the loss:
# dW starts as np.zeros(W.shape), as in the assignment template
scores = X.dot(W)                                           # (N, C) class scores
correct_class_scores = scores[np.arange(len(y)), y][:, np.newaxis]
margins = np.maximum(0, scores - correct_class_scores + 1)  # SVM hinge margins
indices = np.where(margins > 0)  # (row, column) pairs where the margin is positive
for i in range(len(indices[1])):
    dW[:, indices[1][i]] += X.T[:, indices[0][i]]     # push the offending class up
    dW[:, y[indices[0][i]]] -= X.T[:, indices[0][i]]  # push the correct class down
# dW[:, indices[1]] += X.T[:, indices[0]]
# dW[:, y[indices[0]]] -= X.T[:, indices[0]]
# I tried writing it like this first, but it gives a totally different output (even the shape of dW changed)
dW /= len(y)        # average over the batch
dW += 2 * reg * W   # gradient of the L2 regularization term
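For reference, the gradient the loop is computing is, as I understand it from the course notes, the per-sample hinge-loss gradient (with scores s = x_i W):

\nabla_{w_j} L_i = \mathbf{1}(s_j - s_{y_i} + 1 > 0)\, x_i \quad (j \neq y_i), \qquad \nabla_{w_{y_i}} L_i = -\sum_{j \neq y_i} \mathbf{1}(s_j - s_{y_i} + 1 > 0)\, x_i

so every positive margin adds x_i to the column of the offending class and subtracts it from the column of the correct class.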
This code works just fine, but it is obviously not vectorized, which is what the assignment asks for. I want to vectorize that for loop so that it produces the same result but runs faster. The commented-out lines are the first thing I tried, but they didn't give the correct output.
What I want to do is pull out the column indices from indices[1] and update each corresponding column of dW by adding the matching column of X.T[:, indices[0]]. My guess is that this doesn't work because indices[1] is a one-dimensional vector: the fancy indexing just pulls the corresponding columns out into a brand-new array, so when the same column index appears more than once, the += is only applied once instead of accumulating.
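That guess seems to check out on a tiny standalone example (the arrays A, B, cols and vals below are made up for the test, nothing from the assignment): += through fancy indexing is buffered and applies duplicate indices only once, while np.add.at accumulates them.

import numpy as np

A = np.zeros((2, 3))
cols = np.array([0, 0, 1])   # column 0 appears twice, like a duplicate in indices[1]
vals = np.ones((2, 3))

A[:, cols] += vals           # buffered: the duplicate write to column 0 lands only once
print(A[0])                  # [1. 1. 0.]

B = np.zeros((2, 3))
np.add.at(B, (slice(None), cols), vals)  # unbuffered: duplicates accumulate
print(B[0])                  # [2. 1. 0.]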
So how do I vectorize that for loop so that it works the way I intended?
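For what it's worth, one direction I've been sketching is to build a mask of positive margins and fold the whole loop into one matrix product. Everything below (N, D, C, rng, binary, the toy data) is made up for the sketch, not from the assignment:

import numpy as np

# Toy setup: N samples, D features, C classes (sizes made up for testing)
N, D, C = 5, 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
W = rng.standard_normal((D, C))
y = rng.integers(0, C, size=N)
reg = 0.1

scores = X.dot(W)
correct_class_scores = scores[np.arange(N), y][:, np.newaxis]
margins = np.maximum(0, scores - correct_class_scores + 1)
margins[np.arange(N), y] = 0                   # a sample contributes nothing at its own class

binary = (margins > 0).astype(X.dtype)         # 1 wherever the loop would add x_i
binary[np.arange(N), y] = -binary.sum(axis=1)  # subtract x_i once per positive margin
dW = X.T.dot(binary) / N + 2 * reg * W         # same averaging and regularization as above

The idea is that binary[i, j] counts how many times the loop would add x_i to column j (with a negative count at the correct class), so X.T.dot(binary) should reproduce the accumulated dW in a single step, but I'm not sure this is the intended approach.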