I was implementing the svm_loss_vectorized function from cs231n assignment1. The assignment asks for the gradient to be implemented along with the loss:
# dW starts as np.zeros(W.shape), as in the assignment template
scores = X.dot(W)                                           # (N, C) class scores
correct_class_scores = scores[np.arange(len(y)), y][:, np.newaxis]
margins = np.maximum(0, scores - correct_class_scores + 1)  # SVM hinge margins
indices = np.where(margins > 0)  # (row, column) pairs where the margin is positive
for i in range(len(indices[1])):
    dW[:, indices[1][i]] += X.T[:, indices[0][i]]     # push the offending class up
    dW[:, y[indices[0][i]]] -= X.T[:, indices[0][i]]  # push the correct class down
# dW[:, indices[1]] += X.T[:, indices[0]]
# dW[:, y[indices[0]]] -= X.T[:, indices[0]]
# I tried writing it like this first, but it gives a totally different output (even the shape of dW changed)
dW /= len(y)        # average over the batch
dW += 2 * reg * W   # gradient of the L2 regularization term
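For reference, the gradient the loop is computing is, as I understand it from the course notes, the per-sample hinge-loss gradient (with scores s = x_i W):

\nabla_{w_j} L_i = \mathbf{1}(s_j - s_{y_i} + 1 > 0)\, x_i \quad (j \neq y_i), \qquad \nabla_{w_{y_i}} L_i = -\sum_{j \neq y_i} \mathbf{1}(s_j - s_{y_i} + 1 > 0)\, x_i

so every positive margin adds x_i to the column of the offending class and subtracts it from the column of the correct class.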
This code works just fine, but it is obviously not vectorized, which is what the assignment asks for. I want to vectorize that for loop so that it produces the same result but runs faster. The commented-out lines are the first thing I tried, but they didn't give the correct output.
What I want to do is pull out the column indices from indices[1] and update each corresponding column of dW by adding the matching column of X.T[:, indices[0]]. My guess is that this doesn't work because indices[1] is a one-dimensional vector: the fancy indexing just pulls the corresponding columns out into a brand-new array, so when the same column index appears more than once, the += is only applied once instead of accumulating.
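That guess seems to check out on a tiny standalone example (the arrays A, B, cols and vals below are made up for the test, nothing from the assignment): += through fancy indexing is buffered and applies duplicate indices only once, while np.add.at accumulates them.

import numpy as np

A = np.zeros((2, 3))
cols = np.array([0, 0, 1])   # column 0 appears twice, like a duplicate in indices[1]
vals = np.ones((2, 3))

A[:, cols] += vals           # buffered: the duplicate write to column 0 lands only once
print(A[0])                  # [1. 1. 0.]

B = np.zeros((2, 3))
np.add.at(B, (slice(None), cols), vals)  # unbuffered: duplicates accumulate
print(B[0])                  # [2. 1. 0.]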
So how do I vectorize that for loop so that it works the way I intended?
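For what it's worth, one direction I've been sketching is to build a mask of positive margins and fold the whole loop into one matrix product. Everything below (N, D, C, rng, binary, the toy data) is made up for the sketch, not from the assignment:

import numpy as np

# Toy setup: N samples, D features, C classes (sizes made up for testing)
N, D, C = 5, 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
W = rng.standard_normal((D, C))
y = rng.integers(0, C, size=N)
reg = 0.1

scores = X.dot(W)
correct_class_scores = scores[np.arange(N), y][:, np.newaxis]
margins = np.maximum(0, scores - correct_class_scores + 1)
margins[np.arange(N), y] = 0                   # a sample contributes nothing at its own class

binary = (margins > 0).astype(X.dtype)         # 1 wherever the loop would add x_i
binary[np.arange(N), y] = -binary.sum(axis=1)  # subtract x_i once per positive margin
dW = X.T.dot(binary) / N + 2 * reg * W         # same averaging and regularization as above

The idea is that binary[i, j] counts how many times the loop would add x_i to column j (with a negative count at the correct class), so X.T.dot(binary) should reproduce the accumulated dW in a single step, but I'm not sure this is the intended approach.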