
What does argmax mean in this context? I am following the tutorial in this colab notebook: https://colab.research.google.com/github/chokkan/deeplearningclass/blob/master/mnist.ipynb

for x, y in zip(Xtrain, Ytrain):
    y_pred = np.argmax(np.dot(W, x))

It looks like this is saying that for every record x and its true label y in the arrays Xtrain and Ytrain, take the max value of the dot product of the weight matrix W and the record x. Does this mean it takes the max of the weight matrix?

It also looks like 1 was appended to the flattened vector:

def image_to_vector(X):
    X = np.reshape(X, (len(X), -1))     # Flatten: (N x 28 x 28) -> (N x 784)
    return np.c_[X, np.ones(len(X))]    # Append 1: (N x 784) -> (N x 785)

Xtrain = image_to_vector(data['train_x'])

Why would that be?

Thank you!

1 Answer


For simplicity, you can treat it as a sort of y = W * x + bias. The additional column of ones is independent of the input, so the corresponding column of weights acts as the bias.
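
A minimal sketch of why the appended 1 is equivalent to a separate bias term (the names W_no_bias and b, and the toy sizes, are just for illustration, not from the notebook):

import numpy as np

rng = np.random.default_rng(0)
W_no_bias = rng.normal(size=(3, 4))   # toy layer: 4 inputs, 3 classes
b = rng.normal(size=3)                # separate bias vector

x = rng.normal(size=4)                # one flattened input
x_with_one = np.append(x, 1.0)        # the "append 1" trick from image_to_vector

W = np.c_[W_no_bias, b]               # fold the bias in as the last column of W
assert np.allclose(W_no_bias @ x + b, W @ x_with_one)   # identical scores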

Now, our weight matrix W represents a fully connected layer with 785 (28*28 + 1) inputs and 10 outputs (7,850 weights total). The dot product of W and x is a vector of length 10 containing the score for each possible class (a digit, in the MNIST case). Applying argmax, we get the index of the highest score, which is our prediction.
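
To make the shapes concrete, here is a small sketch with the MNIST sizes (the random values are placeholders, not the notebook's actual weights or data):

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 785))   # 10 classes x 785 inputs (784 pixels + bias)
x = rng.normal(size=785)         # one flattened image with the appended 1

scores = np.dot(W, x)            # shape (10,): one score per digit 0..9
y_pred = np.argmax(scores)       # index of the largest score = predicted digit

So argmax is not taking the max of W itself; it returns the position (0-9) of the largest entry in the 10-element score vector, and that position is the predicted class.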

  • Thank you! That is helpful. I also did not understand why we are updating the weights here (the if/else statement):

    lr = 0.0001
    train_correct = 0
    for x, y in zip(Xtrain, train_y):
        y_pred = np.argmax(np.dot(W, x))
        if y_pred != y:
            W[y] += x * lr
            W[y_pred] -= x * lr
        else:
            train_correct += 1

    It's in the same for loop as the code above. – pav Jul 11 '22 at 01:32
  • Nothing special here: in case of a wrong prediction, we decrease the weights corresponding to the wrong class (proportionally to the actual input) and increase those for the right one, thus making our scores closer to the truth the next time we encounter a similar input. – dx2-66 Jul 12 '22 at 07:51
  • Thank you! So is this happening instead of gradient descent/backprop? Also, I'm not sure I understand the W[y] or W[y_pred] part, because it seems to be indexing the weight matrix at the value of y or y_pred. I guess I don't understand why all the weights of the matrix would not be updated, given they all contribute to how y is predicted? – pav Jul 13 '22 at 13:17
  • For intuition purposes, you can treat W as a stack of 10 vectors, one for each class, such that each row's dot product with the flattened input yields the respective class score (see the sketch below). This is just one possible way of updating weights, and yes, generally speaking, updating only part of the weights is not guaranteed to converge well. – dx2-66 Jul 13 '22 at 19:33
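
For the record, here is a sketch of that row-per-class intuition combined with the update rule from the comments above (the toy values and shapes are assumed, not taken from the notebook):

import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(10, 785))   # row W[c] holds the score vector for class c
x = rng.normal(size=785)         # one flattened input (with the appended 1)
y = 3                            # true label for this example
lr = 0.0001

y_pred = np.argmax(W @ x)        # class whose row scores x the highest
if y_pred != y:
    W[y] += lr * x               # pull the true class's row towards x
    W[y_pred] -= lr * x          # push the wrongly chosen class's row away from x
# Only two of the ten rows change: the other rows did not cause this mistake,
# so this perceptron-style rule leaves them untouched.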