
I am trying to train NMF on a training set and apply it to a test set, but I run into an incompatible-shape error in the second step.

I wrote the following Python code to perform NMF:

import numpy as np

def cost(X, W, H):
    """
    Compute the Euclidean distance-based objective function for NMF
    (mean squared reconstruction error of X ~= W @ H).
    """
    diff = X - np.dot(W, H)
    cost_value = (diff * diff).sum() / (X.shape[0] * X.shape[1])
    return cost_value

def nmf(X, m):
    """
    Perform NMF on the input matrix X (n_samples x n_features) with m components.
    Returns the reconstruction W @ H (same shape as X), the weight matrix W
    (n_samples x m), and the cost at each iteration.
    """
    # Initialize both factors randomly:
    # W is (n_samples, m), H is (m, n_features)
    W = np.random.rand(X.shape[0], m)
    H = np.random.rand(m, X.shape[1])

    # Update H and W iteratively with multiplicative updates
    cost_values = []
    pseudo_count = 0.0001  # keeps the denominators away from zero
    for _ in range(100):
        # compute the update on H (W held fixed)
        H = H * ((W.T.dot(X))/(W.T.dot(W.dot(H))+pseudo_count))

        # compute the update on W (H held fixed)
        W = W * ((X.dot(H.T))/(W.dot(H.dot(H.T))+pseudo_count))
        cost_values.append(cost(X, W, H))

    # Reconstruct X from the learned factors
    X_nmf = np.dot(W, H)

    return X_nmf, W, cost_values
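A note on the shapes in `nmf`: with X laid out as (n_samples, n_features), `W` ends up (n_samples, m), one row of weights per sample, and `H` ends up (m, n_features), one component per row in pixel space. The function above returns only `W`. A minimal sketch, assuming you want both factors back, is a hypothetical `nmf_factors` variant that applies the same multiplicative updates (the name, the `seed` argument and the toy data are illustrative only):

```python
import numpy as np


def nmf_factors(X, m, n_iter=100, eps=1e-4, seed=0):
    """Hypothetical variant of nmf(): same updates, but returns W and H.

    X: (n_samples, n_features), non-negative
    W: (n_samples, m)   weights, one row per sample
    H: (m, n_features)  components, one row per basis image
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], m))
    H = rng.random((m, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update W with H fixed
    return W, H


# Small random stand-in for X_train, which is (4000, 784) in the question
X_train = np.random.default_rng(1).random((40, 16))
W_train, H = nmf_factors(X_train, m=5)
print(W_train.shape, H.shape)  # (40, 5) (5, 16)
```

With the real X_train and m=10, the same call would give W_train of shape (4000, 10) and H of shape (10, 784); only H describes the images independently of which samples were used for training.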

What confuses me is this: after training NMF on a training set to find $k$ non-negative components, how do I use the result to reconstruct the images in the test set? I have tried the following:

m = 10
X_train_nmf, nmf_basis_vectors, nmf_cost_values = nmf(X_train, m)
H_test = np.linalg.lstsq(nmf_basis_vectors, X_test.T, rcond=None)[0]
reconstructed_test_nmf = np.dot(nmf_basis_vectors, H_test)

Because X_train has shape (4000, 784) and X_test has shape (800, 784), computing H_test fails: nmf_basis_vectors has shape (4000, m), while X_test.T has shape (784, 800), so np.linalg.lstsq cannot match their first dimensions. I would really appreciate any help with this. Thanks in advance.
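The mismatch comes from which factor is passed to `lstsq`: the returned `W` holds one row of weights per training image, so it cannot describe new images. The reusable part is `H` (shape (m, n_features)), which the function above does not return. A common approach, assuming `nmf` is changed to also return `H`, is to keep `H` fixed and solve only for test weights, $X_{\text{test}} \approx W_{\text{test}} H$ (as far as I know, this is also the idea behind scikit-learn's `NMF.transform`). The sketch below shows two variants on small synthetic data: (a) the original `lstsq` call with `H.T` as the system matrix, which fixes the shapes but does not enforce non-negativity, and (b) a hypothetical `nmf_project` helper that applies only the W-update of the multiplicative rules, so the weights stay non-negative. `nmf_project` and `relative_error` are illustrative names, not library functions.

```python
import numpy as np


def nmf_project(X_new, H, n_iter=200, eps=1e-4, seed=0):
    """Hypothetical helper: find W_new >= 0 with X_new ~= W_new @ H, H fixed.

    X_new: (n_new, n_features), e.g. the (800, 784) test images
    H:     (m, n_features), components learned on the training set
    Only the W-update of the multiplicative rules is applied.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X_new.shape[0], H.shape[0]))
    HHt = H @ H.T      # (m, m); constant because H is never updated
    XHt = X_new @ H.T  # (n_new, m)
    for _ in range(n_iter):
        W *= XHt / (W @ HHt + eps)
    return W


def relative_error(X, W, H):
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


# Small synthetic stand-ins: H plays the trained components, and the
# "test images" are exact non-negative combinations of its rows.
rng = np.random.default_rng(2)
m, n_features, n_test = 5, 30, 20
H = rng.random((m, n_features))
W_true = rng.random((n_test, m))
X_test = W_true @ H

# (a) lstsq with the orientation fixed: solve H.T @ w = x for every test
# image; (n_features, m) and (n_features, n_test) now share dimension 0.
W_ls = np.linalg.lstsq(H.T, X_test.T, rcond=None)[0].T  # (n_test, m)

# (b) Non-negative weights via multiplicative updates with H held fixed.
W_init = nmf_project(X_test, H, n_iter=0)  # the random starting point
W_test = nmf_project(X_test, H)
reconstructed_test_nmf = W_test @ H  # (n_test, n_features), same as X_test

print(W_ls.shape, W_test.shape, reconstructed_test_nmf.shape)
# (20, 5) (20, 5) (20, 30)
```

For the real data, `H` would be the (m, 784) factor from training, `W_test` would come out as (800, m), and `W_test @ H` as (800, 784). Variant (a) recovers the weights exactly here only because the synthetic test rows are built from `H`; on real images its weights can go negative.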

laikaka
