I have a tensor of shape 4149x1000, representing 4149 images, each described by 1000 features. There are also 101 labels. The labels are not mapped one-to-one to images; instead, consecutive blocks of images share a label. For instance, image 0 has the same label as images 1, 2, 3, 4, and so on, and image 460 shares its label with images 461, 462, and beyond.
The goal is to write a program that performs CP/PARAFAC decomposition in this feature space and computes the latent semantics. Each latent semantic should then be reported as a list of <label, weight> pairs, sorted by weight in descending order.
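To make the target output concrete, here is a minimal sketch of the sorting step alone. The vector `label_weights` is a made-up example (one weight per label for a single latent semantic), not data from my actual tensors:

```python
# Hypothetical weights of labels 0..2 in one latent semantic.
label_weights = [0.12, 0.87, 0.33]

# Pair each label index with its weight and sort by weight, descending.
pairs = sorted(enumerate(label_weights), key=lambda p: p[1], reverse=True)
print(pairs)  # [(1, 0.87), (2, 0.33), (0, 0.12)]
```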
To get a single 3D tensor, I stacked the image tensor and the label tensor:
import numpy as np
import torch
import tensorly as tl
from tensorly.decomposition import parafac

K = 2

# replicate the 1D label vector so it matches the feature dimension;
# cast to float so it can be concatenated with the float feature tensor
expanded_label_tensor = label_tensor.float().unsqueeze(1)                     # shape: (4149, 1)
expanded_label_tensor = expanded_label_tensor.expand(-1, my_tensor.shape[1])  # shape: (4149, 1000)
expanded_label_tensor = expanded_label_tensor.unsqueeze(1)                    # shape: (4149, 1, 1000)

# stack the feature slice and the label slice along a new middle mode
combined_tensor = torch.cat((my_tensor.unsqueeze(1), expanded_label_tensor), dim=1)  # shape: (4149, 2, 1000)

# executing the decomposition
combined_tensor_np = combined_tensor.detach().numpy().astype(np.float32)
weights, factors = parafac(tl.tensor(combined_tensor_np), rank=K)
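Afterwards, I tried to turn the factor matrices into the ordered <label, weight> pairs. Since the label information is only baked into the middle mode of size 2, the only per-image weights are in the image-mode factor (`factors[0]`, shape images x rank), so I averaged its rows over the images sharing each label. This aggregation is my own choice, not something TensorLy provides; the code below uses small random stand-ins for `factors[0]` and `label_tensor` just to show the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2
image_factor = rng.random((12, K))                        # stand-in for factors[0] (images x rank)
labels = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2])   # stand-in for label_tensor
n_labels = 3                                              # 101 in the real data

semantics = []
for k in range(K):
    # average the k-th latent-semantic weight over all images of each label
    per_label = {l: image_factor[labels == l, k].mean() for l in range(n_labels)}
    # sort the <label, weight> pairs by weight, descending
    pairs = sorted(per_label.items(), key=lambda p: p[1], reverse=True)
    semantics.append(pairs)
    print(f"latent semantic {k}:", pairs)
```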
Is what I've done correct? I'm far from certain that the output matches the desired outcome.