I am fairly new to graph neural networks. I am training a GNN model that uses self-attention, and I have a few questions.
The issue is that the node count (node_num) differs in each batch. In the first batch I have:
Batch(batch=[1181], edge_attr=[1975, 3], edge_index=[2, 1975], x=[1181, 300])
In the second batch I have: Batch(batch=[1134], edge_attr=[1635, 3], edge_index=[2, 1635], x=[1134, 300])
That is, there are 1181 nodes in batch 1 but only 1134 nodes in batch 2.
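For context, here is a minimal sketch of how such batches arise, assuming the Batch objects above come from PyTorch Geometric's DataLoader (the individual graph sizes below are made up for illustration):

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

# Hypothetical graphs with 300-dim node features and different node counts.
graphs = [
    Data(x=torch.randn(n, 300),
         edge_index=torch.randint(0, n, (2, 2 * n)))
    for n in (600, 581, 570, 564)
]

loader = DataLoader(graphs, batch_size=2)
for batch in loader:
    # The number of nodes changes from batch to batch,
    # e.g. 600 + 581 = 1181 nodes, then 570 + 564 = 1134 nodes.
    print(batch)
```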
When I try to compute self-attention between the nodes, I run into the following problem.
Here is how the self-attention is computed. Q, K and V are calculated from the node features X (shape [node_num, 300]) as:

Q = W_q · X,  K = W_k · X,  V = W_v · X
Attention(Q, K, V) = softmax(Q · K^T / sqrt(d)) · V
and the dimensions of w_q, w_k and w_v are defined as:
self.w_q = Param(torch.Tensor(self.nodes_num, self.nodes_num))
So the problem I have is this:
In batch 1 the dimensions of w_q, w_k and w_v are self.w_q = Param(torch.Tensor(1181, 1181)), while in batch 2 they are self.w_q = Param(torch.Tensor(1134, 1134)). The dimensions vary with the number of nodes, so w_q has to be redefined for every batch.
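To make the problem concrete, here is a minimal sketch of the layer as I currently have it (the class and variable names are mine, but the shapes follow the setup described above): the weight matrices are sized by the number of nodes, so they only fit one specific batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeWiseSelfAttention(nn.Module):
    def __init__(self, num_nodes):
        super().__init__()
        # (num_nodes, num_nodes) weights: only valid for batches with exactly
        # `num_nodes` nodes, e.g. 1181 for batch 1 but 1134 for batch 2.
        self.w_q = nn.Parameter(torch.randn(num_nodes, num_nodes))
        self.w_k = nn.Parameter(torch.randn(num_nodes, num_nodes))
        self.w_v = nn.Parameter(torch.randn(num_nodes, num_nodes))

    def forward(self, x):                 # x: (num_nodes, 300)
        q = self.w_q @ x                  # (num_nodes, 300)
        k = self.w_k @ x
        v = self.w_v @ x
        scores = q @ k.t() / x.size(-1) ** 0.5   # (num_nodes, num_nodes)
        return F.softmax(scores, dim=-1) @ v

attn = NodeWiseSelfAttention(num_nodes=1181)
out = attn(torch.randn(1181, 300))        # works for batch 1
# attn(torch.randn(1134, 300))            # fails for batch 2: shape mismatch
```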
Doesn't this mean the model is effectively trained on only one batch of samples?
If so, how can I solve the problem?