A question about representing a set of RDF triples as a tensor.
Scenario:
An RDF triple expresses a simple statement about resources, in the form (subject, predicate, object).
Suppose I have two predicates, play_for and race_for, each with n triples, as follows:
1st predicate: play_for; n triples: (Ray Allen, play_for, Boston Celtics), (Kobe Bryant, play_for, Lakers), ... For short, (A_i, play_for, T_i) for i = 1 to n.
2nd predicate: race_for; n triples: (Boston Celtics, race_for, NBA championship), (Lakers, race_for, NBA championship), ... For short, (T_i, race_for, NBA) for i = 1 to n.
A tensor representation is one way to model these 2n triples. I'm studying Maximilian Nickel's paper on using tensor factorization to find the latent semantic structure of a dataset. The first step is to represent the dataset as a tensor.
A tensor entry X_ijk = 1 denotes that the relation (i-th entity, k-th predicate, j-th entity) exists. For non-existing and unknown relations, the entry is set to zero. For instance, these 2n triples can be modeled by a tensor with the following two slices:
One slice: (A_i, play_for, T_i)
A1, A2,...,An, T1, T2,...,Tn, NBA
A1 0 0 0 1 0 0 0
A2 0 0 0 0 1 0 0
:
An 0 0 0 0 0 1 0
T1 0 0 0 0 0 0 0
T2 0 0 0 0 0 0 0
:
Tn 0 0 0 0 0 0 0
NBA 0 0 0 0 0 0 0
The other slice: (T_i, race_for, NBA)
A1, A2,...,An, T1, T2,...,Tn, NBA
A1 0 0 0 0 0 0 0
A2 0 0 0 0 0 0 0
:
An 0 0 0 0 0 0 0
T1 0 0 0 0 0 0 1
T2 0 0 0 0 0 0 1
:
Tn 0 0 0 0 0 0 1
NBA 0 0 0 0 0 0 0
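To make the two slices concrete, here is a minimal sketch for n = 2 using plain nested lists. The names `entities`, `predicates`, `triples`, and `slices` are only illustrative, and the indices are 0-based because that is how Python counts, so `slices[k][i][j]` plays the role of X_ijk.

```python
# Toy version of the two slices above with n = 2 (all names are illustrative).
entities = ["A1", "A2", "T1", "T2", "NBA"]
predicates = ["play_for", "race_for"]
triples = [
    ("A1", "play_for", "T1"),
    ("A2", "play_for", "T2"),
    ("T1", "race_for", "NBA"),
    ("T2", "race_for", "NBA"),
]

m = len(entities)
# One m x m slice of zeros per predicate.
slices = [[[0] * m for _ in range(m)] for _ in predicates]

for subj, pred, obj in triples:
    i = entities.index(subj)    # row: subject
    j = entities.index(obj)     # column: object
    k = predicates.index(pred)  # slice: predicate
    slices[k][i][j] = 1

# Print each slice in the same layout as the matrices above.
for name, mat in zip(predicates, slices):
    print(name)
    for row in mat:
        print(" ".join(str(v) for v in row))
```

`list.index` scans the list on every call, so this only illustrates the layout; for a large dataset a lookup table would probably be the better choice.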
Assume the RDF triples are stored in 'test.txt'. My question is how to program this modeling process in Python.
Here is what I think:
The hardest part is getting, for each RDF triple, the coordinate of the corresponding non-zero entry in the tensor. To start, here is a list containing all entities:
T = ['A1', ..., 'An', 'T1', ..., 'Tn', 'NBA']
For every RDF triple (Subject_i, Predicate_k, Object_j) in the dataset, there is a coordinate (i, j, k) that gives the position of X_ijk = 1 in the tensor. For instance, if A_i is the 5th entity and T_i is the 13th, the coordinate of the existing triple (A_i, play_for, T_i) is (5, 13, 1), which means X(5, 13) = 1 in the first slice matrix. However, I don't know how to get this coordinate. Should I use a dictionary to store the triples?
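A minimal sketch of the dictionary idea follows. It assumes, purely for illustration, that 'test.txt' holds one triple per line as comma-separated fields; an in-memory `io.StringIO` stands in for the file here. The names `entity_index`, `predicate_index`, and `coords` are hypothetical, indices are 0-based, and entities are numbered in order of first appearance rather than in the A1..An, T1..Tn, NBA order used above.

```python
import io

# Stand-in for 'test.txt'. The one-triple-per-line, comma-separated
# layout is an assumption, not something the dataset guarantees.
sample = io.StringIO(
    "Ray Allen, play_for, Boston Celtics\n"
    "Kobe Bryant, play_for, Lakers\n"
    "Boston Celtics, race_for, NBA championship\n"
    "Lakers, race_for, NBA championship\n"
)

entity_index = {}     # entity name -> row/column index
predicate_index = {}  # predicate name -> slice index
coords = []           # one (i, j, k) per triple, i.e. X_ijk = 1

for line in sample:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    subj, pred, obj = [part.strip() for part in line.split(",")]
    # setdefault returns the existing index, or stores and returns the next free one.
    i = entity_index.setdefault(subj, len(entity_index))
    j = entity_index.setdefault(obj, len(entity_index))
    k = predicate_index.setdefault(pred, len(predicate_index))
    coords.append((i, j, k))

print(coords)
```

With a real file, `open('test.txt')` would replace the `io.StringIO` object. Names that themselves contain commas would break the simple `split(",")` used here.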
I'm not very familiar with Python. I've tried to find a solution, but I have no idea how to solve this. Any help would be greatly appreciated.
EDIT: For brevity and readability, I have deleted the description of RDF.