I'm currently looking into (manual & high-dimensional) feature extraction on very large datasets. In its simplest form, I am encoding the n² edges of a graph, i -> j.
I'm taking advantage of the fact that the features are independent of the i -> j relationships, so each node can simply be encoded once per role, something à la encode(i, target=False) and encode(j, target=True). This way I can encode a single graph in linear time (n encode calls as opposed to n²).
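To make that concrete, here is a minimal sketch of how I build the encoding. encode here is just a stand-in for my real extractor; the sizes and the fact that it returns integer feature ids (which I later use to index a weight vector) are illustrative assumptions, not my actual setup:

import numpy as np

n, d = 100, 8  # illustrative sizes, not my real ones

def encode(node, target=False):
    # stand-in for my real extractor: returns d integer feature ids for a node
    rng = np.random.default_rng(2 * node + int(target))
    return rng.integers(0, 50, size=d)

E = np.empty((n, 2, d), dtype=np.int64)
for i in range(n):
    E[i, 0, :] = encode(i, target=False)  # features of i in the source role
    E[i, 1, :] = encode(i, target=True)   # features of i in the target role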
This data is encoded into a tensor of the shape:
# E :: (n, 2, d)
with d being the feature dimension. Indexing into an edge is therefore achieved by:
# edge_ij = np.concatenate([E[source_node, 0, :], E[target_node, 1, :]], axis=-1)
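For example, a handful of edges can be looked up at once with plain fancy indexing (the node ids below are made up):

src = np.array([0, 2, 5])  # example source node ids
tgt = np.array([1, 3, 4])  # example target node ids
edge_feats = np.concatenate([E[src, 0, :], E[tgt, 1, :]], axis=-1)  # shape (3, 2*d)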
My challenge is that I'd like to interface with this ndarray as if it were of shape E' :: (n, n, d*2), ultimately so that I can use it to index into a weight vector W and compute a score, à la:
graph_features = W[E']
graph_scores = graph_features.sum(axis=-1)
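To make the shapes concrete, this is the dense version that does what I want but materializes the full (n, n, d*2) intermediate, which is exactly the kind of allocation I need to avoid (W here is just a random vector sized to cover the toy feature ids from the sketch above):

W = np.random.default_rng(0).normal(size=50)  # one weight per feature id (assumed range)

E_prime = np.concatenate(
    [np.broadcast_to(E[:, None, 0, :], (n, n, d)),   # source half, repeated over j
     np.broadcast_to(E[None, :, 1, :], (n, n, d))],  # target half, repeated over i
    axis=-1,
)                                                    # E' :: (n, n, 2*d)
graph_features = W[E_prime]                          # (n, n, 2*d) weight lookups
graph_scores = graph_features.sum(axis=-1)           # (n, n) score per edge i -> j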
There are more computations I'd like to do with the resulting graph scores, but solving this part solves the rest.
All my approaches have resulted in a lot of unnecessary array allocations, which I need to avoid to make my experiments feasible.
Is it perhaps possible to create some sort of memoryview? (Cython is within reach.)
Any ideas?