
I'm currently looking into (manual & high-dimensional) feature extraction on very large datasets. I am encoding the n² edges of a graph, in their simplest form i -> j.

I'm taking advantage of the fact that the features are independent of the i -> j relationships, so they can simply be encoded with something à la encode(i, target=False), encode(j, target=True). This way I can encode a single graph in linear time (O(n) as opposed to O(n²)).

This data is encoded into a tensor of the shape:

# E :: (n, 2, d)

with d being the feature dimension. Indexing into an edge is therefore achieved by:

# edge_ij = np.concatenate([E[source_node, 0, :], E[target_node, 1, :]], axis=-1)
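
For a bit more context, here is a minimal runnable sketch of this setup. The encode() function is a hypothetical stand-in for my actual extractor, and I'm assuming each feature is an integer index into a weight vector (which is what makes the W[E'] lookup below meaningful):

import numpy as np

n, d, f = 1000, 64, 1_000_000  # nodes, features per node role, feature space size

def encode(node, target=False):
    # Hypothetical stand-in: returns d integer feature indices.
    rng = np.random.default_rng(2 * node + int(target))
    return rng.integers(0, f, size=d)

# One encoding pass per node role: O(n) work instead of O(n^2).
E = np.empty((n, 2, d), dtype=np.int64)
for i in range(n):
    E[i, 0] = encode(i, target=False)  # i's features as a source
    E[i, 1] = encode(i, target=True)   # i's features as a target

# Features of the edge 3 -> 7: 3's source half plus 7's target half.
edge_ij = np.concatenate([E[3, 0, :], E[7, 1, :]], axis=-1)  # shape (2*d,)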

My challenge now is that I'd like to interface with this ndarray as if it were of shape E' :: (n, n, d*2), ultimately so that I can use it to index into a weight vector W and compute a score, à la:

graph_features = W[E']
graph_scores = graph_features.sum(axis=-1)
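
Spelled out naively, with exactly the kind of allocation I'm trying to avoid (and again assuming E holds integer indices into W), this is what I mean:

import numpy as np

n, d, f = 100, 8, 10_000
rng = np.random.default_rng(0)
E = rng.integers(0, f, size=(n, 2, d))  # integer feature indices
W = rng.random(f)                       # one weight per feature

# Materialize E' :: (n, n, 2*d): the source half varies along axis 0,
# the target half along axis 1. This copy costs O(n^2 * d) memory.
E_prime = np.concatenate(
    [np.repeat(E[:, None, 0, :], n, axis=1),
     np.repeat(E[None, :, 1, :], n, axis=0)],
    axis=-1,
)

graph_features = W[E_prime]                 # (n, n, 2*d)
graph_scores = graph_features.sum(axis=-1)  # (n, n)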

There are more computations I'd like to do with the resulting graph scores, but those are solved once this is.

All my approaches have resulted in a lot of unnecessary array allocations, which I need to avoid to make my experiments feasible.

Is it perhaps possible to create some sort of memoryview? (Cython is within reach.)

Any ideas?

Daniel Varab

1 Answer


The core of your question seems to be how to view an array with shape (n, 2, d) as one with shape (n, n, d*2). Since changing from (2, d) to (d*2) is trivial with reshape(), I will ignore that part and focus on viewing a 1D array as a 2D square one.
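
For completeness, here is what that part looks like; reshape() returns a view when the array is contiguous:

import numpy as np

E = np.zeros((5, 2, 3))   # stand-in for your (n, 2, d) array
flat = E.reshape(5, 6)    # (n, 2*d)
assert flat.base is E     # a view, not a copy, since E is contiguous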

You can use stride_tricks.broadcast_to():

import numpy as np

line = np.arange(10)
square = np.lib.stride_tricks.broadcast_to(line, (10, 10))

This makes square a view of line (meaning it does not use extra memory beyond some constant overhead), with all the values repeated 10 times.
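
Applied to your arrays, a sketch (assuming E holds integer feature indices, as the W[E'] lookup implies): broadcast each half of E to (n, n, d) as a view, then sum the two halves separately, which is equivalent to summing over the concatenated (n, n, d*2) array without ever building it:

import numpy as np

n, d, f = 100, 8, 10_000
rng = np.random.default_rng(0)
E = rng.integers(0, f, size=(n, 2, d))
W = rng.random(f)

# Views only: nothing is copied here.
src = np.broadcast_to(E[:, None, 0, :], (n, n, d))  # source half, varies over rows
tgt = np.broadcast_to(E[None, :, 1, :], (n, n, d))  # target half, varies over columns

# sum(concat([a, b], -1), -1) == sum(a, -1) + sum(b, -1),
# so the (n, n, 2*d) array never needs to exist.
graph_scores = W[src].sum(axis=-1) + W[tgt].sum(axis=-1)  # (n, n)

The fancy indexing W[src] still allocates an (n, n, d) intermediate; if even that is too much, the same identity lets you reduce per node first (W[E[:, 0, :]].sum(axis=-1) is only (n,)) and combine the two length-n vectors with broadcasting.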

John Zwinck