My two cents,
First, an intuitive understanding of the transformer encoder: given (batch, horizon, features), the attention mechanism tries to find a weighted linear combination of the projected features. The weights are learned via attention scores, obtained by operating between features over each horizon step. The FFN layer that comes next is then a linear combination of values within the feature axis.
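To make that intuition concrete, here is a rough einsum sketch of those two steps. This is my own illustration, not part of the question's code: the shapes and variable names are made up, and the Q/K/V projections of a real encoder are omitted.

import tensorflow as tf

batch, horizon, features, d_ff = 1, 2, 3, 8
x = tf.random.uniform((batch, horizon, features))

# attention scores between horizon positions, from dot products of feature vectors
scores = tf.nn.softmax(tf.einsum('shf,stf->sht', x, x), axis=-1)  # (batch, horizon, horizon)
# weighted linear combination of the (projected) features over the horizon
attended = tf.einsum('sht,stf->shf', scores, x)                   # (batch, horizon, features)
# FFN: a linear combination of values within the feature axis
w_ffn = tf.random.uniform((features, d_ff))
ffn_out = tf.einsum('shf,fz->shz', attended, w_ffn)               # (batch, horizon, d_ff)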
Coming to EinsumDense, by way of example we have two tensors:

a: Data (your input tensor to EinsumDense)
b: Weights (EinsumDense's internal weights tensor)
# create random data in a 3D tensor
a = tf.random.uniform(minval=1, maxval=3, shape=(1,2,3), dtype=tf.int32)
# [[[1, 2, 2],
# [2, 2, 1]]]
shf,h->shf: This just scales the features at each horizon step by a single per-horizon weight.
b = tf.random.uniform(minval=2, maxval=4, shape=(2,), dtype=tf.int32)
# [3, 2]
tf.einsum('shf,h->shf', a, b)
# [[[3, 6, 6],  # features at the 1st horizon step are scaled by 3
#   [4, 4, 2]]] # features at the 2nd horizon step are scaled by 2
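A quick way to check this reading (my addition, reusing the a and b defined above): the einsum is just broadcasting b over the batch and feature axes.

# same result via broadcasting: one scalar per horizon step
scaled = a * b[tf.newaxis, :, tf.newaxis]
tf.debugging.assert_equal(tf.einsum('shf,h->shf', a, b), scaled)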
shf,hz->shz: This does a linear combination within features (f is summed out at each horizon step) and maps each step to z output units.
b = tf.random.uniform(minval=2, maxval=4, shape=(2,6), dtype=tf.int32)
# [[3, 3, 3, 3, 3, 3],
# [2, 2, 2, 3, 2, 3]]
tf.einsum('shf,hz->shz', a, b)
# [[[15, 15, 15, 15, 15, 15],
# [10, 10, 10, 15, 10, 15]]]
# every value in the first output row combines the features at the 1st horizon step, [1, 2, 2], with b; the first value is sum([1,2,2]*3) = 15
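Again as a sanity check (my addition, with the same a and b): because b has no f axis, this is equivalent to summing over the features first and then scaling by b[h, z].

# same result: collapse the feature axis, then scale by b[h, z]
collapsed = tf.reduce_sum(a, axis=-1, keepdims=True) * b[tf.newaxis, :, :]
tf.debugging.assert_equal(tf.einsum('shf,hz->shz', a, b), collapsed)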
The above two resemble the transformer encoder architecture with a feature scaling layer, and the output structure (batch, H, F) is preserved: the batch and horizon axes stay intact, with the last axis having z units in the second case.
shf,hfyz->syz: This does both a between-features and a within-features combination; both h and f are contracted.
b = tf.random.uniform(minval=2, maxval=4, shape=(2,3,4,5), dtype=tf.int32)
tf.einsum('shf,hfyz->syz', a,b)
# each output element (i, j) is the dot product of a with b[:, :, i, j]
# first element is tf.reduce_sum(a*b[:,:,0,0])
Here, in the output (s, y, z), y doesn't correspond to the horizon and z doesn't correspond to the features; each value is a combination of values across both of them.
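For completeness, here is roughly how such an equation is wired into a layer. This is only a sketch with illustrative output_shape/bias_axes values; I'm assuming tf.keras.layers.EinsumDense can infer the kernel for this equation from the input shape and output_shape, as it does for the documented 'abc,cd->abd' case.

x = tf.random.uniform((1, 2, 3))  # float input: (batch=s, horizon=h, features=f)
# the kernel spec 'hfyz' -> shape (2, 3, 4, 5) would be inferred from the input shape and output_shape
layer = tf.keras.layers.EinsumDense('shf,hfyz->syz', output_shape=(4, 5), bias_axes='z')
print(layer(x).shape)  # (1, 4, 5)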