I am building a network that splits strings into words and words into characters, embeds each character, and then computes a vector representation of the string by aggregating characters into words and words into the string. Aggregation is performed with a bidirectional GRU layer with attention.
To test this, let's say I limit a string to 5 words and 5 characters per word. In this case my transformation is:
["Some string"] -> ["Some","strin","","",""] ->
["Some_","string","_____","_____","_____"] where _ is the padding symbol ) ->
[[1,2,3,4,0],[1,5,6,7,8],[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]] (shape 5x5)
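
For concreteness, the preprocessing is roughly this (char_to_idx here is a tiny stand-in for my real character vocabulary):

MAX_WORDS, MAX_CHARS = 5, 5
char_to_idx = {"S": 1, "o": 2, "m": 3, "e": 4, "s": 1, "t": 5, "r": 6, "i": 7, "n": 8}

def encode(string):
    # 0 is reserved for padding, both for missing characters and missing words
    matrix = [[0] * MAX_CHARS for _ in range(MAX_WORDS)]
    for i, word in enumerate(string.split()[:MAX_WORDS]):
        for j, ch in enumerate(word[:MAX_CHARS]):
            matrix[i][j] = char_to_idx.get(ch, 0)
    return matrix

encode("Some string")  # [[1,2,3,4,0],[1,5,6,7,8],[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]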
Next I have an embedding layer that turns every character into an embedding vector of length, let's say, 6, so my feature becomes a 5x5x6 tensor. I then pass this output to a bidirectional GRU layer and perform some other manipulations that I believe are not important here.
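
Roughly, the relevant part of the model looks like this (simplified; the GRU width and vocabulary size are made up):

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMB_DIM = 100, 6  # made-up vocabulary size; embedding length 6 as above

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, EMB_DIM),  # 5x5 indices -> 5x5x6 embeddings
    layers.Bidirectional(layers.GRU(16)),   # the layer that raises the error below
    # ... attention and the rest of the aggregation omitted
])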
The problem is that when I run it with an iterator, like this:

for string in strings:
    output = model(string)

it seems to work just fine (strings is a tf.data.Dataset created from slices of an array of 5x5 matrices, so each element is a single 5x5 matrix).
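
Concretely, the dataset is built roughly like this (using encode from above; the array here holds just the one example):

import numpy as np
import tensorflow as tf

data = np.array([encode("Some string")], dtype=np.int32)  # shape (n_strings, 5, 5)
strings = tf.data.Dataset.from_tensor_slices(data)        # each element: one 5x5 matrix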
However, when I move on to training, or to working at the dataset level with functions like predict, the model fails:
model.predict(strings.batch(1))
ValueError: Input 0 of layer bidirectional is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 5, 5, 6)
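
Printing the shapes seems to confirm that the difference is an extra batch axis (this is my interpretation, so I may be misreading it):

for string in strings:
    print(string.shape)  # (5, 5): no batch axis; after embedding the GRU gets
                         # (5, 5, 6) and apparently treats the 5 words as the batch

for batch in strings.batch(1):
    print(batch.shape)   # (1, 5, 5): after embedding this is (1, 5, 5, 6), ndim=4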
As far as I understand from the documentation, the bidirectional layer takes a 3D tensor as input: [batch, timesteps, feature]. So in this case my input shape would apparently have to look like [batch_size, timesteps, (5,5,6)].
So the question is: which transformation should I apply to the input data to get this kind of shape?
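
For reference, my current guess (which I am not at all sure is the right approach) is to wrap the character-level BiGRU in TimeDistributed so it runs once per word; layer sizes here are again illustrative:

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMB_DIM = 100, 6  # made-up vocabulary size

inputs = tf.keras.Input(shape=(5, 5), dtype="int32")                 # (batch, words, chars)
x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)                    # (batch, 5, 5, 6)
x = layers.TimeDistributed(layers.Bidirectional(layers.GRU(16)))(x)  # chars -> word vectors, (batch, 5, 32)
x = layers.Bidirectional(layers.GRU(16))(x)                          # words -> string vector, (batch, 32)
model = tf.keras.Model(inputs, x)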