Hi everyone, I'm trying to solve the TIMIT task using a CNN + Dense + CTC model.
Basically, here is my model:
1) Some 2D convolutional layers
2) A shape transformation (reshape)
3) Dense
4) CTC
The shape transformation works as follows.
After the CNN layers I get an output of shape (Batch_size, number_of_feature_maps, 41, sequence_length), 41 being the number of Mel filter bank / energy features.
I turn it into (Batch_size, sequence_length, 41*number_of_feature_maps) to get a 3-dimensional tensor with:
Notice that sequence_length is None, since it varies from one mini-batch to another, so the shape is something like (None, None, X).
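The reshape described above can be sketched in NumPy as follows. This is a minimal sketch assuming a channels-first layout and arbitrary illustrative sizes (8 feature maps, 5 time steps, batch of 2), not the original code. In Keras the equivalent would typically be `Permute((3, 1, 2))` followed by `Reshape((-1, 41 * number_of_feature_maps))`, but that is an assumption about how the stripped code does it.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the original post)
batch_size, n_maps, n_mel, seq_len = 2, 8, 41, 5

# CNN output in channels-first layout: (batch, feature_maps, 41, time)
cnn_out = np.random.rand(batch_size, n_maps, n_mel, seq_len).astype(np.float32)

def to_sequence(x):
    """Turn (batch, maps, mel, time) into (batch, time, maps * mel)."""
    b, maps, mel, t = x.shape
    # Move the time axis next to the batch axis: (batch, time, maps, mel)
    x = np.transpose(x, (0, 3, 1, 2))
    # Flatten each time step's (maps, mel) slice into one feature vector
    return x.reshape(b, t, maps * mel)

seq = to_sequence(cnn_out)
print(seq.shape)  # (2, 5, 328)

# Each time step holds exactly the features of the matching CNN column
assert np.allclose(seq[1, 3], cnn_out[1, :, :, 3].reshape(-1))
```

Moving the time axis before flattening matters: reshaping the channels-first tensor directly would mix time steps into the feature vector.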
Then I tried two things. Here is the code for each:
and
I don't understand the behavior of these two methods. The first one, with TimeDistributed, just works: the loss and the Phoneme Error Rate decrease. The problem is that the second one works too! What does the Dense layer do on (None, None, X) tensors?
Thanks!
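For reference, the Keras documentation for `Dense` states that when the input has rank greater than 2, the layer computes the dot product between the inputs and its kernel along the last axis of the inputs (via `tf.tensordot`). The same kernel of shape (X, units) is therefore applied at every time step, which is also what `TimeDistributed(Dense(...))` does. The NumPy sketch below compares the two computations using random weights and arbitrary sizes (X = 328, units = 62, both assumptions for illustration), over two mini-batches with different sequence lengths:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): X input features, `units` output units
X, units = 328, 62
kernel = rng.standard_normal((X, units))
bias = rng.standard_normal(units)

def dense_on_last_axis(x):
    """Dense on a (batch, time, X) input: contract the last axis with the kernel."""
    return np.tensordot(x, kernel, axes=([2], [0])) + bias

def time_distributed_dense(x):
    """TimeDistributed(Dense): apply the same Dense separately to each time step."""
    steps = [x[:, t, :] @ kernel + bias for t in range(x.shape[1])]
    return np.stack(steps, axis=1)

# Two mini-batches with different batch sizes and sequence lengths (the "None" axes)
for batch_size, seq_len in [(2, 5), (3, 9)]:
    x = rng.standard_normal((batch_size, seq_len, X))
    a = dense_on_last_axis(x)
    b = time_distributed_dense(x)
    print(a.shape, np.allclose(a, b))
```

In this sketch the kernel shape depends only on X, which is why a layer built for (None, None, X) inputs accepts any batch size and sequence length.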