
I am trying to create variable-length sequence data for an LSTM model, so I created a list of NumPy arrays of variable length. For example, one array has shape (11, 20) and another has shape (9, 20). Here 20 is the number of features, and 11 and 9 are the sequence lengths.

Each target array (y) consists of zeros, with a numerical value (the class index) at the end.

If an audio file belongs to class index 1 and has 8 frames, then y will be [0,0,0,0,0,0,0,1].

I wanted to know whether, when model.fit is called on each variable-length audio sample, the last Dense layer will have an output shape equal to the number of frames in that audio file.

So, how can I give the Dense layer a variable output shape (8, 9 or 11)?

I do not want to pad the sequences; instead I want to pass None in the LSTM input layer shape.

But on converting the targets to categorical, I get the error: setting an array element with a sequence.

Here, 11 is the number of classes:

y_train=to_categorical(y_train,11)
y_test=to_categorical(y_test,11)

TensorFlow version : 1.3.0

Keras version : 2.0.9

Roma Jain
  • What does the shape (11,20) denote? Is this the shape of `y_train`? What is 20 here? – Vivek Kumar Apr 25 '18 at 09:52
  • @VivekKumar 20 is the number of features and 11/9 denote sequence length – Roma Jain Apr 25 '18 at 09:53
  • Still not clear. You want to encode the targets or features? Please only show how the targets are saved in y_train? – Vivek Kumar Apr 25 '18 at 09:56
  • @VivekKumar so (11,20) is shape of mfcc vector of one audio file – Roma Jain Apr 25 '18 at 09:56
  • So that are features. Whats the output of that file? The labels? The data in y_train and y_test which is throwing the error? – Vivek Kumar Apr 25 '18 at 09:57
  • @VivekKumar, y_train has arrays where each audio file's frames are given a label =>So for X where class index is 9 and frames are 11,so first ten frames are 0 and last index has value 9. [0,0,0,0,0,0,0,0,0,0,9] – Roma Jain Apr 25 '18 at 09:59
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/169746/discussion-between-vivek-kumar-and-roma-jain). – Vivek Kumar Apr 25 '18 at 10:01

1 Answer

As you said in the question:

Each array has zeros and a numerical value(class index) at the end.

If audio file belongs to class index = 1 and number of frames are 8 then y will be [0,0,0,0,0,0,0,1]

First Step Input:

[Update] One approach to dealing with variable input size is to use TensorFlow's dynamic_rnn.

Another way to deal with this problem, as is commonly done in text classification, is padding.

Say we create embeddings of size 10, so each word is represented by a vector of size 10.

A sentence of, say, 5 words is then actually an input of shape 5 * 10.

Since TF/Keras expects fixed-size inputs within a batch, we pad the inputs.

So we fix the input length to a certain number of words (say 20 for this case).

Words beyond the 20th are dropped.

If a sentence has 5 words, it is followed by 15 padding entries.

The input is then passed through the embedding layer, and the final input has shape (None, 20, 10), representing (batch_size, sequence_length, embedding_dimension).
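To make the text case concrete, here is a minimal sketch of truncating or padding a list of word ids to a fixed length of 20. The helper name `pad_tokens`, the padding id 0, and the sample ids are assumptions chosen for illustration; they are not part of Keras.

```python
def pad_tokens(token_ids, max_len=20, pad_id=0):
    """Truncate to max_len words, then append pad_id until the length is max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))

# A made-up 5-word sentence, already mapped to integer word ids.
sentence = [4, 8, 15, 16, 23]
padded = pad_tokens(sentence)

print(len(padded))   # length after padding
print(padded[5:])    # the padding entries appended after the 5 real words
```

Each of the 20 positions would then be looked up in the embedding layer, which is where the per-sentence shape (20, 10) comes from.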

Your problem can be converted into a similar input.

For example, one array is of length => (11,20) and other is of length => (9,20).Here 20 is the number of features and 11/9 denote sequence length

Here, assume a sequence length of 15 or so, and treat 20 as the embedding dimension.

You will have to pad your inputs of length 9 or 11 to a consistent length of, say, 15 (you can experiment with this number).

20 is your embedding dimension, i.e. the number of features.

So the final input shape will be (None, 15, 20).
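A minimal NumPy sketch of that padding step, assuming zero-padding at the end of each sequence and a target length of 15. The helper name `pad_to_length` and the all-ones stand-in arrays are made up for illustration; real MFCC arrays would take their place.

```python
import numpy as np

def pad_to_length(sequences, max_len):
    """Zero-pad (or truncate) each (frames, features) array to max_len frames."""
    n_features = sequences[0].shape[1]
    batch = np.zeros((len(sequences), max_len, n_features), dtype=np.float32)
    for i, seq in enumerate(sequences):
        n = min(len(seq), max_len)   # sequences longer than max_len are cut off
        batch[i, :n] = seq[:n]
    return batch

# Two stand-in feature arrays with 11 and 9 frames of 20 features each.
x = [np.ones((11, 20)), np.ones((9, 20))]
x_padded = pad_to_length(x, 15)

print(x_padded.shape)  # (2, 15, 20)
```

The resulting array has one fixed shape per sample, so it can be passed to a model whose input shape is (15, 20).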

Next Step Output:

You can now directly one-hot encode the output class for each input. For an input of shape (9, 20) you have one one-hot encoded output vector.

Earlier you were thinking that for one input of length 9 with 20 features, the output should be of length 9, where the first 8 outputs are 0 and the 9th is the class.

This is no longer needed, as one input of shape (9, 20) gives one output.
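As a rough sketch of the label side, the per-frame target arrays from the question can be reduced to one class index per file and then one-hot encoded. The helper `one_hot` below is a hypothetical NumPy stand-in; for a flat array of integer labels, Keras's `to_categorical(labels, 11)` should give the same kind of matrix. The original error most likely came from passing arrays of different lengths, which cannot be stacked into one rectangular array.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) one-hot matrix for integer labels."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

# Per-frame targets as described in the question: zeros, then the class index.
y_per_frame = [np.array([0] * 10 + [9]), np.array([0] * 8 + [1])]

# Keep only the final entry of each array: one class index per audio file.
labels = [int(y[-1]) for y in y_per_frame]

y_train = one_hot(labels, 11)
print(y_train.shape)  # (2, 11)
```

With targets of shape (None, 11), the last Dense layer needs 11 units regardless of how many frames each file has.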

Let me know if more explanation is needed, all the best :)

Community
Vikash Singh