I am trying to implement a CTC loss with Keras for my simplified neural network:

  
def ctc_lambda_func(args):
    y_pred, y_train, input_length, label_length = args
 
    return K.ctc_batch_cost(y_train, y_pred, input_length, label_length)


x_train = x_train.reshape(x_train.shape[0],20, 10).astype('float32')

input_data = layers.Input(shape=(20,10,))
x=layers.Convolution1D(filters=256, kernel_size=3,  padding="same", strides=1, use_bias=False ,activation= 'relu')(input_data)
x=layers.BatchNormalization()(x)
x=layers.Dropout(0.2)(x)

x=layers.Bidirectional (LSTM(units=200 , return_sequences=True)) (x)
x=layers.BatchNormalization()(x)
x=layers.Dropout(0.2)(x)


y_pred=outputs = layers.Dense(5, activation='softmax')(x)
fun = Model(input_data, y_pred)
# fun.summary()

label_length=np.zeros((3800,1))
input_length=np.zeros((3800,1))

for i in range (3799):
    label_length[i,0]=4
    input_length[i,0]=5 
  
y_train = np.array(y_train)
x_train = np.array(x_train)
input_length = np.array(input_length)
label_length = np.array(label_length) 

  
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, y_train, input_length, label_length])
model =keras.models.Model(inputs=[input_data, y_train, input_length, label_length], outputs=loss_out)
model.compile(loss={'ctc': lambda y_train, y_pred: y_pred}, optimizer = 'adam')
model.fit(x=[x_train, y_train, input_length, label_length],  epochs=10, batch_size=100)

y_true (or y_train) has shape (3800, 4), so I set label_length=4 and input_length=5 (+1 for the blank).

I get this error:

ValueError: Input tensors to a Model must come from `tf.keras.Input`. Received: [[0. 1. 0. 0.]
 [0. 1. 0. 0.]
 [0. 1. 0. 0.]
 ...
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]] (missing previous layer metadata).

y_true is like this:

 [[0. 1. 0. 0.]
 [0. 1. 0. 0.]
 ...
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]]

What is my problem?

1 Answer

You misunderstood the lengths. They are not the number of label classes; they are the actual lengths of the sequences. CTC can only be used when the number of target symbols is no larger than the number of input states (time steps). Technically, the number of inputs and outputs is the same, but some of the outputs are blanks. (This typically happens in speech recognition, where you have plenty of input signal windows and relatively few phonemes in the output.)
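As a rough illustration of that constraint, the sketch below computes the fewest time steps a label sequence needs under CTC. The function names are hypothetical and not part of Keras; the only rule assumed is the standard CTC one that two identical neighbouring labels must be separated by a blank.

```python
def min_ctc_input_length(labels):
    """Fewest time steps that can emit `labels` under CTC (hypothetical helper)."""
    # Every label needs one step; each pair of identical neighbours
    # needs one extra step for the blank that separates them.
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def ctc_feasible(labels, input_length):
    """True if `input_length` time steps are enough to produce `labels`."""
    return input_length >= min_ctc_input_length(labels)
```

For example, `ctc_feasible([0, 1, 2], 10)` holds, while `ctc_feasible([1, 1], 2)` does not, because the repeated label needs a blank in between.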

Assuming you padded the inputs and outputs to fit them in a batch:

  • input_length should contain, for each item in the batch, how many inputs are actually valid, i.e., not padding;

  • label_length should contain how many non-blank labels the model should produce for each item in the batch.
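A minimal NumPy sketch of how the two length arrays might be built while padding, assuming the unpadded sequences are available as Python lists. The helper `pad_with_lengths`, the padding value 0, and the toy data are all assumptions for illustration. The lengths are stored with shape (batch, 1), one number per item. Note also that in the question's model y_pred has 20 time steps, so input_length there could not exceed 20.

```python
import numpy as np


def pad_with_lengths(sequences, pad_value=0):
    """Pad variable-length sequences and record each true length (hypothetical helper)."""
    # One length per batch item, kept as a column vector of shape (batch, 1).
    lengths = np.array([[len(s)] for s in sequences], dtype="int64")
    max_len = int(lengths.max())
    padded = np.full((len(sequences), max_len), pad_value, dtype="int64")
    for i, s in enumerate(sequences):
        padded[i, :len(s)] = s
    return padded, lengths


# Toy batch of three items: label sequences and per-item input frames.
labels = [[0, 1, 2], [3], [1, 1]]
frames = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4], [5, 6, 7, 8, 9, 10]]

y_padded, label_length = pad_with_lengths(labels)
x_padded, input_length = pad_with_lengths(frames)
```

Here `label_length` counts the real labels per item and `input_length` counts the real (unpadded) input steps per item, regardless of how wide the padded arrays are.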

Jindřich
  • Thank you for your answer. Do you mean that, for example, if we have y_train=[1,2,3,4,5,6,7,8,9,10] and x_train=[0,1,2], then input_length is 10 and label_length is 3? – Parisa Zaheri Oct 29 '20 at 14:21
  • If x is the output and y the input, then yes. – Jindřich Oct 29 '20 at 14:53
  • Yes, I wrote them the wrong way round. I meant x_train=[1,2,3,4,5,6,7,8,9,10] and y_train=[0,1,2]. One other question: are label_length and input_length single numbers, or arrays with, for example, 20 rows, one row per input array? – Parisa Zaheri Oct 29 '20 at 15:20
  • It is an array with one number for each item in the training batch. – Jindřich Oct 29 '20 at 15:29