
I have created the following Python program which, as far as I understand CTC, should be a valid CTC-based model together with its training data. The best documentation I can find is the CNTK_208_Speech_CTC tutorial, which is what I've based this on. The program is as simple as I could make it: it relies only on numpy and CNTK, and it generates its data itself.

When I run this, I get the following error:

Validating --> ForwardBackward2850 = ForwardBackward (LabelsToGraph2847, StableSigmoid2703) : [5 x labelAxis1], [5 x inputAxis1] -> []

RuntimeError: The Matrix dimension in the ForwardBackwardNode operation does not match.

This seems to be the same issue as in this ticket: https://github.com/Microsoft/CNTK/issues/2156

Here is the Python program:

# cntk_ctc_hello_world.py
#
# This is a "hello world" example of using CTC (Connectionist Temporal Classification) with CNTK.
#
# The input is a sequence of vectors of size 17. We use 17 because it's easy to spot that number in 
# error messages. The output is a string of codes, each code being one of 4 possible characters from
# our alphabet that we'll refer to here as "ABCD", although they're actually just represented
# by the numbers 0..3, which is typical for classification systems. To make the setup of training data
# trivial, we assign the first four elements of our 17-dimension input vector to the four characters
# of our alphabet, so that the matching is:
# 10000000000000000  A
# 01000000000000000  B
# 00100000000000000  C
# 00010000000000000  D
# In our input sequences, we repeat each code three to five times, followed by three to five codes
# containing random noise. Whether it's repeated 3, 4, or 5 times is random for each code and each
# spacer. When we emit one of our codes, we fill the first 4 values with the code, and the remaining
# 13 values with random noise.
# For example:
# Input:  AAA-----CCCC---DDDDD
# Output: ACD

import cntk as C
import numpy as np
import random
import sys

InputDim = 17
NumClasses = 4 # A,B,C,D
MinibatchSize = 100
MinibatchPerEpoch = 50
NumEpochs = 10
MaxOutputSeqLen = 10 # ABCDABCDAB

inputAxis = C.Axis.new_unique_dynamic_axis('inputAxis')
labelAxis = C.Axis.new_unique_dynamic_axis('labelAxis')
inputVar = C.sequence.input_variable((InputDim), sequence_axis=inputAxis, name="input")
labelVar = C.sequence.input_variable((NumClasses+1), sequence_axis=labelAxis, name="labels")

# Construct an LSTM-based model that will perform the classification
with C.default_options(activation=C.sigmoid):
    classifier = C.layers.Sequential([
        C.layers.For(range(3), lambda: C.layers.Recurrence(C.layers.LSTM(128))),
        C.layers.Dense(NumClasses + 1)
    ])(inputVar)

criteria = C.forward_backward(C.labels_to_graph(labelVar), classifier, blankTokenId=NumClasses, delayConstraint=3)
err = C.edit_distance_error(classifier, labelVar, squashInputs=True, tokensToIgnore=[NumClasses])

lr = C.learning_rate_schedule([(3, .01), (1,.001)], C.UnitType.sample)
mm = C.momentum_schedule([(1000, 0.9), (0, 0.99)], MinibatchSize)
learner = C.momentum_sgd(classifier.parameters, lr, mm)
trainer = C.Trainer(classifier, (criteria, err), learner)

# Return a numpy array of 17 elements, for this code
def make_code(code):
    a = np.zeros(NumClasses)                  # 0,0,0,0
    v = np.random.rand(InputDim - NumClasses) # 13x random
    a = np.concatenate((a, v))
    a[code] = 1
    return a

def make_noise_code():
    return np.random.rand(InputDim)

def make_onehot(code):
    v = np.zeros(NumClasses+1)
    v[code] = 1
    return v

def gen_batch():
    x_batch = []
    y_batch = []
    for mb in range(MinibatchSize):
        yLen = random.randint(1, MaxOutputSeqLen)
        x = []
        y = []
        for i in range(yLen):
            code = random.randint(0,3)
            y.append(make_onehot(code))
            xLen = random.randint(3,5) # Input is 3 to 5 repetitions of the code
            for j in range(xLen):
                x.append(make_code(code))
            spacerLen = random.randint(3,5) # Spacer is 3 to 5 repetitions of noise
            for j in range(spacerLen):
                x.append(make_noise_code())
        x_batch.append(np.array(x, dtype='float32'))
        y_batch.append(np.array(y, dtype='float32'))
    return x_batch, y_batch

#######################################################################################
# Dump first X/Y training pair from minibatch
#x, y = gen_batch()
#print("\nx sequence of first sample of minibatch:\n", x[0])
#print("\ny sequence of first sample of minibatch:\n", y[0])
#######################################################################################

progress_printer = C.logging.progress_print.ProgressPrinter(tag='Training', num_epochs=NumEpochs)

for epoch in range(NumEpochs):
    for mb in range(MinibatchPerEpoch):
        x_batch, y_batch = gen_batch()
        trainer.train_minibatch({inputVar: x_batch, labelVar: y_batch})

    progress_printer.epoch_summary(with_metric=True)

1 Answer

For those who are facing this error, there are two points to take note of:

(1) The data provided to the labels sequence tensor that is passed to labels_to_graph must have the same sequence length as the data coming out of the network output at runtime.
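
A minimal sketch of one way to satisfy point (1) for the toy program in the question: pad each label sequence with blank one-hots until it is as long as its input sequence. (Padding with the blank token id NumClasses is my assumption for illustration, not something the API prescribes.)

def pad_labels(x_batch, y_batch, blank_id=NumClasses):
    # Pad every label sequence with one-hot "blank" frames so that
    # len(y) == len(x) for each pair in the minibatch.
    blank = np.zeros(NumClasses + 1, dtype='float32')
    blank[blank_id] = 1
    padded = []
    for x, y in zip(x_batch, y_batch):
        pad = np.tile(blank, (len(x) - len(y), 1))
        padded.append(np.concatenate((y, pad)).astype('float32'))
    return padded

# Usage inside the training loop:
# x_batch, y_batch = gen_batch()
# y_batch = pad_labels(x_batch, y_batch)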

(2) If during model building you change the dynamic sequence axis of the input sequence tensor (e.g. by striding over the sequence axis), then you must call reconcile_dynamic_axes on your labels sequence tensor, with the network output as the second argument to the function. This tells CNTK that the labels have the same dynamic axis as the network.
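
A minimal sketch of point (2) applied to the criterion in the question (assuming classifier is the network output, as in the program above):

network_output = classifier  # the Dense(NumClasses + 1) output
# Tell CNTK that the labels share the network output's dynamic axes,
# then build the CTC criterion from the reconciled labels.
labels_reconciled = C.reconcile_dynamic_axes(labelVar, network_output)
criteria = C.forward_backward(C.labels_to_graph(labels_reconciled), network_output,
                              blankTokenId=NumClasses, delayConstraint=3)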

Adhering to these two points will allow forward_backward to run.
