CNTK: A loss function for sequence to sequence processing

Question

I'm building a sequence-to-sequence model for phoneme alignment. Specifically, my training data consists of paired sequences (phoneme, length), where each phoneme is a one-hot vector and each length is a float. I want to feed the model a phoneme sequence and get the corresponding sequence of lengths.

My network is generally built like this:

model = Sequential(
    EmbeddingLayer{embeddingSize} : 
    RecurrentLSTMLayerStack {lstmDims} :
    LinearLayer{1}
)

If I understand correctly, LinearLayer{1} projects from lstmDims to 1, so when I feed the model a sequence of length N, I should get an output sequence of length N as well.

Now I want to set up a proper loss function, which I think should be the average absolute difference between the elements of the known result sequence and the model output. The average should be taken over the time axis so that sequences of different lengths can be handled.
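
To make the target quantity concrete, here is a minimal sketch in plain Python rather than BrainScript. It is only an illustration of the arithmetic, not CNTK code; the function name `sequence_mae` and the sample values are hypothetical.

```python
# Plain-Python illustration (not CNTK code): mean absolute error averaged
# over the time axis, so sequences of different lengths are comparable.

def sequence_mae(targets, outputs):
    """Mean absolute difference over the time steps of one sequence."""
    if len(targets) != len(outputs):
        raise ValueError("target and output sequences must have the same length")
    if not targets:
        raise ValueError("sequences must be non-empty")
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)

# Hypothetical phoneme lengths (targets) and model outputs for two sequences.
short_loss = sequence_mae([0.5, 1.0], [0.25, 1.25])
long_loss = sequence_mae([0.5, 1.0, 0.5, 1.0], [0.25, 1.25, 0.25, 1.25])
```

Because the sum is divided by the number of time steps, the longer sequence does not get a larger loss just for being longer.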

I was planning to do something like

objectives = Input(1) #actually a sequence here as stated in the reader
result = model(features)
errs = Abs(objectives - result)
loss_function = ReduceMean(errs)
criterionNodes  = (loss_function)

but in Reduction Operations it's explicitly stated that

These operations do not support reduction over sequences. Instead, you can achieve this with a recurrence.

I'm not sure how to use recurrence for my task. And I'm also not sure if the whole concept is fine.

Nikos Karampatziakis · Accepted Answer · 2017-01-12T02:00:11.253

You need two recurrences, neither of which is too complicated (for the second one we use a built-in operation whose implementation is in the cntk.core.bs file):

sum = errs + PastValue (0, sum, defaultHiddenActivation=0)
count = BS.Loop.Count(errs)
loss_function = sum / count
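
The arithmetic of this recurrence can be emulated in plain Python. This is a sketch, not CNTK code, and it rests on two assumptions: that the past value before the first step is the default 0, and that `BS.Loop.Count(errs)` yields the number of time steps in the sequence. The helper name `running_sum` and the sample errors are hypothetical.

```python
# Plain-Python emulation (not CNTK code) of the recurrence above.

def running_sum(errs, default=0.0):
    """sum[t] = errs[t] + sum[t-1], with the value before step 0 set to `default`."""
    sums = []
    past = default  # stands in for the past value before the first step
    for e in errs:
        past = e + past
        sums.append(past)
    return sums

errs = [0.25, 0.5, 0.75]              # hypothetical per-step absolute errors
sums = running_sum(errs)              # running totals, one per time step
count = len(errs)                     # ASSUMED meaning of BS.Loop.Count(errs)
per_step = [s / count for s in sums]  # sum / count evaluated at every step
mean_error = per_step[-1]             # only the last step covers the whole sequence
```

Under these assumptions, the value at the final time step equals the mean error over the sequence, while earlier steps hold partial sums divided by the count.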

edited Jan 12 '17 at 02:00

answered Jan 11 '17 at 18:39

Nikos Karampatziakis

Thanks! That seems to be what I was looking for! I was also thinking of using `SumElements` instead of a recurrence for the `sum` part, but it's not clear whether that's OK (`SumElements` has no documentation available). – Mikhail Jan 13 '17 at 01:48

score 0 · Answer 2 · answered Jan 10 '17 at 07:05

There is a dedicated Sequence to Sequence tutorial on GitHub that walks through data similar to yours. You can look at how the network is defined there.

https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_204_Sequence_To_Sequence.ipynb

Sayan Pathak

From what I see in the tutorial, it uses the `cross_entropy_with_softmax` and `classification_error` functions, which are for classification. In my case there's no classification. I'm aiming to get a sequence of floats that is as close as possible to the training sequence. – Mikhail Jan 10 '17 at 07:55
If you do not have a classification error, you can define your own loss function, say myLoss, and then pass an alias of myLoss to the trainer: CNTK.ops.alias(myLoss). We are currently taking a closer look at this API. – Sayan Pathak Jan 10 '17 at 23:12
Yes, I understand that I need some other function, but I don't understand how to write it so that it works over the dynamic (sequence) axis. That's what my question is about. Please see the part of my question starting with "I was planning to do something like". I think I would be fine with ReduceMean, but it does not work over the sequence axis. – Mikhail Jan 10 '17 at 23:55
