
I exported a trained LSTM network from this Matlab example to ONNX. Then I tried to run the network with ONNX Runtime in C#. However, it looks like I am doing something wrong, because the network does not remember its state from the previous step.

The network should respond to the input sequences with the following outputs:

  • Input: [ 0.258881980200294 ]; Output: [ 0.311363101005554 ]

  • Input: [ 1.354147904050896 ]; Output: [ 1.241550326347351 ]

  • Input: [ 0.258881980200294, 1.354147904050896 ]; Output: [ 0.311363101005554, 1.391810059547424 ]

The first two examples are sequences consisting of a single element each; the last one is a sequence of two elements. These outputs were calculated in Matlab, and I reset the network state in Matlab before running each new sequence.

Then I tried to run the same network using ONNX Runtime. This is my C# code:

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using System;
using System.Collections;
using System.Collections.Generic;

namespace OnnxTest
{
    public sealed class OnnxRuntimeTest
    {
        public OnnxRuntimeTest(ILogger logger)
        {
            this.logger = logger ?? throw new ArgumentNullException(nameof(logger));
        }

        private const string modelPath = @"E:\Documents\MATLAB\NeuralNetworkExport\onnx_lstm_medic.onnx";
        private readonly ILogger logger;

        public void Run()
        {
            using (var session = new InferenceSession(modelPath))
            {
                // Input values from the example above:
                var input1 = GenerateInputValue(0.258881980200294f);
                var input2 = GenerateInputValue(1.354147904050896f);

                // I create a container to push the first value:
                var container = new List<NamedOnnxValue>() { input1 };

                //Run the inference
                using (var results = session.Run(container))  
                {
                    // dump the results
                    foreach (var r in results)
                    {
                        logger.Log(string.Format("Output for {0}", r.Name));
                        logger.Log(r.AsTensor<float>().GetArrayString());

                        // Outputs 0,3113631 - as expected
                    }
                }


                // The same code to push the second value:
                var container2 = new List<NamedOnnxValue>() { input2 };

                using (var results = session.Run(container2)) 
                {
                    // dump the results
                    foreach (var r in results)
                    {
                        logger.Log(string.Format("Output for {0}", r.Name));
                        logger.Log(r.AsTensor<float>().GetArrayString());

                        // Outputs 1,24155 - as though this is the first input value
                    }
                }

            }
        }

        private NamedOnnxValue GenerateInputValue(float inputValue)
        {
            float[] inputData = new float[] { inputValue };
            int[] dimensions = new int[] { 1, 1, 1 };
            var tensor = new DenseTensor<float>(inputData, dimensions);
            return NamedOnnxValue.CreateFromTensor("sequenceinput", tensor);
        }
    }
}

As you can see, the second run outputs 1.24155 instead of the expected value (1.391810059547424), as though the network were still in its initial state. It looks like I am not preserving the state of the LSTM network between runs, but I can't find how to do this in the documentation.

So, does anyone know how to make the LSTM keep its state between runs?

1 Answer


One way to go is to pack your inputs into a single sequence and let the LSTM consume them one after another, accumulating its internal state within a single inference call (a C# sketch of this applied to the question's inputs follows the model definition below). For example, here I have an LSTM that accepts an input of dimension [batch_size, sequence_size, input_size], where the input size is 1 in my case. The batch and sequence sizes are not fixed in the constructor; ONNX infers them when the model is traced.

import torch.nn as nn

class Net(nn.Module):  # class name is illustrative; only __init__ was shown originally
    def __init__(self, config):
        super().__init__()
        self.output_size = config['output_size']
        self.n_layers = config['num_lstm_layers']
        self.hidden_dim = config['lstm_hidden_dim']

        # LSTM layers
        self.lstm = nn.LSTM(config['input_size'],
                            self.hidden_dim,
                            self.n_layers,
                            dropout=config['dropout_prob'],
                            batch_first=True)

        # dropout layer
        self.dropout = nn.Dropout(config['dropout_prob'])
        # linear layer
        self.fc = nn.Linear(self.hidden_dim, config['output_size'])
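
Applied to the question's model, packing both values into one tensor of shape [1, 2, 1] and calling Run once lets the state accumulate across the two time steps. Whether the Matlab export allows a sequence length other than 1 depends on how it was exported, so this is only a sketch, reusing the "sequenceinput" name and the session from the question:

float[] inputData = new float[] { 0.258881980200294f, 1.354147904050896f };
int[] dimensions = new int[] { 1, 2, 1 }; // one batch, two time steps, one feature
var tensor = new DenseTensor<float>(inputData, dimensions);
var container = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("sequenceinput", tensor)
};

using (var results = session.Run(container))
{
    // Expect one prediction per time step: 0.31136..., 1.39181...
    foreach (var r in results)
        logger.Log(r.AsTensor<float>().GetArrayString());
}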

Here's an example where the ONNX model is set up to accept a variable batch size via the dynamic_axes option; the other dimensions could be made dynamic the same way. Note that the hidden state (h0, c0) and its updated values (hn, cn) are exported as explicit model inputs and outputs.

import torch
import onnx

# 'net' is the trained model from above; 'chunk' is a sample input window used for tracing
with torch.no_grad():
    net.eval()
    torch_chunk = torch.tensor(chunk, dtype=torch.float32).unsqueeze(1).unsqueeze(0)
    h = net.init_hidden(1, 'cpu')
    h = tuple([each.data for each in h])    
    torch.onnx.export(net,
                      (torch_chunk, h),
                      'traced_network.onnx',
                      dynamic_axes={'input': [0], 'h0': [1], 'c0': [1], 'hn': [1], 'cn': [1], 'output': [0]},
                      input_names=['input', 'h0', 'c0'],
                      output_names=['output', 'hn', 'cn'])
    onnx_model = onnx.load('traced_network.onnx')
    onnx.checker.check_model(onnx_model)
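
Before wiring up the C# side, it can help to confirm which names and shapes the exported model actually exposes. A quick check like this (InputMetadata and OutputMetadata are part of the ONNX Runtime C# API; nothing here is specific to this model):

using (var session = new InferenceSession("traced_network.onnx"))
{
    // Dynamic dimensions are reported as -1.
    foreach (var kv in session.InputMetadata)
        Console.WriteLine($"input:  {kv.Key} [{string.Join(",", kv.Value.Dimensions)}]");
    foreach (var kv in session.OutputMetadata)
        Console.WriteLine($"output: {kv.Key} [{string.Join(",", kv.Value.Dimensions)}]");
}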

On the C# side, I needed to create the hidden state tensors with the correct (dynamic) size:

    // Assumes 'using System.Linq;' and that _inference_session is an open
    // InferenceSession; _n_layers and _hidden_dim (hypothetical fields here)
    // must match the values the network was trained with.
    public float[] Run(Tensor<float> input)
    {
        // package the inputs into named values to coincide with the model that
        // was traced and created in python.
        // the hidden dimensions depend on how big the batch is.
        var batch_size = input.Dimensions[0];
        var onnx_input = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor<float>("input", input),
            NamedOnnxValue.CreateFromTensor<float>("h0", GetHiddenTensor(batch_size)),
            NamedOnnxValue.CreateFromTensor<float>("c0", GetHiddenTensor(batch_size)),
        };

        using (var results = _inference_session.Run(onnx_input))
        {
            // the output is the inferred values, one for each input,
            // and the hidden vectors, which we don't need for this call
            return results.First().AsEnumerable<float>().ToArray();
        }
    }

    // Zero-initialized hidden state of shape [num_layers, batch_size, hidden_dim].
    private Tensor<float> GetHiddenTensor(int batch_size)
    {
        return new DenseTensor<float>(new[] { _n_layers, batch_size, _hidden_dim });
    }
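
Since hn and cn come back as model outputs, this also answers the original question more directly: keep the returned state and pass it back in as h0/c0 on the next call, and the network stays stateful across separate Run calls, one time step at a time. A rough sketch under the same assumptions (the _h/_c fields are hypothetical; the output names "output"/"hn"/"cn" come from the export above):

    private Tensor<float> _h;   // last hidden state; null until the first call
    private Tensor<float> _c;   // last cell state

    public float[] RunStateful(Tensor<float> input)
    {
        var batch_size = input.Dimensions[0];
        if (_h == null) _h = GetHiddenTensor(batch_size);
        if (_c == null) _c = GetHiddenTensor(batch_size);

        var onnx_input = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor<float>("input", input),
            NamedOnnxValue.CreateFromTensor<float>("h0", _h),
            NamedOnnxValue.CreateFromTensor<float>("c0", _c),
        };

        using (var results = _inference_session.Run(onnx_input))
        {
            // copy hn/cn out before the results are disposed, so the next
            // call continues from this state
            _h = results.First(r => r.Name == "hn").AsTensor<float>().ToDenseTensor();
            _c = results.First(r => r.Name == "cn").AsTensor<float>().ToDenseTensor();
            return results.First(r => r.Name == "output").AsEnumerable<float>().ToArray();
        }
    }

To reset the network (the equivalent of resetting the state in Matlab), set _h and _c back to null.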