
I'm trying to convert a PyTorch VAE to ONNX, but exporting fails with: torch.onnx.symbolic.normal does not exist

The problem appears to originate from a reparametrize() function:

    def reparametrize(self, mu, logvar):
        std = logvar.mul(0.5).exp_()
        if self.have_cuda:
            eps = torch.normal(torch.zeros(std.size()), torch.ones(std.size())).cuda()
        else:
            eps = torch.normal(torch.zeros(std.size()), torch.ones(std.size()))
        return eps.mul(std).add_(mu)

I also tried:

    eps = torch.cuda.FloatTensor(std.size()).normal_()

which produced the error:

    Schema not found for node. File a bug report.
    Node: %173 : Float(1, 20) = aten::normal(%169, %170, %171, %172), scope: VAE 
    Input types:Float(1, 20), float, float, Generator

and

    eps = torch.randn(std.size()).cuda()

which produced the error:

    builtins.TypeError: i_(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.Node, arg0: str, arg1: int) -> torch._C.Node
    Invoked with: %137 : Tensor = onnx::RandomNormal(), scope: VAE, 'shape', 133 defined in (%133 : int[] = prim::ListConstruct(%128, %132), scope: VAE) (occurred when translating randn)

I am using CUDA.

Any thoughts appreciated. Perhaps I need to approach the z/latent sampling differently for ONNX?

NOTE: Stepping through, I can see that it's finding RandomNormal() for torch.randn(), which should be correct. But I don't really have access to the arguments at that point, so how can I fix it?

jbm

1 Answer


In short, the code below should work (at least in my environment, it ran without errors).

It seems that the .size() operator returns a variable rather than a constant, which causes an error during ONNX export. (I got the same error when I changed the code to use .size().)
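For illustration, here is a minimal sketch of the difference (BATCH_SIZE and FEATURE_DIM stand for the constants defined in the full example below):

    # std.size() returns a torch.Size that the tracer records as a dynamic
    # value, so the exporter cannot bake a constant shape into onnx::RandomNormal:
    eps = torch.randn(std.size(), device='cuda')               # fails to export

    # Literal ints give the exporter constant dimensions instead:
    eps = torch.randn(BATCH_SIZE, FEATURE_DIM, device='cuda')  # exports fine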

    import torch
    from torch import nn
    from torch.nn import functional as F

    IN_DIMS = 28 * 28
    BATCH_SIZE = 10
    FEATURE_DIM = 20

    class VAE(nn.Module):
        def __init__(self):
            super(VAE, self).__init__()

            self.fc1 = nn.Linear(784, 400)
            self.fc21 = nn.Linear(400, FEATURE_DIM)  # mu head
            self.fc22 = nn.Linear(400, FEATURE_DIM)  # logvar head
            self.fc3 = nn.Linear(FEATURE_DIM, 400)
            self.fc4 = nn.Linear(400, 784)

        def encode(self, x):
            h1 = F.relu(self.fc1(x))
            return self.fc21(h1), self.fc22(h1)

        def reparameterize(self, mu, logvar):
            std = torch.exp(0.5 * logvar)
            # Use constant dimensions rather than std.size(), so the traced
            # graph gets a fixed shape for onnx::RandomNormal.
            eps = torch.randn(BATCH_SIZE, FEATURE_DIM, device='cuda')
            return eps.mul(std).add_(mu)

        def decode(self, z):
            h3 = F.relu(self.fc3(z))
            return torch.sigmoid(self.fc4(h3))

        def forward(self, x):
            mu, logvar = self.encode(x)
            z = self.reparameterize(mu, logvar)
            recon_x = self.decode(z)
            return recon_x

    model = VAE().cuda()

    dummy_input = torch.randn(BATCH_SIZE, IN_DIMS, device='cuda')
    torch.onnx.export(model, dummy_input, "vae.onnx", verbose=True)
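If you want a quick sanity check on the exported file, a minimal sketch (assuming the onnx Python package is installed):

    import onnx

    # Load the exported graph and validate it against the ONNX spec.
    onnx_model = onnx.load("vae.onnx")
    onnx.checker.check_model(onnx_model)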
Yuki Hashimoto
  • Excellent, thank you! What a bizarre problem; mind you, I see now from the docs that `.size()` does return type `torch.Size`... So, that got me past ONNX... now CoreML is complaining about RandomNormal. Haha... What a fiasco. – jbm Feb 15 '19 at 16:47