
I am trying to run code based on the following link:

https://documen.tician.de/pycuda/tutorial.html

The code from that link runs fine.

This is my version, with similar definitions. Note that I am running inside an engine context because I want to call an engine.execute function.

import pycuda.driver as cuda 
import pycuda.autoinit 
import tensorrt as trt 

import numpy as np
from keras.datasets import mnist 

dims = (1, 28, 28) 
dims2 = (1, 1, 10) 
batch_size = 1000 

nbytes = batch_size * trt.volume(dims) * np.dtype(np.float32).itemsize 
nbytes2 = batch_size * trt.volume(dims2) * np.dtype(np.float32).itemsize 

self.d_src  = cuda.mem_alloc(nbytes) 
self.d_dst = cuda.mem_alloc(nbytes2) 

bindings = [int(self.d_src), int(self.d_dst)] 

(x_train, y_train), (x_test, y_test) = mnist.load_data()

img_h = x_test.shape[1]
img_w = x_test.shape[2]

x_test = x_test.reshape(x_test.shape[0], 1, img_h, img_w)

x_test = x_test.astype('float32')
x_test /= 255
num_test = x_test.shape[0]

output_size = batch_size * trt.volume(dims2)

y = np.empty((num_test,output_size), np.float32)

for i in range(0, num_test, batch_size): 
     x_part = x_test[i : i + batch_size] 
     y_part = y[i : i + batch_size] 
     cuda.memcpy_htod(self.d_src, x_part) 

     cuda.memcpy_dtoh(y_part, self.d_dst) 

However, it fails at memcpy_dtoh, even though memcpy_htod works.

File "a.py", line 164, in infer
    cuda.memcpy_dtoh(y_part, self.d_dst)
pycuda._driver.LogicError: cuMemcpyDtoH failed: invalid argument

Why is this the case? The definitions are similar to the code in the link.

macman
  • How is that code helpful? I can't run it for myself. What is x_part and y_part? – talonmies Aug 08 '19 at 07:31
  • Edited to include the x and y definitions, although I initially thought you could use the x and y defined in the earlier link. – macman Aug 09 '19 at 01:13

1 Answer

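I solved it.

The host buffer passed to memcpy_dtoh has to match the size of the device allocation, and here it did not. memcpy_dtoh copies as many bytes as the destination host array holds. With output_size = batch_size * trt.volume(dims2), each row of y already holds a whole batch of outputs, so y_part = y[i : i + batch_size] is batch_size times larger than the nbytes2 bytes allocated for self.d_dst.

So it works if I define output_size = trt.volume(dims2).

The error message isn't very helpful and made me think I had passed the wrong arguments.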
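
To make the size mismatch concrete, here is a small sketch of the byte arithmetic using the numbers from the question. It does not touch the GPU. math.prod stands in for trt.volume (an assumption made so the snippet runs without TensorRT), and the helper host_batch_nbytes is a hypothetical name introduced only for this illustration.

```python
import math
import numpy as np

batch_size = 1000
dims2 = (1, 1, 10)

itemsize = np.dtype(np.float32).itemsize
volume = math.prod(dims2)  # assumption: stand-in for trt.volume(dims2)

# Number of bytes passed to cuda.mem_alloc for self.d_dst in the question.
device_nbytes = batch_size * volume * itemsize


def host_batch_nbytes(output_size):
    """Bytes in y[i : i + batch_size] when y has shape (num_test, output_size).

    Hypothetical helper: this is the value y_part.nbytes would report,
    and it is the number of bytes memcpy_dtoh tries to copy.
    """
    return batch_size * output_size * itemsize


# Original definition: every row of y already holds a full batch of outputs.
bad = host_batch_nbytes(batch_size * volume)

# Fixed definition: every row of y holds the outputs for one image.
good = host_batch_nbytes(volume)

print("device allocation:", device_nbytes)
print("host slice (original):", bad)
print("host slice (fixed):", good)
```

With the original definition the host slice asks for batch_size times more bytes than the device buffer holds, which is why the copy is rejected. A cheap guard in the real loop would be to compare y_part.nbytes against nbytes2 before calling cuda.memcpy_dtoh.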

macman
  • Hi, I have only recently started working with TensorRT and am stuck at this exact same error. It would really help me if you could explain your answer in more detail, for example how just changing `output_size = batch_size * trt.volume(dims2)` to `output_size = trt.volume(dims2)` fixes it. I may be asking a naive question, and I am also trying to learn more from the documentation. – minhaj Apr 09 '20 at 19:49