
I am having a problem printing from within a pycuda kernel: the printf() function prints nothing.

There was a similar question posted here by @username_4567, and also an example given here, which @harrism pointed to in his answer. However, I have implemented the code from the pycuda example and nothing gets printed (although there are no errors). I am guessing the problem is that I am using a Kepler GPU, while the example specifies that it only works on Fermi devices.

Does anybody know how I can print information from within pycuda kernels using my Kepler GPU? There is probably a workaround I could design, which copies any data I want to print onto the CPU and then prints it via Python, but I'd prefer to avoid that!
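
To be clear about the kind of workaround I mean: something along the lines of the sketch below, where the kernel writes its debug values into a gpuarray and I print them after copying back to the host. The kernel and the array here are just placeholders, not my actual code:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
    __global__ void debug_values(float *out)
    {
      int i = threadIdx.x;
      out[i] = 2.0f * i;   /* whatever value I would have printed */
    }
    """)

debug_values = mod.get_function("debug_values")
out_gpu = gpuarray.zeros(16, dtype=np.float32)
debug_values(out_gpu, block=(16, 1, 1))

# copy back to the host and print from Python instead of from the kernel
print(out_gpu.get())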

I have searched the web for anyone having the same problem but I found nothing.

I am using Python 3.5 (Anaconda build), Spyder as an IDE (launched from the terminal), and an iMac running El Capitan. The GPU is a GeForce GT 755M.


1 Answer


There is nothing wrong with the code in the example you are trying to use, and it is perfectly suitable for use on a Kepler GPU. The problem is that the CUDA runtime uses a buffer for printf output which is only periodically flushed by the driver; the flush has to be triggered by one of several API calls.

I am guessing you are testing this in an interactive Python shell. In that case you should add an explicit synchronization call to the code:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
    #include <stdio.h>

    __global__ void say_hi()
    {
      printf("I am %d.%d\\n", threadIdx.x, threadIdx.y);
    }
    """)

func = mod.get_function("say_hi")
func(block=(4,4,1))

# Flush context printf buffer
cuda.Context.synchronize()

Alternatively, if you add a shebang line and run the unmodified code from a command prompt:

$ cat hello_cuda.py 
#!/usr/bin/env python
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
    #include <stdio.h>

    __global__ void say_hi()
    {
      printf("I am %d.%d\\n", threadIdx.x, threadIdx.y);
    }
    """)

func = mod.get_function("say_hi")
func(block=(4,4,1))

$ ./hello_cuda.py 
I am 0.0
I am 1.0
I am 2.0
I am 3.0
I am 0.1
I am 1.1
I am 2.1
I am 3.1
I am 0.2
I am 1.2
I am 2.2
I am 3.2
I am 0.3
I am 1.3
I am 2.3
I am 3.3

it will also work. In the latter case, it is the context cleanup triggered by the pycuda.autoinit module which automagically flushes the buffer.
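
If you end up needing finer control (for example, inside an IDE that keeps the interpreter, and therefore the context, alive indefinitely), you can also skip pycuda.autoinit and manage the context yourself, so it is explicit where the buffer gets flushed. Something along these lines should work; this is only a sketch using the standard pycuda context API, not something specific to your setup:

import pycuda.driver as cuda
from pycuda.compiler import SourceModule

cuda.init()
ctx = cuda.Device(0).make_context()
try:
    mod = SourceModule("""
        #include <stdio.h>

        __global__ void say_hi()
        {
          printf("I am %d.%d\\n", threadIdx.x, threadIdx.y);
        }
        """)
    say_hi = mod.get_function("say_hi")
    say_hi(block=(4, 4, 1))
    ctx.synchronize()   # flush the printf buffer
finally:
    ctx.pop()           # release the context explicitly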

talonmies
  • Hi @talonmies, many thanks for taking the time to reply. I ran the first block of code above - the example with the cuda.Context.synchronize() line applied - but I still got nothing printed out. However, when I ran the same code from the terminal it did print from the kernel - the "I am x.y" lines in your second block above. – Greg Fisher Aug 11 '16 at 16:03
  • On further investigation, something strange is going on with my IPython and Python consoles: if I run the example code in a Python shell (including "cuda.Context.synchronize()") it prints out as it should. However, when I run it in an IPython shell it sends the output to "Kernel 1" (a tab within the Console window rather than in the IPython shell), which I presume is associated with the IPython shell. So this is looking more like an IPython issue. If you have any more thoughts, I'd gladly hear them, otherwise I will switch to using Python shells rather than IPython. – Greg Fisher Aug 11 '16 at 16:35