In this discussion of the runtime vs the driver API, it is said that

Primary contexts are created as needed, one per device per process, are reference-counted, and are then destroyed when there are no more references to them.

What counts as such references? And - does this not imply that, often, the primary context is supposed to be destroyed right after being used, repeatedly? For example, you get the default device ID, then launch a kernel; what "references" remain? Surely it's not the integer variable holding the device ID...

einpoklum
  • This is interesting because in some of my runtime API codes, when I was inspecting them via the visual profiler, some of the codes would be shown as having one context per kernel launch, even if it was just a for loop with few other things. I never knew why, but this question explains it: in some cases the contexts were auto-deleted/auto-created continuously. So I've seen the effect you mention – Ander Biguri Apr 30 '20 at 10:55
  • The primary context will, under normal circumstances, never be destroyed by the runtime API unless you call cudaDeviceReset or the atexit() path is triggered. – talonmies Apr 30 '20 at 13:39
  • @talonmies: That sounds pretty reasonable (although a "reset" doesn't sound like it should destroy a context, only reset it) - but it seems to contradict the quoted passage. – einpoklum Apr 30 '20 at 13:43
  • No it doesn't contradict it. The reference counting is in the driver, and the lazy context establishment mechanism has "sticky" references which won't get cleaned up until you reset or atexit gets called – talonmies Apr 30 '20 at 14:03
  • @talonmies: That last comment sounds like the answer. – einpoklum Apr 30 '20 at 14:45

1 Answer

None of the exact internal workings of the runtime API are documented, and there is empirical evidence that they have subtly changed over time. That said, if you inspect the host code boilerplate the toolchain emits and run some host-side traces, it is possible to infer how it works. What follows is my understanding based on observations made in this way.

It is important to realize that primary context reference counting is an internal function within the driver. The "lazy context establishment" mechanism itself uses some internal API hooks which will either bind to an existing primary context created explicitly by the driver API, or create one itself if none is available and then bind to that context. In both cases the bind increments the reference count. The routines which unbind from a primary context are registered via atexit and will trigger when the application exits or when cudaDeviceReset() is called.

This approach prevents the potential scenario you have posited, whereby contexts are continuously destroyed when their reference count falls to zero and then recreated when another runtime API function is called. That doesn't happen.
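
To make the reasoning concrete, here is a toy host-only C++ model of the behaviour described above. It is only a sketch of my understanding, not the driver's implementation: `PrimaryContextModel`, `runtime_call`, `runtime_unbind` and `creations_after_calls` are hypothetical names invented for illustration, and no CUDA calls are made. The only point it illustrates is that a reference taken once by the lazy-initialization path, and dropped only on reset or exit, keeps the count above zero between runtime calls.

```cpp
#include <cassert>

// Toy stand-in for per-device primary context bookkeeping.
// Every name here is hypothetical; the real driver internals are undocumented.
struct PrimaryContextModel {
    int refcount = 0;            // references currently held on the context
    int creations = 0;           // how many times the context had to be created
    bool runtime_bound = false;  // does the runtime hold its "sticky" reference?

    // Take one reference, creating the context first if none exists.
    void retain() {
        if (refcount == 0) ++creations;
        ++refcount;
    }

    // Drop one reference; the context counts as destroyed at zero.
    void release() {
        assert(refcount > 0);
        --refcount;
    }

    bool alive() const { return refcount > 0; }

    // Models any runtime API call: bind lazily on first use, then reuse
    // the binding. Nothing is released when the call returns.
    void runtime_call() {
        if (!runtime_bound) {
            retain();
            runtime_bound = true;
        }
    }

    // Models the unbind path (application exit or an explicit device reset).
    void runtime_unbind() {
        if (runtime_bound) {
            release();
            runtime_bound = false;
        }
    }
};

// Issue n runtime calls back to back and report how often the context
// was created along the way.
int creations_after_calls(int n) {
    PrimaryContextModel ctx;
    for (int i = 0; i < n; ++i) ctx.runtime_call();
    return ctx.creations;
}
```

In this model, `creations_after_calls(100)` reports a single creation, because the sticky reference is never dropped between calls. Only after `runtime_unbind()` does a further `runtime_call()` create the context again, which mirrors what happens after a reset.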

heapoverflow
talonmies