OpenCL has device fission but not device fusion. Devices can share memory for efficiency but shaders can't be joined.
There are also some functions that can't synchronize between two GPUs yet:
- Atomic functions in kernels
- Prefetch command(which GPUs global cache?)
- clEnqueueAcquireGLObject(which GPU's buffer?)
- clCreateBuffer (which device memorry does it choose? we can't choose.)
- clEnqueueTask (where does this task go?)
You should partition the encoding work in two pieces and run on both GPUs. This may even need cross-fire to be disabled if drivers have problems with it. This shouldn't be harder than writing a GPGPU encoder.
But you may need to copy data to only one of the devices, then copy half of data to other GPU from that buffer, instead of passing through pci-e twice. The inter-GPU connection must be faster than pci-e.