I want to convert frames from YUV420p format (or something similar) to ABGR format on the fly, and put the resulting frames into video memory as textures.
There are two approaches I can think of right now:
- Make each plane (Y, U, V) a source texture and render the conversion into another texture, i.e. a fragment-shader pass.
- Do the conversion "normally" in a compute shader (the per-pixel math is sketched below).
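For concreteness, here is roughly the per-pixel math I have in mind, written as a CUDA kernel (a sketch only: I'm assuming BT.601 full-range coefficients and an ABGR layout with alpha in the most significant byte, which may not match my actual target):

```cuda
#include <cstdint>

// Sketch: convert one YUV420p frame to packed ABGR.
// Assumes BT.601 full-range coefficients and even width/height.
__global__ void yuv420p_to_abgr(const uint8_t* yPlane,
                                const uint8_t* uPlane,
                                const uint8_t* vPlane,
                                uint32_t* abgr,
                                int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Luma is full resolution; chroma is subsampled 2x2,
    // so four Y samples share one U and one V sample.
    float Y = (float)yPlane[y * width + x];
    int ci = (y / 2) * (width / 2) + (x / 2);
    float U = (float)uPlane[ci] - 128.0f;
    float V = (float)vPlane[ci] - 128.0f;

    // BT.601 full-range conversion: a handful of multiply-adds per pixel.
    float r = Y + 1.402f * V;
    float g = Y - 0.344136f * U - 0.714136f * V;
    float b = Y + 1.772f * U;

    uint32_t R = (uint32_t)fminf(fmaxf(r, 0.0f), 255.0f);
    uint32_t G = (uint32_t)fminf(fmaxf(g, 0.0f), 255.0f);
    uint32_t B = (uint32_t)fminf(fmaxf(b, 0.0f), 255.0f);

    // Pack as ABGR with alpha in the high byte (my assumption --
    // adjust the shifts if the target layout differs).
    abgr[y * width + x] = (0xFFu << 24) | (B << 16) | (G << 8) | R;
}
```

As you can see, it is only a handful of multiply-adds per pixel, which is why I suspect the output units, not the shader cores, would be the bottleneck in method 1.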
I don't quite understand the throughput rules on the GPU. My card has 720 shader cores, 36 texture units, and 16 output (ROP) units. Does that mean that within each cycle I can sample at most 36 textures and output 16 pixels, while executing up to 720 shader operations? If so, with method 1, will I be constrained to that 16-pixel output limit even if each pixel needs only 2 or 3 operations? And with method 2, does that mean that as long as I can convert one pixel within 45 cycles (720 cores / 16 outputs), it will be faster than method 1?
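To test this empirically, I was planning to time the compute path with something like the following (a minimal sketch assuming a 1920x1080 frame and the yuv420p_to_abgr kernel above pasted into the same file; the input buffers are left uninitialized since only the timing matters here):

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Declaration of the kernel sketched above (its definition must be
// in the same file for this to link).
__global__ void yuv420p_to_abgr(const uint8_t*, const uint8_t*, const uint8_t*,
                                uint32_t*, int, int);

int main()
{
    const int W = 1920, H = 1080;  // assumed frame size
    uint8_t *dY, *dU, *dV;
    uint32_t *dOut;
    cudaMalloc((void**)&dY, W * H);              // full-resolution luma plane
    cudaMalloc((void**)&dU, (W / 2) * (H / 2));  // 2x2-subsampled chroma planes
    cudaMalloc((void**)&dV, (W / 2) * (H / 2));
    cudaMalloc((void**)&dOut, W * H * sizeof(uint32_t));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);

    // Time the kernel alone with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    yuv420p_to_abgr<<<grid, block>>>(dY, dU, dV, dOut, W, H);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("conversion took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dY); cudaFree(dU); cudaFree(dV); cudaFree(dOut);
    return 0;
}
```

But even with a measurement in hand, I'd like to understand the theory behind which path should win.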