I learned that if one needs to implement dct on a image of size (H, W), one needs a matrix A that is of size (8, 8), and one needs to use this A to compute with a (8, 8) region F on the image. That means if the image array is m, one needs to compute m[:8, :8] first, and then m[8:16, 8:16], and so on.
How could I implement this dct when input image size is not a scale of 8. For example, when image size is (12, 12) that cannot hold two (8, 8) windows, how could I implement dct ? I tried opencv and found that opencv can cope with this scenario, but I do not know how it implemented it.