I want to fine-tune a generative diffusion model (DDPM), let's say one trained on ImageNet (NOT Stable Diffusion, which is text-to-image), on some other dataset such as CelebA or CIFAR-10. I'm wondering about two things:
- Does my pretrained model need to be unconditional? Or can it be conditioned on ImageNet classes and still be fine-tuned on unconditional CelebA, or with the classes from CIFAR-10?
- Maybe trivial, but do the image sizes of the ImageNet pretraining data and my target data have to agree? I.e., if I have 32x32 CIFAR-10 images, do I need a model that was trained on ImageNet at that same resolution?
So far I have found some models from OpenAI trained on ImageNet in both ways, but I haven't tried them yet. Some theoretical input would be greatly appreciated.
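
To make the first question concrete, the kind of weight transfer I have in mind would look roughly like the sketch below. It assumes PyTorch, and `TinyDenoiser` with all its layers is a hypothetical toy stand-in for a class-conditional noise predictor, not the architecture of the OpenAI models. The idea is to copy every pretrained weight except the class embedding, which is re-initialized for the 10 CIFAR-10 classes.

```python
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Hypothetical toy noise predictor with optional class conditioning."""

    def __init__(self, num_classes=None, channels=32, num_timesteps=1000):
        super().__init__()
        self.time_emb = nn.Embedding(num_timesteps, channels)
        # Class embedding only exists for class-conditional models.
        self.class_emb = (
            nn.Embedding(num_classes, channels) if num_classes is not None else None
        )
        self.conv_in = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.conv_out = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, x, t, y=None):
        emb = self.time_emb(t)
        if self.class_emb is not None and y is not None:
            emb = emb + self.class_emb(y)
        # Broadcast the embedding over the spatial dimensions.
        h = torch.relu(self.conv_in(x) + emb[:, :, None, None])
        return self.conv_out(h)


# Pretend this one holds ImageNet-conditional pretrained weights (1000 classes).
pretrained = TinyDenoiser(num_classes=1000)

# Target model for CIFAR-10 (10 classes); use num_classes=None for unconditional CelebA.
finetune = TinyDenoiser(num_classes=10)

# Copy everything except the class embedding, whose shape no longer matches.
state = {
    k: v for k, v in pretrained.state_dict().items() if not k.startswith("class_emb")
}
result = finetune.load_state_dict(state, strict=False)

# Only convolutions and embeddings here, so the toy model accepts 32x32 input.
x = torch.randn(4, 3, 32, 32)
t = torch.randint(0, 1000, (4,))
y = torch.randint(0, 10, (4,))
out = finetune(x, t, y)
```

In this toy case `result.missing_keys` lists only the class embedding, and the forward pass runs at 32x32 because nothing in the model depends on the input size. Whether the same holds up in practice for a real UNet with attention layers, and whether the transferred weights are actually useful at a different resolution, is exactly what I am unsure about.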