I'm trying to adapt the Pokemon fine-tuning notebook from the Lambda Labs examples repo on GitHub, which fine-tunes Stable Diffusion on the Pokemon BLIP captions dataset; the training code lives in the justinpinkney/stable-diffusion code base. I want to fine-tune Stable Diffusion on the MuMu dataset of album covers instead.
I have an (N, 512, 512, 3) numpy array of images and a length-N list of caption strings. The original code base expects a <class 'datasets.arrow_dataset.Dataset'> object, so I try to convert my data to that format with datasets.Dataset.from_dict() inside hf_dataset() in ldm/data/simple.py:
img_dict = {}
for i in range(len(img_tensor)):
    img_dict[i] = {'image': img_tensor[i], 'text': img_captions[i]}

from datasets import Dataset
ds = Dataset.from_dict(img_dict)
This produces a huge error traceback ending in:
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: arrays to be concatenated must be identically typed, but list<item: list<item: uint8>> and string were encountered.
I think the problem is that img_tensor[i] is a nested uint8 array while img_captions[i] is a string, so pyarrow ends up trying to concatenate values of both types into a single column. How can I correctly convert my data to a datasets.arrow_dataset.Dataset object?