I'm trying to extract text from an image using the Donut model, which is a document image parser. It seems the input image is not being passed in the format the model expects. I'm getting this error:
RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same
on this line:
output = model.inference(image=image, prompt="<s_cord-v2>")
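From what I can tell, the error means the pixel tensor going into the encoder is still float32 while the encoder weights were cast to bfloat16. A tiny standalone PyTorch snippet (nothing to do with Donut, just to show the kind of mismatch I mean) triggers the same kind of error:

import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).to(torch.bfloat16)  # weights/bias in bfloat16
x = torch.randn(1, 3, 32, 32)                                   # input left in float32
conv(x)  # raises the same kind of RuntimeError about input vs. bias dtype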
Here is my entire code:
from donut import DonutModel
from PIL import Image
import torch
model = DonutModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
if torch.cuda.is_available():
    model.half()
    device = torch.device("cuda")
    model.to(device)
else:
    model.encoder.to(torch.bfloat16)
model.eval()
image = Image.open("testfolder/test1.jpg").convert("RGB")
output = model.inference(image=image, prompt="<s_cord-v2>")
print(output)
I understand that the image is not in the right format, but how would I go about fixing that?
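One idea I had, though I'm not sure it's the intended approach, is to build the pixel tensor myself and cast it to bfloat16 before passing it in, so it matches the encoder on CPU. This assumes model.encoder.prepare_input() exists and that inference() accepts an image_tensors argument, which is just my reading of the donut source, so treat it as a sketch:

# Sketch only: prepare_input and image_tensors are my reading of the donut source
image = Image.open("testfolder/test1.jpg").convert("RGB")
pixels = model.encoder.prepare_input(image).unsqueeze(0)  # (1, C, H, W) float32 pixel tensor
if not torch.cuda.is_available():
    pixels = pixels.to(torch.bfloat16)  # cast to match the bfloat16 encoder on CPU
output = model.inference(image_tensors=pixels, prompt="<s_cord-v2>")

Or would it be simpler to just skip the model.encoder.to(torch.bfloat16) cast on CPU and keep everything in float32? Which of these is the right way to fix it?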