I'm trying to extract text from an image using the Donut model, which is an image parser. It seems that the input image is not in the proper format.

I'm getting the following error:

    RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same

It is raised on this line:

    output = model.inference(image=image, prompt="<s_cord-v2>")

Here is my entire code:

    from donut import DonutModel
    from PIL import Image
    import torch

    model = DonutModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

    if torch.cuda.is_available():
        # GPU path: half precision, model moved to the GPU
        model.half()
        device = torch.device("cuda")
        model.to(device)
    else:
        # CPU path: encoder weights cast to bfloat16
        model.encoder.to(torch.bfloat16)
    model.eval()

    image = Image.open("testfolder/test1.jpg").convert("RGB")
    output = model.inference(image=image, prompt="<s_cord-v2>")
    output

I understand that the image is not in the right format, but how would I go about fixing that?
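
For reference, the error message describes a dtype mismatch between the input tensor (`float`, i.e. float32) and the layer parameters (`BFloat16`), not a problem with the image file itself. Below is a minimal sketch that reproduces the same kind of mismatch in plain PyTorch, without Donut. The `Conv2d` layer and the random `pixels` tensor are stand-ins I made up for illustration; they are not Donut's actual encoder or preprocessing, and the sketch assumes the preprocessed image reaches the encoder as float32.

```python
import torch

# Hypothetical stand-in for the encoder's first convolution, cast to bfloat16
# the same way model.encoder.to(torch.bfloat16) casts the encoder weights.
conv_bf16 = torch.nn.Conv2d(3, 8, kernel_size=3).to(torch.bfloat16)

# Hypothetical stand-in for a preprocessed image (assumed float32, as the
# "Input type (float)" part of the error message suggests).
pixels = torch.rand(1, 3, 32, 32)

# float32 input into bfloat16 parameters raises a RuntimeError.
try:
    conv_bf16(pixels)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)
print(mismatch_error)

# Option 1: cast the input to the dtype of the layer's parameters.
param_dtype = next(conv_bf16.parameters()).dtype
out_bf16 = conv_bf16(pixels.to(param_dtype))

# Option 2: leave the layer in float32 so it matches the float32 input.
conv_fp32 = torch.nn.Conv2d(3, 8, kernel_size=3)
out_fp32 = conv_fp32(pixels)

print(out_bf16.dtype, out_fp32.dtype)
```

Applied to the script above, the analogue of option 2 would be to skip the `model.encoder.to(torch.bfloat16)` call in the CPU branch so the whole model stays in float32. Whether `model.inference` casts the image internally depends on the installed Donut version, which this sketch does not check.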