I'm writing a PDF-to-text solution using OCR in Go.

The libraries I'm using are gosseract and go-fitz.

The program works until I try to load an image from memory with gosseract:

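import (
    "bytes"
    "image/jpeg"
    "log"
    "strings"

    // Import paths assumed; adjust them to the versions you use.
    "github.com/gen2brain/go-fitz"
    "github.com/otiai10/gosseract/v2"
)

// ProcessDoc renders each page of a PDF to an image and runs OCR on it.
func ProcessDoc(file []byte) (string, error) {
    var text strings.Builder

    client := gosseract.NewClient()
    defer client.Close()

    doc, err := fitz.NewFromMemory(file)
    if err != nil {
        log.Println(err)
        return "", err // propagate the error to the caller
    }
    defer doc.Close()

    for n := 0; n < doc.NumPage(); n++ {
        img, err := doc.Image(n)
        if err != nil {
            log.Println(err)
            return "", err
        }

        // Re-encode the rendered page as JPEG so it can be passed as bytes.
        buf := new(bytes.Buffer)
        if err := jpeg.Encode(buf, img, nil); err != nil {
            log.Println(err)
            return "", err
        }

        // Hand the encoded bytes to Tesseract and check the returned error.
        if err := client.SetImageFromBytes(buf.Bytes()); err != nil {
            return "", err
        }

        res, err := client.Text()
        if err != nil {
            return "", err
        }

        text.WriteString(res)
    }
    return text.String(), nil
}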

Then I get this error:

JPEG parameter struct mismatch: library thinks size is 624, caller expects 656
Error in pixReadStreamJpeg: internal jpeg error
Error in pixReadMemJpeg: pix not read
Error in pixReadMem: jpeg: no pix returned

After a lot of searching, I learned that libleptonica and mupdf might have been built against different versions of jpeglib.h. But there is only one instance of that file on the whole system.

I should also note that I compiled libjpeg from source, and then built libmupdf and libleptonica against that version of libjpeg to avoid any conflicts, but it still returns the struct mismatch error.

halfer
dharmi_kid

1 Answer

Are you compiling mupdf from source?

By default, mupdf includes its own version of libjpeg. It is possible that mupdf was compiled against its own version and libleptonica against the system version.
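
On Linux, one way to check which libjpeg each library resolves to is to run `ldd` on the shared objects and compare the entries. The sketch below only parses `ldd`-style text; `jpegLibs` is a hypothetical helper, and the sample lines and paths in `main` are made up for illustration. It assumes dynamic linking: a libjpeg that was compiled statically into a library will not appear in `ldd` output at all.

```go
package main

import (
	"fmt"
	"strings"
)

// jpegLibs scans ldd-style output and returns the resolved path (or the
// bare name when no path is shown) of every dependency whose name
// contains "jpeg".
func jpegLibs(lddOutput string) []string {
	var libs []string
	for _, line := range strings.Split(lddOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || !strings.Contains(fields[0], "jpeg") {
			continue
		}
		// Typical line: "libjpeg.so.8 => /path/libjpeg.so.8 (0x...)"
		if len(fields) >= 3 && fields[1] == "=>" && fields[2] != "not" {
			libs = append(libs, fields[2])
		} else {
			libs = append(libs, fields[0])
		}
	}
	return libs
}

func main() {
	// Made-up sample output for two libraries, for illustration only.
	leptonica := "\tlibjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007f0000000000)\n" +
		"\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000100000)\n"
	mupdf := "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000100000)\n"

	fmt.Println("leptonica:", jpegLibs(leptonica))
	fmt.Println("mupdf:", jpegLibs(mupdf))
}
```

If one library lists a libjpeg path and the other lists none, or the two paths differ, the two may be using different libjpeg builds.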

JosephH
  • Hi, yes, I'm compiling from source, and I put the source files of the libjpeg I manually compiled and installed into the thirdparty folder of mupdf, so I'd think it's using the same one as the system. – dharmi_kid Oct 27 '22 at 16:39