I am working on a project that requires me to compare a print bibliography (of nearly 14,000 items) against a digital catalog/database. Would it be possible to (1) scan the bibliography, (2) run it through an OCR program, (3) optionally convert the scanned file into a spreadsheet, and (4) compare that information against a library catalog, i.e., see whether the items in the bibliography appear in the catalog collection?

This would greatly reduce the time this project currently requires.

1 Answer

Yes, it is definitely possible.

It is an interesting and unusual project, and it requires a bit of MacGyver-ing.

The steps you described are correct. For maximum automation, plan for the OCR output to be imperfect: your search algorithm should allow some fuzziness, enough to accommodate occasional OCR mistakes but specific enough not to cause false positives.
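As a rough illustration of the fuzzy comparison step, here is a minimal sketch using only Python's standard `difflib` module. The function names, the sample titles, and the `cutoff` of 0.85 are all assumptions made for the example, not part of any particular catalog system; you would tune the cutoff against a hand-checked sample of your own data.

```python
import difflib
import re


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())


def find_in_catalog(ocr_title, catalog_titles, cutoff=0.85):
    """Return the closest catalog title, or None if nothing clears the cutoff.

    cutoff is a similarity ratio between 0 and 1 (an assumed starting value):
    raising it reduces false positives, lowering it tolerates more OCR errors.
    """
    # Map each normalized title back to its original catalog form.
    normalized = {normalize(t): t for t in catalog_titles}
    hits = difflib.get_close_matches(
        normalize(ocr_title), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None


# Hypothetical sample data: a tiny catalog and two OCR'd bibliography entries.
catalog = ["The Origin of Species", "Moby-Dick; or, The Whale", "Pride and Prejudice"]
print(find_in_catalog("The 0rigin of Specics", catalog))  # OCR-damaged title
print(find_in_catalog("War and Peace", catalog))          # not in the catalog
```

This compares every bibliography entry against every catalog title, so for 14,000 items and a large catalog it may be slow; narrowing the candidates first (for example by author or year, if the OCR captures them) would likely help.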

Ilya Evdokimov