
As of macOS Monterey it is possible to select text in images in Preview.

Is this OCR functionality available from AppleScript and/or JXA (JavaScript for Automation)?


In Script Editor.app, via File > Open Dictionary…, I opened Preview's dictionary and looked at the API for the Standard Suite and Text Suite, but there doesn't seem to be anything related to OCR. (The Text Suite apparently deals with drawing text on pictures, not with text extraction.)

I have also searched for text recognition actions in Automator.app but didn't see anything suitable.

ccpizza
  • Preview’s dictionary is a fake: it’s just the default terminology provided by the Cocoa Scripting framework, and doesn’t actually do anything beyond window management. Your best bet is to see if the OCR functionality is available as a system framework which you can access via the ObjC bridge. – foo Feb 14 '22 at 07:36
  • @foo: Thanks for the hint! The Vision API does seem to be available, as seen in this code sample (the download contains the demo Xcode project): https://developer.apple.com/documentation/vision/locating_and_displaying_recognized_text – ccpizza Feb 14 '22 at 16:23

2 Answers


You can use AppleScriptObjC and the Vision framework to extract text from images. The overall process is to obtain image data, set up an image request of the desired type (in this case, VNRecognizeTextRequest), create a corresponding image handler, perform the request, and return the resulting text strings.

use framework "Vision"

on getImageText(imagePath)
    -- Get image content
    set theImage to current application's NSImage's alloc()'s initWithContentsOfFile:imagePath

     -- Set up request handler using image's raw data
    set requestHandler to current application's VNImageRequestHandler's alloc()'s initWithData:(theImage's TIFFRepresentation()) options:(current application's NSDictionary's alloc()'s init())
    
    -- Initialize text request
    set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
  
     -- Perform the request and get the results
    requestHandler's performRequests:(current application's NSArray's arrayWithObject:(theRequest)) |error|:(missing value)
    set theResults to theRequest's results()

    -- Obtain and return the string values of the results
    set theText to {}
    repeat with observation in theResults
        copy ((first item in (observation's topCandidates:1))'s |string|() as text) to end of theText
    end repeat
    return theText
end getImageText

on run (argv)
    if (count of argv) is 0 then error "Must provide an image path"
    getImageText(item 1 of argv)
end run

You can run this from the command line with osascript, passing the image path as an argument (see the example below), or in Script Editor by calling getImageText directly with a hard-coded file path.
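
For example, assuming the script above is saved as getImageText.applescript (the filename and image path here are placeholders):

osascript getImageText.applescript ~/Desktop/screenshot.png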


  • Incredible! Thank you for sharing this! Works on Ventura perfectly! – ccpizza Mar 20 '23 at 01:45
  • How can I rewrite this script so it accepts a path as a parameter when run from the terminal, like `osascript this_script.scpt`? Thx! – Artem Bernatskyi May 01 '23 at 19:55
  • @ArtemBernatskyi I've updated the answer with a run handler that shows how to do that. `argv` is the built-in reference to the list of arguments passed to the command. – Stephen Kaplan May 15 '23 at 06:13
  • @StephenKaplan Thx! Also, I found a repo where you can just download a binary to execute, with no need to compile anything (https://github.com/xulihang/macOCR). But remember: trust but verify :) – Artem Bernatskyi May 16 '23 at 11:20
  • You can run the script with a heredoc in a terminal, e.g. `$ osascript - "filepath" <` – Constantin Hong Jul 24 '23 at 18:21
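
For reference, the heredoc approach from the (truncated) comment above looks roughly like this. It's a minimal sketch in which the script body just echoes the argument back; you would paste the full OCR script from the answer between the EOF markers:

osascript - ~/Desktop/screenshot.png <<'EOF'
on run argv
    -- argv holds the arguments given after "-"
    return item 1 of argv
end run
EOF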

There isn’t (yet?) a way to directly access OCR from AppleScript, but the workaround that I’m using is OwlOCR. This is an app that can capture text from the screen and output it to PDF, or plain text to the clipboard. Crucially for our purposes, it can also be controlled from the command line, and you can wrap those shell commands in an AppleScript “do shell script” command, as sketched below.
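
A rough sketch of that wrapper pattern (the binary path below is a placeholder, not OwlOCR's actual command-line interface; consult the app's documentation for the real syntax):

-- Hypothetical wrapper around a command-line OCR tool;
-- "/usr/local/bin/owlocr" is a placeholder path
on ocrImage(imagePath)
    return do shell script "/usr/local/bin/owlocr " & quoted form of imagePath
end ocrImage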

jsm
  • Is OwlOCR using Apple's out-of-the-box Live Text API, or does it use its own implementation, such as Tesseract or similar? This distinction is crucial, specifically on the latest M1/M2 models, which have dedicated neural chips that speed up these operations. Tesseract as-is can easily be scripted and wrapped in AppleScript. The original question is how to leverage Apple's existing built-in API, which is hardware-optimized and fine-tuned for Apple's Neural Engine chipsets. – ccpizza Jan 02 '23 at 14:04
  • Per their website, “the text recognition algorithm that is used has been developed by Apple and is a part of your macOS operating system and is only run on your device.” – jsm Jan 03 '23 at 19:39