
I am using the Google Drive OCR API to recognize text in pictures. The problem is that it reads all text, even very small, almost microscopic text. Can I somehow set a threshold so that very small letters are ignored? I don't need the very small text.

I use this code in Google Apps Script:

function doGet(request) {
  // e.parameter holds the first value of each query parameter as a string
  var url = request.parameter.url;
  if (url != undefined && url != "") {
    var imageBlob = UrlFetchApp.fetch(url).getBlob();
    var resource = {
      title: imageBlob.getName(),
      mimeType: imageBlob.getContentType()
    };
    var options = {
      ocr: true
    };
    // Convert the image to a Google Doc with OCR, read its text, then delete it
    var docFile = Drive.Files.insert(resource, imageBlob, options);
    var doc = DocumentApp.openById(docFile.id);
    // The /g flag removes every newline, not only the first one
    var text = doc.getBody().getText().replace(/\n/g, "");
    Drive.Files.remove(docFile.id);
    return ContentService.createTextOutput(text);
  } else {
    return ContentService.createTextOutput("request error");
  }
}
Влад

1 Answer


There is no OCR parameter for setting a text-size threshold, but there is a workaround.

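Instead of filtering the source image, read the font sizes of the top-level children (paragraphs) of the Google Doc that the OCR conversion creates. The converted document carries over approximate text sizes (in the example below, the large logo text becomes font size 19 and the smaller text font size 9), so small text can be skipped by its font size.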

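function doOCR() {
  // Sample image: after conversion, "JT digital inspiration" gets font size 19
  // in the document and "tech à la carte" gets font size 9.
  var image = UrlFetchApp.fetch('http://img.labnol.org/logo.png').getBlob();

  var file = {
    title: 'OCR File',
    mimeType: 'image/png'
  };

  // Convert the image to a Google Doc with OCR (Advanced Drive Service, Drive API v2)
  var docFile = Drive.Files.insert(file, image, {ocr: true});
  var body = DocumentApp.openById(docFile.id).getBody();
  var numElements = body.getNumChildren();
  var minFontSize = 10; // threshold: text at or below this size is skipped

  // Traverse all top-level children of the document body
  for (var i = 0; i < numElements; ++i) {
    var element = body.getChild(i);
    // Only paragraphs hold the recognized text; skip other element types
    if (element.getType() != DocumentApp.ElementType.PARAGRAPH) {
      continue;
    }
    var text = element.asParagraph().editAsText();
    var textValue = text.getText();
    // getFontSize() returns null when the paragraph mixes several font sizes
    var fontSize = text.getFontSize();
    // Skip empty paragraphs and text at or below the threshold
    if (textValue != '' && fontSize != null && fontSize > minFontSize) {
      Logger.log(textValue);
    }
  }

  // Optionally delete the temporary OCR document afterwards:
  // Drive.Files.remove(docFile.id);
}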

You can adapt the condition to skip other font sizes as well, for example to keep only text within a certain size range.
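One way to make that condition easy to change and to test outside Apps Script is to move it into plain functions that work on `{text, fontSize}` records collected in the loop. This is a minimal sketch; the names `isWantedSize` and `filterBySize` and the `{ above, atMost }` options object are hypothetical, not part of any Google API.

```javascript
/**
 * Hypothetical helper: true when a paragraph's font size is above
 * `opts.above` and at most `opts.atMost`. A null size (mixed sizes in
 * one paragraph) is treated as "skip".
 */
function isWantedSize(fontSize, opts) {
  if (fontSize == null) return false;
  return fontSize > opts.above && fontSize <= opts.atMost;
}

/**
 * Hypothetical helper: returns the text of the non-empty records whose
 * size passes isWantedSize, in their original order.
 */
function filterBySize(paragraphs, opts) {
  return paragraphs
    .filter(function (p) {
      return p.text !== '' && isWantedSize(p.fontSize, opts);
    })
    .map(function (p) {
      return p.text;
    });
}

// Records using the two sizes from the labnol.org example (19 and 9)
var sample = [
  { text: 'JT digital inspiration', fontSize: 19 },
  { text: 'tech à la carte', fontSize: 9 },
  { text: '', fontSize: 19 }
];

// Same rule as the answer's `fontSize > 10`
var kept = filterBySize(sample, { above: 10, atMost: Infinity });
// kept is ['JT digital inspiration']

// Keep only a band of sizes, e.g. also skip headline-sized text
var band = filterBySize(sample, { above: 5, atMost: 12 });
// band is ['tech à la carte']
```

Inside `doOCR()` you would push `{ text: textValue, fontSize: fontSize }` for each paragraph instead of logging it, then join the result of `filterBySize` with `"\n"` if you need a single string, as in the question's web app.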

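Input:

[Image: the labnol.org logo fetched from http://img.labnol.org/logo.png in the code]

Output (Document):

[Screenshot: the Google Doc created by the OCR conversion]

Output (Console):

[Screenshot: the execution log produced by Logger.log]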
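The loop above checks one font size per paragraph. If a paragraph mixes large and small text, `Text.getFontSize()` returns null and the whole paragraph is skipped. A hedged sketch of a finer-grained variant walks the paragraph's formatting runs instead, using `Text.getText()`, `Text.getTextAttributeIndices()` and `Text.getFontSize(offset)`. The function name `keepLargeRuns` and the fake `Text` object used to try it outside Apps Script are illustrative assumptions, not part of any API.

```javascript
/**
 * Returns only the parts of a Text element whose font size is above
 * minFontSize, checking one formatting run at a time. In Apps Script,
 * pass e.g. body.getChild(i).asParagraph().editAsText().
 * Runs whose size comes back as null are skipped.
 */
function keepLargeRuns(text, minFontSize) {
  var content = text.getText();
  if (content === '') return '';

  // Start offsets of the formatting runs, sorted ascending;
  // make sure the first run starts at offset 0.
  var starts = text.getTextAttributeIndices().slice().sort(function (a, b) {
    return a - b;
  });
  if (starts.length === 0 || starts[0] !== 0) starts.unshift(0);

  var kept = '';
  for (var i = 0; i < starts.length; i++) {
    var start = starts[i];
    var end = i + 1 < starts.length ? starts[i + 1] : content.length;
    // Font size of this run, read at its first character
    var size = text.getFontSize(start);
    if (size != null && size > minFontSize) {
      kept += content.substring(start, end);
    }
  }
  return kept;
}

// Hypothetical stand-in for a DocumentApp Text object:
// "BIG " and "BIG2" at size 18, "small " at size 8.
var fakeText = {
  getText: function () { return 'BIG small BIG2'; },
  getTextAttributeIndices: function () { return [0, 4, 10]; },
  getFontSize: function (offset) { return offset === 4 ? 8 : 18; }
};

var result = keepLargeRuns(fakeText, 10); // 'BIG BIG2'
```

Formatting runs change on any attribute (bold, color, and so on), not only size, so a paragraph may yield more runs than sizes; the result is the same, just computed in smaller pieces.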

NightEye