
I am trying to get the x and y coordinates of specific text in an image like this one. In this image I want to detect where X:input Y:input is located; it could be anywhere in future images. In this case I would expect a result of around 714, 164, 125, 32 (x, y, width, height).

I tried to use Tesseract and Jimp:

const worker = await Tesseract.createWorker();

await worker.loadLanguage("eng");
await worker.initialize("eng");

const convertedImage = await image
  .grayscale()
  .getBufferAsync(Jimp.MIME_PNG);

await worker.setParameters({ tessedit_char_whitelist: "XY012345678" });

const { data } = await worker.recognize(convertedImage);

But I am not sure whether anything in data lets me get the desired result, and I am not aware of other libraries that might help.

  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Feb 21 '23 at 21:00
  • My input is `X:323 Y:528`. That text is visible on the image and I would like to get the coordinates of that text inside the image programmatically. When I open the image in a tool like IrfanView and select the text, I see something like `714, 164, 125, 32` which would roughly be my expected output of the code –  Feb 21 '23 at 21:05

1 Answer

Updated response

Even with the contrast set to 20%, the text was still not picked up. Setting it to 10% worked. Note that the whitelist below omits the colon, so the recognized text to search for is X323Y528.

import path from "path";
import Jimp from "jimp";
import { createWorker, PSM } from "tesseract.js";

const __dirname = path.resolve();

const main = async () => {
  const imagePath = path.join(__dirname, "image.png");
  const bounds = await getBoundingBox(imagePath, "X323Y528", "XY0123456789");

  console.log("Bounds:", bounds); // { x: 719, y: 173, width: 116, height: 16 }
};

const getBoundingBox = async (imagePath, searchText, allowedCharacters) => {
  const worker = await createWorker();

  await worker.loadLanguage("eng");
  await worker.initialize("eng");

  await worker.setParameters({
    tessedit_char_whitelist: allowedCharacters,
    tessedit_pageseg_mode: PSM.SPARSE_TEXT,
  });

  const image = await Jimp.read(imagePath);
  const imageBuffer = await image
    .color([{ apply: "desaturate", params: [90] }])
    .contrast(0.1)
    .invert()
    .write("processed.jpg")
    .getBufferAsync(Jimp.MIME_PNG);

  const { data } = await worker.recognize(imageBuffer);

  const bounds = data.blocks
    ?.filter(({ text }) => text.trim() === searchText)
    .map(({ bbox }) => ({
      x: bbox.x0,
      y: bbox.y0,
      width: bbox.x1 - bbox.x0,
      height: bbox.y1 - bbox.y0,
    }))
    .at(0);

  await worker.terminate();

  return bounds;
};

(async () => {
  await main();
})();
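
The filtering and mapping step above can be tried without running OCR at all. The following is a minimal sketch, assuming the items look like the entries of data.blocks used above (a text string plus a bbox with x0, y0, x1, y1). The helper names toRect, normalize and findTextRect are hypothetical, and the sample data is made up for illustration; it is not real Tesseract output.

```javascript
// Convert a Tesseract-style bbox ({ x0, y0, x1, y1 }) into { x, y, width, height }.
const toRect = ({ x0, y0, x1, y1 }) => ({
  x: x0,
  y: y0,
  width: x1 - x0,
  height: y1 - y0,
});

// Strip all whitespace so "X323 Y528\n" compares equal to "X323Y528".
const normalize = (text) => text.replace(/\s+/g, "");

// Hypothetical helper: return the rectangle of the first item whose
// normalized text equals the normalized search text, or undefined.
const findTextRect = (items, searchText) => {
  const match = (items ?? []).find(
    ({ text }) => normalize(text) === normalize(searchText)
  );
  return match ? toRect(match.bbox) : undefined;
};

// Stand-in data shaped like `data.blocks`; the values are illustrative only.
const sampleBlocks = [
  { text: "8\n", bbox: { x0: 10, y0: 20, x1: 18, y1: 34 } },
  { text: "X323 Y528\n", bbox: { x0: 719, y0: 173, x1: 835, y1: 189 } },
];

console.log(findTextRect(sampleBlocks, "X323Y528"));
// { x: 719, y: 173, width: 116, height: 16 }
```

Removing all whitespace before comparing is a design choice, not something the code above does (it only trims). It may make the match more tolerant if the recognizer inserts a space between the X and Y parts.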

Original response

You will need to crop the image down to the region that contains the text:

  • Position: (700, 160)
  • Dimensions: 150×40

The full image is too noisy for reliable recognition, even after converting it to greyscale.

Also, because the cropped region holds a single line of text, you can set tessedit_pageseg_mode to PSM.SINGLE_LINE.

import path from "path";
import Jimp from "jimp";
import { createWorker, PSM } from "tesseract.js";

const __dirname = path.resolve();

const main = async () => {
  const position = await getPosition(
    path.join(__dirname, "image.png"),
    700,
    160,
    150,
    40
  );

  console.log(position); // { x: 323, y: 528 }
};

const getPosition = async (imagePath, xOffset, yOffset, width, height) => {
  const worker = await createWorker({
    logger: (m) => {
      // console.log(m);
    },
  });

  await worker.loadLanguage("eng");
  await worker.initialize("eng");
  await worker.setParameters({
    tessedit_char_whitelist: "XY0123456789:",
    tessedit_pageseg_mode: PSM.SINGLE_LINE,
  });

  const image = await Jimp.read(imagePath);
  const convertedImage = image
    .grayscale()
    .contrast(0.3)
    .crop(
      xOffset ?? 0,
      yOffset ?? 0,
      width ?? image.bitmap.width,
      height ?? image.bitmap.height
    )
    .write("greyscale.jpg");
  const base64 = await convertedImage.getBase64Async(Jimp.AUTO);

  const {
    data: { text },
  } = await worker.recognize(base64);

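  // Allow optional whitespace between the two fields; fall back to
  // [-1, -1] when the pattern is not found.
  const [x, y] = text
    .match(/X:(\d+)\s*Y:(\d+)/)
    ?.slice(1)
    ?.map((v) => parseInt(v, 10)) || [-1, -1];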

  await worker.terminate();

  return { x, y };
};

(async () => {
  await main();
})();
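
The parsing step at the end of getPosition can be checked on its own with plain strings. This is a small sketch, assuming the recognized text looks like X:323 Y:528, possibly without the colons or the space. The name parseCoordinates is hypothetical, and returning null instead of -1 values is a design choice made here, not part of the code above.

```javascript
// Hypothetical helper: extract the numbers from text such as "X:323 Y:528\n".
// Colons and whitespace are optional, so "X323Y528" is accepted as well.
// Returns null when the pattern is not found.
const parseCoordinates = (text) => {
  const match = text.match(/X:?\s*(\d+)\s*Y:?\s*(\d+)/);
  if (!match) return null;
  return { x: parseInt(match[1], 10), y: parseInt(match[2], 10) };
};

console.log(parseCoordinates("X:323 Y:528\n")); // { x: 323, y: 528 }
console.log(parseCoordinates("X323Y528")); // { x: 323, y: 528 }
console.log(parseCoordinates("no coordinates here")); // null
```

Returning null makes a failed recognition easy to tell apart from a real reading, at the cost of the caller having to check for it.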
Mr. Polywhirl