How to slice an image by table border

Question

I have many png files like this:

I want to slice the image into 48 (=6x8) small image files for the 48 cells separated by the table borders. That is, I would like to have files img11.png, ..., img68.png, where img11.png contains the (1,1) "1.4x4x8" cell, img12.png the (1,2) "M/T" cell, img13.png the "550,000" cell, ..., img68.png the bottom right "641,500" cell.

I want to do this because I expect it to improve Tesseract's results, which are currently unsatisfactory: many of my image files are of much poorer quality than the one shown above. Margins and sizes also vary, and some images contain non-English characters and embedded pictures.

Are there software packages that can detect the table borders and slice the image into m x n images? I am new to this area. I have read "How to find table like structure in image", but it is way beyond my ability. I am willing to learn, though.

Thanks for your help.

Sum the rows, then sum the columns separately. The black lines should sum to zero, so detect the zero indices in the row and column sums. You can then split your image at those indices [x0, x1, ...], [y0, y1, ...]. - Bilal, Jun 19 '21 at 20:06
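
The idea in this comment can be illustrated on a toy matrix. This is a minimal sketch in base R, assuming a grayscale encoding where 0 is black and 255 is white; the matrix `img` and the names `y_lines` and `x_lines` are made up for illustration.

```r
# Toy 7x9 grayscale "image": 255 = white, 0 = black (assumed encoding).
img <- matrix(255, nrow = 7, ncol = 9)
img[c(1, 4, 7), ] <- 0   # three horizontal border lines
img[, c(1, 5, 9)] <- 0   # three vertical border lines

# A row or column made up entirely of black pixels sums to zero.
y_lines <- which(rowSums(img) == 0)
x_lines <- which(colSums(img) == 0)

print(y_lines)
print(x_lines)
```

On real scans the border pixels are rarely pure black, so the exact-zero test would probably need to be relaxed to a threshold.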

chan1142 · Accepted Answer · 2021-06-20T23:55:42.540

I'm using R. Bilal's suggestion (thanks) led me to the following.

Step 1: Convert the image to grayscale.

library(magick)
x <- image_read('https://i.stack.imgur.com/plBvs.png')
y <- image_convert(x, colorspace='Gray')
a <- as.integer(y[[1]])[,,1]

Step 2: Convert "dark" to 1 and "light" to 0.

w <- ifelse(a > 190, 0, 1)        # gray levels above 190 count as light; adjust 190 per image

Step 3: Detect the horizontal and vertical lines.

ypos <- which(rowMeans(w) > .95)  # rows that are more than 95% dark; adjust .95
xpos <- which(colMeans(w) > .95)  # columns that are more than 95% dark; adjust .95
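
A border that is several pixels thick shows up in ypos and xpos as a run of consecutive indices. As a minimal sketch, assuming that is the case, a hypothetical helper `line_runs` (plain base R, not part of magick) could collapse each run into one start/end pair per line:

```r
# Hypothetical helper: collapse consecutive indices into one (start, end) pair per line.
line_runs <- function(pos) {
  if (length(pos) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  # A new run starts wherever the gap to the previous index exceeds 1.
  run_id <- cumsum(c(1, diff(pos) > 1))
  data.frame(
    start = as.integer(tapply(pos, run_id, min)),
    end   = as.integer(tapply(pos, run_id, max))
  )
}

# Made-up positions: a 3-pixel line, a 2-pixel line, and a 1-pixel line.
runs <- line_runs(c(3, 4, 5, 40, 41, 90))
print(runs)
```

Step 4 below does not need this, because it simply skips gaps narrower than 16 pixels, but the run boundaries could be useful for trimming the border pixels out of each crop.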

Step 4: Crop the original image (x).

xpos <- c(0,xpos, ncol(a))
ypos <- c(0,ypos, nrow(a))

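outdir <- "cropped"
dir.create(outdir, showWarnings = FALSE)  # no warning if the directory already exists
m <- 0                                    # output row counter
for (i in 1:(length(ypos)-1)) {
  dy <- ypos[i+1]-ypos[i]                 # height of the band between two detected rows
  n <- 0                                  # output column counter, reset for each band
  if (dy < 16) next  # skip if too short (adjacent pixels of the same line)
  m <- m+1
  for (j in 1:(length(xpos)-1)) {
    dx <- xpos[j+1]-xpos[j]               # width of the band between two detected columns
    if (dx < 16) next  # skip if too narrow
    n <- n+1
    # ImageMagick geometry string: WIDTHxHEIGHT+XOFFSET+YOFFSET
    geom <- sprintf("%dx%d+%d+%d", dx, dy, xpos[j], ypos[i])
    # cat(sprintf('%2d %2d: %s\n', m, n, geom))
    cropped <- image_crop(x, geom)
    outfile <- file.path(outdir, sprintf('%02d_%02d.png', m, n))
    image_write(cropped, outfile, format="png")
  }
}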
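
The crop geometry in the loop above can be checked without touching any image files. As a minimal sketch, assuming the same kind of xpos/ypos vectors and the same 16-pixel cutoff, a hypothetical helper `cell_geometries` (base R only) builds the geometry strings and the row/column numbers in one table; the sample positions are made up.

```r
# Hypothetical helper mirroring the loop above: one row per kept cell.
cell_geometries <- function(xpos, ypos, min_size = 16) {
  dx <- diff(xpos)
  dy <- diff(ypos)
  keep_x <- which(dx >= min_size)   # gaps wide enough to be a cell
  keep_y <- which(dy >= min_size)   # gaps tall enough to be a cell
  # expand.grid varies its first argument fastest, so columns cycle within each row.
  grid <- expand.grid(j = seq_along(keep_x), i = seq_along(keep_y))
  xi <- keep_x[grid$j]
  yi <- keep_y[grid$i]
  data.frame(
    row  = grid$i,
    col  = grid$j,
    geom = sprintf("%dx%d+%d+%d",
                   as.integer(dx[xi]), as.integer(dy[yi]),
                   as.integer(xpos[xi]), as.integer(ypos[yi])),
    stringsAsFactors = FALSE
  )
}

# Made-up positions: the 2-pixel and 1-pixel gaps are line thickness and get dropped.
cells <- cell_geometries(xpos = c(0, 50, 52, 120), ypos = c(0, 30, 31, 80))
print(cells)
```

Keeping the geometry separate from the file I/O makes it easier to inspect the detected grid, for example to confirm that it has the expected 6 rows and 8 columns, before writing any files.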

The cropped (1,1) image is written to cropped/01_01.png.
