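```python
import cv2
import pytesseract

img = cv2.imread(filename)  # `filename` is the path to the input image
# One "char left bottom right top page" entry per recognised character;
# splitlines() avoids the trailing empty string that split("\n") can leave.
boxes = pytesseract.image_to_boxes(img).splitlines()
```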
This gives me the bounding box of each character, like `r 134 855 148 871 0`, but it does not include the space character. I need the bounding box of each line, where a line is a group of characters whose bounding boxes intersect the same horizontal line.
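One way to get there from the per-character output is sketched below, under a few assumptions. Each box string is parsed, and characters are greedily grouped whenever their vertical extents `[bottom, top]` overlap, which matches the "intersects the same horizontal line" definition. Because spaces are missing, a space is re-inserted wherever the horizontal gap between neighbouring characters exceeds a fraction of the line height. `CharBox`, `parse_boxes`, `group_into_lines` and the `space_ratio=0.25` threshold are hypothetical choices, not part of pytesseract. In Tesseract's box format y is measured from the bottom of the image, so sorting by descending `top` visits lines from the top down.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_boxes(raw: str) -> list[CharBox]:
    """Parse image_to_boxes output: one 'char left bottom right top page' per line."""
    boxes = []
    for line in raw.splitlines():
        parts = line.split(" ")
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, *nums = parts
        boxes.append(CharBox(char, *map(int, nums)))
    return boxes


def group_into_lines(chars: list[CharBox], space_ratio: float = 0.25) -> list[tuple[str, str]]:
    """Greedily group characters whose vertical extents overlap into text lines.

    space_ratio is a hypothetical tuning knob: a gap wider than
    space_ratio * line height is treated as a space.
    """
    lines: list[list] = []  # each entry: [bottom, top, members]
    # Box y grows upwards, so descending `top` visits lines from the top down.
    for c in sorted(chars, key=lambda c: -c.top):
        for line in lines:
            if c.bottom <= line[1] and c.top >= line[0]:  # [bottom, top] intervals overlap
                line[0] = min(line[0], c.bottom)
                line[1] = max(line[1], c.top)
                line[2].append(c)
                break
        else:
            lines.append([c.bottom, c.top, [c]])

    result = []
    for bottom, top, members in lines:
        members.sort(key=lambda c: c.left)  # left-to-right reading order
        threshold = space_ratio * (top - bottom)
        text = members[0].char
        for prev, cur in zip(members, members[1:]):
            if cur.left - prev.right > threshold:
                text += " "  # image_to_boxes omits spaces, so infer them from gaps
            text += cur.char
        box = (members[0].left, bottom, max(c.right for c in members), top, members[0].page)
        result.append((text, " ".join(map(str, box))))
    return result
```

With pytesseract this would be called as `group_into_lines(parse_boxes(pytesseract.image_to_boxes(img)))`. The greedy overlap test can merge two adjacent lines when a tall glyph or a skewed scan bridges them, so it is a starting point rather than a robust line segmenter.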
So I need something like:

```python
boxes = image_to_line_boxes(img)
```

where `boxes` is a list such as `[("Hello, this is the first line!", "134 855 148 871 0"), ("This is the second line", "264 816 288 832 0")]`.
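Alternatively, Tesseract already groups words into lines. `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)` returns parallel lists including `level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height` and `text`, and word-level rows (`level == 5`) that share the same page, block, paragraph and line numbers belong to one Tesseract line. The sketch below (the helper name `line_boxes_from_data` is hypothetical) joins those words with spaces and takes the union of their boxes. Note that `image_to_data` uses a top-left origin and `(left, top, width, height)` boxes, so the result is not in the same format as `image_to_boxes`.

```python
from collections import defaultdict


def line_boxes_from_data(data: dict) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Group word rows from image_to_data(..., output_type=Output.DICT) into lines.

    Returns (text, (left, top, width, height)) per line, in image_to_data's
    top-left-origin pixel coordinates.
    """
    lines = defaultdict(list)  # (page, block, par, line) -> word row indices
    for i, text in enumerate(data["text"]):
        if data["level"][i] != 5 or not text.strip():
            continue  # keep only non-empty word-level rows
        key = (data["page_num"][i], data["block_num"][i],
               data["par_num"][i], data["line_num"][i])
        lines[key].append(i)

    result = []
    for key in sorted(lines):
        idx = sorted(lines[key], key=lambda i: data["word_num"][i])
        text = " ".join(data["text"][i] for i in idx)
        left = min(data["left"][i] for i in idx)
        top = min(data["top"][i] for i in idx)
        right = max(data["left"][i] + data["width"][i] for i in idx)
        bottom = max(data["top"][i] + data["height"][i] for i in idx)
        result.append((text, (left, top, right - left, bottom - top)))
    return result


# Intended use (requires Tesseract and pytesseract):
#   data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
#   boxes = line_boxes_from_data(data)
```

Relying on Tesseract's own layout analysis also restores the spaces, since `image_to_data` reports whole words rather than individual characters.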