
When I scan for text using the Vision API, the detector returns multiple text blocks as an unsorted list. So when I loop over them to read the text, I sometimes get it in the wrong order, i.e., text from the bottom of the page appears first.

Here is receiveDetections from OcrDetectorProcessor.java:

@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
    mGraphicOverlay.clear();
    SparseArray<TextBlock> items = detections.getDetectedItems();
    for (int i = 0; i < items.size(); ++i) {
        TextBlock item = items.valueAt(i);
        OcrGraphic graphic = new OcrGraphic(mGraphicOverlay, item);
        mGraphicOverlay.add(graphic);
    }
}

In this code, I want to sort the items added to mGraphicOverlay by each TextBlock's position on the page.
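A minimal plain-Java sketch of that sorting step, runnable off-device. It assumes a hypothetical Block class in place of TextBlock (just the text and the top/left of its bounding box) and uses a TreeMap as a stand-in for SparseArray, since both iterate values in ascending key order. As far as I can tell the detector keys items by ID rather than by position, so the values are copied into a list and sorted by top, then left, before they would be added to the overlay.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for TextBlock: only the fields needed for ordering.
class Block {
    final String text;
    final int left;
    final int top;

    Block(String text, int left, int top) {
        this.text = text;
        this.left = left;
        this.top = top;
    }
}

class OverlayOrder {
    // Copies the values out of a key-ordered map (SparseArray stand-in)
    // and returns them sorted top-to-bottom, then left-to-right.
    static List<Block> inReadingOrder(Map<Integer, Block> detections) {
        List<Block> blocks = new ArrayList<>(detections.values());
        blocks.sort(Comparator.comparingInt((Block b) -> b.top)
                .thenComparingInt(b -> b.left));
        return blocks;
    }

    public static void main(String[] args) {
        Map<Integer, Block> detections = new TreeMap<>();
        detections.put(1, new Block("footer", 10, 900)); // lowest key, bottom of page
        detections.put(2, new Block("title", 10, 20));
        detections.put(3, new Block("body", 10, 300));
        // Prints title, body, footer (one per line)
        for (Block b : inReadingOrder(detections)) {
            System.out.println(b.text);
        }
    }
}
```

On Android the same idea would fill the list from items.valueAt(i) and then call mGraphicOverlay.add(...) in the sorted order.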

Any solution or suggestion would be very helpful.

Gunaseelan
  • Is Rajesh's answer helping you? Have you found a solution? If not, let us know. – Arnauld Alex Apr 03 '18 at 14:31
  • @ArnauldAlex I didn't test Rajesh's answer. I created my own comparator to sort text blocks instead of text lines, and I have posted it as an answer for your reference. – Gunaseelan Apr 04 '18 at 04:35
  • Sorting text blocks is not enough; for better accuracy you need to break them down into lines. – rajesh Apr 04 '18 at 11:26
  • @rajesh I have updated my answer. – Gunaseelan Apr 06 '18 at 05:31
  • @Gunaseelan Just out of curiosity, regarding the camera source's setRequestedPreviewSize: what size are you using, and how did you choose it? I've tried many resolutions and none of them works well. – Arnauld Alex Apr 18 '18 at 07:57
  • @ArnauldAlex I have written a blog post with a sample project; please take a look at https://v4all123.blogspot.in/2018/03/simple-example-of-ocrreader-in-android.html – Gunaseelan Apr 19 '18 at 11:49

3 Answers


You need to sort the output as shown in the OCR sample code. I break each text block into lines before sorting.

Here is my code:

List<Text> textLines = new ArrayList<>();

// Flatten every TextBlock into its component lines
for (int i = 0; i < origTextBlocks.size(); i++) {
    TextBlock textBlock = origTextBlocks.valueAt(i);

    List<? extends Text> textComponents = textBlock.getComponents();
    textLines.addAll(textComponents);
}

// Sort lines top to bottom; when two tops are equal, sort by left edge
Collections.sort(textLines, new Comparator<Text>() {
    @Override
    public int compare(Text t1, Text t2) {
        int diffOfTops = t1.getBoundingBox().top - t2.getBoundingBox().top;
        if (diffOfTops != 0) {
            return diffOfTops;
        }
        return t1.getBoundingBox().left - t2.getBoundingBox().left;
    }
});

// Join the sorted lines, one per row
StringBuilder textBuilder = new StringBuilder();
for (Text text : textLines) {
    if (text != null && text.getValue() != null) {
        textBuilder.append(text.getValue()).append("\n");
    }
}

String ocrString = textBuilder.toString();
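One behaviour of this comparator worth checking: left is only consulted when two tops are exactly equal. The plain-Java sketch below re-creates the same rule over a hypothetical Line class (text plus bounding-box left/top, standing in for Text) and feeds it two lines on the same visual row whose tops differ by one pixel, which is common with OCR boxes. The right-hand line sorts first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Text line: text plus bounding-box left/top.
class Line {
    final String text;
    final int left;
    final int top;

    Line(String text, int left, int top) {
        this.text = text;
        this.left = left;
        this.top = top;
    }
}

class ExactTopOrder {
    // Same rule as the comparator above: top first, left only on an exact tie.
    static List<String> sortLines(List<Line> lines) {
        List<Line> sorted = new ArrayList<>(lines);
        sorted.sort((a, b) -> a.top != b.top
                ? Integer.compare(a.top, b.top)
                : Integer.compare(a.left, b.left));
        List<String> texts = new ArrayList<>();
        for (Line l : sorted) {
            texts.add(l.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Line> row = new ArrayList<>();
        row.add(new Line("Name:", 10, 101)); // left cell, one pixel lower
        row.add(new Line("John", 200, 100)); // right cell on the same row
        System.out.println(sortLines(row));  // [John, Name:]
    }
}
```

Treating lines whose vertical centres fall within a tolerance as the same row, as the SAME_LINE_DELTA check in the third answer does, is one way around this.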

rajesh

I created a TextBlock comparator like this:

public static Comparator<TextBlock> TextBlockComparator
        = new Comparator<TextBlock>() {
    @Override
    public int compare(TextBlock textBlock1, TextBlock textBlock2) {
        // Order blocks top to bottom by the top edge of their bounding boxes
        return textBlock1.getBoundingBox().top - textBlock2.getBoundingBox().top;
    }
};

Then I sorted the blocks with Arrays.sort(myTextBlocks, Utils.TextBlockComparator); where myTextBlocks is a TextBlock[] array.
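Because this comparator looks only at top, blocks with equal tops keep whatever order they arrived in: Arrays.sort on an object array is documented as stable, so ties stay in input (detection) order rather than left-to-right order. A plain-Java sketch, assuming a hypothetical Box class in place of TextBlock; the left-edge tie-breaker is an optional addition, not part of the answer.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for TextBlock: text plus bounding-box left/top.
class Box {
    final String text;
    final int left;
    final int top;

    Box(String text, int left, int top) {
        this.text = text;
        this.left = left;
        this.top = top;
    }
}

class TopOnlySort {
    // Mirrors the comparator above: compares tops only.
    static final Comparator<Box> BY_TOP = Comparator.comparingInt(b -> b.top);

    // Optional tie-breaker on left, so equal tops read left to right.
    static final Comparator<Box> BY_TOP_THEN_LEFT = BY_TOP.thenComparingInt(b -> b.left);

    static String order(Box[] boxes, Comparator<Box> cmp) {
        Box[] copy = boxes.clone();
        Arrays.sort(copy, cmp); // stable: equal elements keep their input order
        StringBuilder sb = new StringBuilder();
        for (Box b : copy) {
            sb.append(b.text).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Box[] detected = {
            new Box("right", 300, 50),
            new Box("left", 10, 50),
            new Box("header", 10, 0)
        };
        System.out.println(order(detected, BY_TOP));           // header right left
        System.out.println(order(detected, BY_TOP_THEN_LEFT)); // header left right
    }
}
```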

Update

Today I had time to test @rajesh's answer. For my image, sorting text blocks seems more accurate than sorting text lines.

I tried to extract text from the following image: [image not available]

Result using TextBlockComparator: [image not available]

Result using TextLineComparator: [image not available]
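The result screenshots are not reproduced here, so the sketch below uses a small constructed example (not the author's image) to show why the two strategies can disagree. It assumes a hypothetical two-column layout where each column is one block of two lines, with hypothetical Ln and Blk classes standing in for Text and TextBlock. Block sorting keeps each column together, while sorting the flattened lines interleaves the columns row by row; which output is "more accurate" depends on whether the page is meant to be read in columns or in rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BlockVsLine {
    // Hypothetical stand-in for a Text line: text plus bounding-box left/top.
    static class Ln {
        final String text;
        final int left;
        final int top;

        Ln(String text, int left, int top) {
            this.text = text;
            this.left = left;
            this.top = top;
        }
    }

    // Hypothetical stand-in for a TextBlock: its lines, with a bounding box
    // whose top/left are the smallest top/left of those lines.
    static class Blk {
        final List<Ln> lines;

        Blk(List<Ln> lines) {
            this.lines = lines;
        }

        int top() {
            int t = Integer.MAX_VALUE;
            for (Ln l : lines) t = Math.min(t, l.top);
            return t;
        }

        int left() {
            int x = Integer.MAX_VALUE;
            for (Ln l : lines) x = Math.min(x, l.left);
            return x;
        }
    }

    // Block sorting: order blocks by top (then left), keep each block's lines together.
    static List<String> byBlocks(List<Blk> blocks) {
        List<Blk> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator.comparingInt(Blk::top).thenComparingInt(Blk::left));
        List<String> out = new ArrayList<>();
        for (Blk b : sorted) {
            for (Ln l : b.lines) out.add(l.text);
        }
        return out;
    }

    // Line sorting: flatten all blocks into lines, then sort by top (then left).
    static List<String> byLines(List<Blk> blocks) {
        List<Ln> all = new ArrayList<>();
        for (Blk b : blocks) all.addAll(b.lines);
        all.sort(Comparator.comparingInt((Ln l) -> l.top).thenComparingInt(l -> l.left));
        List<String> out = new ArrayList<>();
        for (Ln l : all) out.add(l.text);
        return out;
    }

    public static void main(String[] args) {
        // Two columns; each column is one block with two lines.
        List<Blk> page = new ArrayList<>();
        page.add(new Blk(Arrays.asList(new Ln("B1", 300, 2), new Ln("B2", 300, 42))));
        page.add(new Blk(Arrays.asList(new Ln("A1", 10, 0), new Ln("A2", 10, 40))));
        System.out.println(byBlocks(page)); // [A1, A2, B1, B2]: column by column
        System.out.println(byLines(page));  // [A1, B1, A2, B2]: row by row
    }
}
```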

Bhargav Rao
Gunaseelan

Well, if you have time, test my code. I wrote it carefully and have tested it many times. It is designed to take a SparseArray (like the one the API gives you) and return the same blocks, sorted. I hope it helps.

import android.graphics.RectF;
import android.util.SparseArray;
import com.google.android.gms.vision.text.TextBlock;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Max vertical distance (pixels) between two block centres for them to count as
// the same line. The answer does not give a value: 10f is only a placeholder to tune.
private static final float SAME_LINE_DELTA = 10f;

/**
 * Takes all the TextBlocks in the frame and sorts them into the order they
 * appear on the page (rather than the order the detector returns them).
 * Returns a SparseArray with the same TextBlocks, sorted and re-keyed 0..n-1.
 * Note: java.util.stream requires Android API level 24 or higher, and the
 * tolerance-based same-line test makes the comparator not strictly transitive.
 */
private SparseArray<TextBlock> sortTB(SparseArray<TextBlock> items) {
    if (items == null) {
        return null;
    }

    int size = items.size();
    if (size == 0) {
        return null;
    }

    // SparseArray to store the result: the same blocks as the input, but sorted
    SparseArray<TextBlock> sortedSparseArray = new SparseArray<>(size);

    // Copy from SparseArray to List so the blocks can be sorted with a lambda
    List<TextBlock> listTest = new ArrayList<>();
    for (int i = 0; i < size; i++) {
        listTest.add(items.valueAt(i));
    }

    // Sort via a stream and a lambda, comparing the first line of each block
    // (assumes every block has at least one line)
    listTest = listTest.stream().sorted((textBlock1, textBlock2) -> {
        RectF rect1 = new RectF(textBlock1.getComponents().get(0).getBoundingBox());
        RectF rect2 = new RectF(textBlock2.getComponents().get(0).getBoundingBox());

        // Are the two blocks on the same line (centres within SAME_LINE_DELTA)?
        if (rect2.centerY() < rect1.centerY() + SAME_LINE_DELTA
                && rect2.centerY() > rect1.centerY() - SAME_LINE_DELTA) {
            // Same line: sort by X (left edge)
            return Float.compare(rect1.left, rect2.left);
        }
        // Otherwise sort by Y (vertical centre)
        return Float.compare(rect1.centerY(), rect2.centerY());
    }).collect(Collectors.toList());

    // Store the sorted blocks in the result SparseArray
    for (int i = 0; i < listTest.size(); i++) {
        sortedSparseArray.append(i, listTest.get(i));
    }

    return sortedSparseArray;
}
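A tolerance-based comparator like this one is not transitive: A can be "on the same line" as B, and B as C, while A and C are not, which can produce a cycle. Java's object sorts are TimSort-based and may throw IllegalArgumentException ("Comparison method violates its general contract!") when they detect such a comparator. One alternative, swapped in here and not part of this answer, is to group boxes into rows first and then sort each row by its left edge. The sketch is plain Java with a hypothetical Item class (text, left, vertical centre) and a hypothetical sameLineDelta parameter playing the role of SAME_LINE_DELTA.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RowGrouping {
    // Hypothetical stand-in for a TextBlock: text, left edge and vertical centre
    // (the values RectF.left and RectF.centerY() provide in the answer's code).
    static class Item {
        final String text;
        final float left;
        final float centerY;

        Item(String text, float left, float centerY) {
            this.text = text;
            this.left = left;
            this.centerY = centerY;
        }
    }

    // Groups items into rows, then orders each row left to right.
    // Each sort compares a single numeric key, so both comparators are transitive.
    static List<Item> readingOrder(List<Item> items, float sameLineDelta) {
        List<Item> byY = new ArrayList<>(items);
        byY.sort(Comparator.comparingDouble((Item i) -> i.centerY));

        List<Item> result = new ArrayList<>();
        List<Item> row = new ArrayList<>();
        float rowAnchor = 0f;
        for (Item item : byY) {
            // Start a new row once an item is more than sameLineDelta below the row's first item.
            if (!row.isEmpty() && item.centerY - rowAnchor > sameLineDelta) {
                flushRow(row, result);
            }
            if (row.isEmpty()) {
                rowAnchor = item.centerY;
            }
            row.add(item);
        }
        flushRow(row, result);
        return result;
    }

    // Sorts one row by its left edge, appends it to the result and empties it.
    private static void flushRow(List<Item> row, List<Item> result) {
        row.sort(Comparator.comparingDouble((Item i) -> i.left));
        result.addAll(row);
        row.clear();
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("John", 200f, 100f));
        items.add(new Item("Name:", 10f, 101f));
        items.add(new Item("Age:", 10f, 140f));
        items.add(new Item("42", 200f, 139f));
        StringBuilder sb = new StringBuilder();
        for (Item i : readingOrder(items, 10f)) {
            sb.append(i.text).append(' ');
        }
        System.out.println(sb.toString().trim()); // Name: John Age: 42
    }
}
```

Anchoring each row on its first item stops a slowly drifting sequence of boxes from chaining into the next row; on skewed pages a larger delta may still be needed.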
Arnauld Alex