I implemented Google's Mobile Vision for Android by following a tutorial, and I am trying to build an app that scans a receipt and finds the numeric total. However, receipts are printed in different formats, and the API groups text into TextBlocks in what seems to be an arbitrary way. For example, on one receipt, several words separated by single spaces are grouped into a single TextBlock, but two words separated by a wide gap come back as independent TextBlocks, even though they appear on the same "line". What I want is to force the API to recognize each entire line of the receipt as a single entity. Is this possible?
Did you find a solution for this yet? If so, were you able to detect on existing images as opposed to using a camera app real-time? – DaveNOTDavid Mar 05 '18 at 16:19
Did you find a solution yet? – Biswas Khayargoli Mar 14 '22 at 13:39
1 Answer
    public ArrayList<T> getAllGraphicsInRow(float rawY) {
        synchronized (mLock) {
            ArrayList<T> row = new ArrayList<>();
            // Get the position of this View so the raw location can be offset relative to the view.
            int[] location = new int[2];
            this.getLocationOnScreen(location);
            float localY = rawY - location[1];
            int width = this.getWidth();
            for (T graphic : mGraphics) {
                // Probe every 10 px across the view. x is already view-relative,
                // so only the raw y coordinate needs the screen offset removed.
                for (int x = 0; x < width; x += 10) {
                    if (graphic.contains(x, localY)) {
                        row.add(graphic);
                        break; // one hit is enough for this graphic
                    }
                }
            }
            return row;
        }
    }
This method goes in GraphicOverlay.java. It returns every graphic that intersects the horizontal row at the given y coordinate.
    // Requires android.graphics.Point.
    public static boolean almostEqual(double a, double b, double eps) {
        return Math.abs(a - b) < eps;
    }

    // Two points count as being on the same row when their y values differ by less than 10.
    public static boolean pointAlmostEqual(Point a, Point b) {
        return almostEqual(a.y, b.y, 10);
    }

    // True when every pair of corresponding corners lies on the same row.
    public static boolean cornerPointAlmostEqual(Point[] rect1, Point[] rect2) {
        for (int i = 0; i < rect1.length; i++) {
            if (!pointAlmostEqual(rect1[i], rect2[i])) {
                return false;
            }
        }
        return true;
    }
    private boolean onTap(float rawX, float rawY) {
        // A price is digits, then a comma or dot, then exactly two digits (e.g. 12.50 or 12,50).
        final Pattern pricePattern = Pattern.compile("(\\d+[,.]\\d\\d)");
        ArrayList<OcrGraphic> graphics = mGraphicOverlay.getAllGraphicsInRow(rawY);
        OcrGraphic currentGraphics = mGraphicOverlay.getGraphicAtLocation(rawX, rawY);
        if (graphics == null || currentGraphics == null) {
            return false;
        }
        List<? extends Text> currentComponents = currentGraphics.getTextBlock().getComponents();
        Log.i("text results", "This many in the row: " + graphics.size());

        // Collect the lines of every other block that shares the tapped row.
        ArrayList<Text> combinedComponents = new ArrayList<>();
        for (OcrGraphic graphic : graphics) {
            if (!graphic.equals(currentGraphics)) {
                TextBlock text = graphic.getTextBlock();
                Log.i("text results", text.getValue());
                combinedComponents.addAll(text.getComponents());
            }
        }

        for (Text currentText : currentComponents) { // each line of the tapped block
            Point[] currentPoint = currentText.getCornerPoints();
            for (Text otherCurrentText : combinedComponents) { // each line of the other blocks in the row
                Point[] innerCurrentPoint = otherCurrentText.getCornerPoints();
                if (cornerPointAlmostEqual(currentPoint, innerCurrentPoint)) {
                    // Create fresh matchers for each pair: calling find() again on a
                    // reused Matcher resumes after the previous match and can miss the price.
                    final Matcher matcher = pricePattern.matcher(currentText.getValue());
                    final Matcher otherMatcher = pricePattern.matcher(otherCurrentText.getValue());
                    if (matcher.find()) { // the tapped line is the price
                        Log.i("oh yes", "Item: " + otherCurrentText.getValue());
                        Log.i("oh yes", "Value: " + matcher.group(1));
                        itemList.add(otherCurrentText.getValue());
                        // Float.valueOf rejects a decimal comma, so normalize it to a dot first.
                        priceList.add(Float.valueOf(matcher.group(1).replace(',', '.')));
                    }
                    if (otherMatcher.find()) { // the tapped line is the item
                        Log.i("oh yes", "Item: " + currentText.getValue());
                        Log.i("oh yes", "Value: " + otherMatcher.group(1));
                        itemList.add(currentText.getValue());
                        priceList.add(Float.valueOf(otherMatcher.group(1).replace(',', '.')));
                    }
                    Toast.makeText(this, " Text Captured!", Toast.LENGTH_SHORT).show();
                }
            }
        }
        return true;
    }
This method goes in OcrCaptureActivity.java. It breaks the tapped TextBlock into lines, finds the lines of the other blocks in the same row, and, whenever one line of a matching pair contains a price, records the pair as an item and its price.
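The price-matching step can be tried on its own, outside Android. The following is only a sketch: the class name PriceParser and the method firstPrice are made up for illustration, and it assumes the same pattern as priceRegex above. It also normalizes a decimal comma to a dot before parsing, because Float.valueOf does not accept "12,50".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mobile Vision or the sample app.
final class PriceParser {
    // Same pattern as priceRegex in onTap: digits, a comma or dot, then two digits.
    private static final Pattern PRICE = Pattern.compile("(\\d+[,.]\\d\\d)");

    // Returns the first price found in the text, or empty if there is none.
    static Optional<Float> firstPrice(String text) {
        Matcher m = PRICE.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // Float.valueOf only understands a dot as the decimal separator.
        return Optional.of(Float.valueOf(m.group(1).replace(',', '.')));
    }
}
```

Note that this simple pattern does not handle thousands separators, so an amount such as "1.234,56" would need a stricter expression.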
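The eps value in almostEqual is the vertical tolerance: two components are treated as being on the same row when their corner points' y values differ by less than eps. pointAlmostEqual passes 10, so raise or lower that number to make the row taller or shorter.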
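If you want whole receipt lines rather than a tap-driven lookup, the same tolerance idea can be applied to everything that was detected. Here is a minimal sketch in plain Java, assuming you have already copied each element's text and top-left corner out of the detector results. RowGrouper, Word and groupIntoLines are hypothetical names, not part of the API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: merges detected words into lines by their y coordinate.
final class RowGrouper {
    // Stand-in for a detected text element: its value and top-left corner.
    static final class Word {
        final String value;
        final int left;
        final int top;

        Word(String value, int left, int top) {
            this.value = value;
            this.left = left;
            this.top = top;
        }
    }

    // Words whose top differs from the first word of a row by less than eps join that row.
    static List<String> groupIntoLines(List<Word> words, int eps) {
        List<Word> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingInt((Word w) -> w.top));

        List<List<Word>> rows = new ArrayList<>();
        for (Word w : sorted) {
            List<Word> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(w.top - last.get(0).top) < eps) {
                last.add(w);
            } else {
                List<Word> row = new ArrayList<>();
                row.add(w);
                rows.add(row);
            }
        }

        // Within each row, read left to right and join with single spaces.
        List<String> lines = new ArrayList<>();
        for (List<Word> row : rows) {
            row.sort(Comparator.comparingInt((Word w) -> w.left));
            StringBuilder sb = new StringBuilder();
            for (Word w : row) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(w.value);
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

With eps set to 10, a "Coffee" at top 100 and a "4.50" at top 102 end up in the same line, while a "TOTAL" at top 200 starts a new one. Comparing against the first word of each row keeps a slightly tilted receipt from chaining unrelated rows together, but a strongly skewed photo would still need deskewing first.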

bhuang
I presume this only works while using a camera app real-time as opposed to existing images since you'll need to use Text Recognition API's classes, CameraSourcePreview and GraphicOverlay, correct? – DaveNOTDavid Mar 05 '18 at 16:18