I'm following this tutorial: https://cloud.ibm.com/docs/services/visual-recognition?topic=visual-recognition-tutorial-recognize-text&locale=en-US#pr-ximos-passos
My goal is read a document and made a table of content. The content is of type KEY - VALUE, like "VALUE 10.00". I can extract text of image but I can't extract the numbers.
- Contextualizing the problem:
I'm using this image
Values that must extracted:
DATA 13/06/2016
AGENCIA/CASH 0180/2009
VALOR DEPOSITO EM DINHEIRO 50.00
But when I using the follow curl call to Visual Recognition service:
curl -u "apikey:{API_KEY}" --form "images_file=@teste1.png" "https://gateway.watsonplatform.net/visual-recognition/api/v3/recognize_text?version=2018-03-19" -k
Result (a piece):
"text": "data gigolo hora\nman/em 251\nnumero envelope 689 574\nvalor depusitd eh 4\ncpf no defusnantez 614 220\ndata lananzmnz",
"words": [
{
"word": "data",
"location": {
"height": 18,
"width": 40,
"left": 13,
"top": 10
},
"score": 0.6098,
"line_number": 0
},
{
"word": "gigolo",
"location": {
"height": 43,
"width": 57,
"left": 146,
"top": 0
},
"score": 0.4283,
"line_number": 0
},
{
"word": "hora",
"location": {
"height": 18,
"width": 39,
"left": 249,
"top": 11
},
"score": 0.6533,
"line_number": 0
},
{
"word": "man/em",
"location": {
"height": 17,
"width": 72,
"left": 127,
"top": 35
},
"score": 0.8187,
"line_number": 1
},
{
"word": "251",
"location": {
"height": 21,
"width": 30,
"left": 294,
"top": 33
},
"score": 0.9881,
"line_number": 1
},
{
"word": "numero",
"location": {
"height": 21,
"width": 54,
"left": 12,
"top": 52
},
"score": 0.9116,
"line_number": 2
},
Note, that some words is good extracted, but the numbers not, my main goal is extract monetary values and dates.
To create my table I can use the "height"
property to know which is your respective numeric value.
So, how I extract the numbers?
PS.: This is a Portuguese(BR) document.