I am trying to convert the full-text annotation of a Google Vision OCR result, which is organized in a Block, Paragraph, Word and Symbol hierarchy, into line-level and word-level text.
However, when converting symbols to word text and words to line text, I need to understand the DetectedBreak property.
I went through this documentation, but I did not understand a few of the break types.
Can somebody explain what the following breaks mean? I only understood LINE_BREAK and SPACE.
- EOL_SURE_SPACE
- HYPHEN
- LINE_BREAK
- SPACE
- SURE_SPACE
- UNKNOWN
Can they all be replaced by either a newline character or a space?
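
For context, here is a minimal sketch of the conversion I have in mind. It does not call the Vision client at all: each symbol is a plain `(text, break_type)` tuple, where `break_type` is the break name as a string, or `None` when the symbol has no detected break. The function name `symbols_to_lines` and the grouping of break types into "space-like" and "line-ending" are my own assumptions, not something taken from the documentation. In particular, treating UNKNOWN as a space and HYPHEN as a line end is a guess.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: these breaks only separate words within a line.
SPACE_BREAKS = {"SPACE", "SURE_SPACE", "UNKNOWN"}
# ASSUMPTION: these breaks end the current line.
LINE_BREAKS = {"EOL_SURE_SPACE", "LINE_BREAK", "HYPHEN"}


def symbols_to_lines(symbols: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Join (text, break_type) symbols into a list of line strings."""
    lines: List[str] = []
    current = ""
    for text, break_type in symbols:
        current += text
        if break_type in SPACE_BREAKS:
            # Word boundary inside the same line.
            current += " "
        elif break_type in LINE_BREAKS:
            # End of line: flush what has been collected so far.
            lines.append(current.rstrip())
            current = ""
    if current:
        # Flush trailing symbols that were not followed by a line break.
        lines.append(current.rstrip())
    return lines


# Hypothetical input, hand-written rather than taken from a real response.
example = [
    ("H", None), ("i", "SPACE"),
    ("a", None), ("l", None), ("l", "EOL_SURE_SPACE"),
    ("o", None), ("k", "LINE_BREAK"),
]
result = symbols_to_lines(example)
```

With a real response, the tuples would have to be built from each symbol's text and its detected break. If the two groupings above are wrong for any break type, that is exactly the part I would like to have corrected.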