Questions tagged [layout-extraction]
3 questions
4 votes · 2 answers
Is OCR no longer an issue?
According to Wikipedia, "The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem on applications where clear imaging is available such as scanning of printed documents." However, it gives no citation.
My…

David Johnstone
0 votes · 2 answers
Extracting HTML elements in a given region?
Given a URL and a region defined by a rectangle, is there any way to determine which elements of the page at that URL lie within the rectangle?
EDIT: Screen resolution, font size, etc. can all be set to reasonable defaults.

Paul Wicks
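For the region question above, once the page has been rendered and each element's bounding box is known, the remaining step is plain rectangle geometry. How the boxes are collected is an assumption here (for example, reading the DOM's `getBoundingClientRect()` for each element in a headless browser). The following is a minimal Python sketch; the `Rect` class, the `elements_in_region` function and the sample boxes are hypothetical, and the code covers only the geometry, not fetching or rendering the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (y grows downward)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.left <= other.left and other.right <= self.right
                and self.top <= other.top and other.bottom <= self.bottom)

    def intersects(self, other: "Rect") -> bool:
        """True if the two rectangles overlap with positive area."""
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def elements_in_region(boxes: dict, region: Rect, partial: bool = False) -> list:
    """Return the names of elements whose boxes fall inside `region`.

    With partial=True, any element that merely overlaps the region is returned.
    `boxes` maps a (hypothetical) element name to its bounding Rect.
    """
    test = region.intersects if partial else region.contains
    return [name for name, box in boxes.items() if test(box)]


if __name__ == "__main__":
    # Hypothetical boxes, standing in for what a rendering browser would report.
    boxes = {
        "header": Rect(0, 0, 1024, 80),
        "nav": Rect(0, 80, 200, 600),
        "main": Rect(200, 80, 824, 600),
        "logo": Rect(10, 10, 60, 60),
    }
    region = Rect(0, 0, 300, 100)
    print(elements_in_region(boxes, region))
    print(elements_in_region(boxes, region, partial=True))
```

Whether "within" should mean fully contained or merely overlapping is a design choice the question leaves open, so both are offered through the `partial` flag.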
0 votes · 4 answers
Optical character recognition of PDFs of parliamentary debates
For a contract job, I need to digitize a large number of old, image-only (scanned) PDFs of plenary debate protocols from the Federal Parliament of Germany.
The problem is that most of these files have a two-column format:
Sample Protocol…

Cetin Sert
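For the two-column protocols, one common approach is to split each page image at the column gutter before running OCR, so that lines from the two columns are not read across as one. The sketch below uses a vertical projection profile (the amount of ink in each pixel column) on an already binarized page. The function names, the `margin` parameter and the list-of-lists image format are assumptions for illustration; real scans would typically need deskewing and noise removal first.

```python
def find_gutter(page, margin=0.25):
    """Return the x index at which to cut a two-column page.

    Assumed input format: page[y][x] == 1 for ink, 0 for background.
    Only columns between margin*width and (1 - margin)*width are searched,
    so the (usually blank) page edges are not mistaken for the gutter.
    """
    height, width = len(page), len(page[0])
    # Vertical projection profile: ink count per pixel column.
    profile = [sum(page[y][x] for y in range(height)) for x in range(width)]
    lo, hi = int(width * margin), int(width * (1 - margin))
    lowest = min(profile[x] for x in range(lo, hi))

    # Find the longest run of columns at the minimum and return its centre.
    best_start, best_len = lo, 0
    run_start = None
    for x in range(lo, hi + 1):
        if x < hi and profile[x] == lowest:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            if x - run_start > best_len:
                best_start, best_len = run_start, x - run_start
            run_start = None
    return best_start + best_len // 2


def split_columns(page, margin=0.25):
    """Cut the page at the detected gutter into (left, right) images."""
    gutter = find_gutter(page, margin)
    left = [row[:gutter] for row in page]
    right = [row[gutter:] for row in page]
    return left, right
```

Cutting at the centre of the longest blank run, rather than at the first blank column, keeps the cut away from the text edges when the gutter is several pixels wide. Pages with full-width headings or tables would need the profile computed over the two-column body only.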