Questions tagged [layout-extraction]

3 questions
4 votes, 2 answers

Is OCR no longer an issue?

According to Wikipedia, "The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem on applications where clear imaging is available such as scanning of printed documents." However, it gives no citation. My…
David Johnstone
0 votes, 2 answers

Extracting HTML elements in a given region?

Given a region defined by a rectangle and a URL, is there any way to determine which elements lie within that rectangle on the page at that URL? EDIT: Screen resolution, font size, etc. can all be set to reasonable defaults.
Paul Wicks
0 votes, 4 answers

Optical character recognition of PDFs of parliamentary debates

For contract work, I need to digitize a large number of old, image-only scanned PDFs of plenary debate protocols from the Federal Parliament of Germany. The problem is that most of these files have a two-column format: Sample Protocol…
Cetin Sert