I have some code that uses the yob/pdf-reader gem to extract the text from a PDF file and search it for certain strings. Recently I ran into the problem that vertical text can interfere with my logic for identifying text that belongs to the same line.

My question is: does anyone know a way to distinguish text orientation with pdf-reader?

Jonas
  • I recently used pdf-reader, but all I did was use regexes to play with the text. Can you share a bit of code here, or a GitHub link? – Arup Rakshit Apr 02 '14 at 09:22
  • What is your goal? I use pdf-reader in a similar way to parse e.g. receiver information from letters. For this purpose I wrote custom classes for some internal things like PageTextReceiver which expose the parsed text runs. As I am only interested in text in certain areas I wrote analyzing logic which filters and examines exactly that. – Jonas Apr 06 '14 at 01:52
  • @jonas It gives me a lot of pain. So if I could see your code, it might be helpful for me... :) – Arup Rakshit Apr 06 '14 at 06:00
  • @ArupRakshit I summarized my findings and experiments in a blog post: http://blog.peschla.net/2014/04/parsing-pdf-text-with-coordinates-in-ruby/ – Jonas Apr 08 '14 at 19:57
  • Awesome... I liked it. If I have a problem with *pdf-reader*, I will contact you. Agreed? :-) – Arup Rakshit Apr 08 '14 at 19:59
  • Sure, if I can help. And if you find out how to identify vertical text, remember my question here ;) – Jonas Apr 08 '14 at 20:16
  • @jonas Sure. I am very new to pdf-reader. I am still reading and playing with the gem. But your blog will help me a lot. – Arup Rakshit Apr 08 '14 at 20:17
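
For readers wondering how orientation could be told apart once text runs are exposed by a custom receiver, here is a minimal sketch of the underlying geometry only. It assumes you can somehow get the `a` and `b` components of each run's text rendering matrix `[a b c d e f]`; how to obtain them from pdf-reader is not shown and is an open assumption. The names `text_orientation` and `Run` are hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of pdf-reader.
# Classifies a run by the rotation angle of its text rendering matrix
# [a b c d e f]. For a pure rotation by theta with font scale s,
# a = s * cos(theta) and b = s * sin(theta), so atan2(b, a) gives theta
# regardless of the font size.
def text_orientation(a, b, tolerance_degrees = 5.0)
  angle = Math.atan2(b, a) * 180.0 / Math::PI # range -180..180
  angle += 360.0 if angle < 0                 # normalize to 0...360
  quarter = (angle / 90.0).round              # nearest multiple of 90 degrees
  deviation = (angle - quarter * 90.0).abs
  return :other if deviation > tolerance_degrees

  %i[horizontal vertical_up upside_down vertical_down][quarter % 4]
end

# Hypothetical container for a text run and its matrix components.
Run = Struct.new(:text, :a, :b)

runs = [
  Run.new("Invoice", 12.0, 0.0),     # unrotated text
  Run.new("Margin note", 0.0, 12.0)  # rotated 90 degrees counterclockwise
]

# Keep only horizontal runs before trying to group text into lines.
horizontal = runs.select { |run| text_orientation(run.a, run.b) == :horizontal }
puts horizontal.map(&:text).inspect
```

Filtering on orientation before the line-grouping step would keep rotated runs from being merged into horizontal lines that happen to share a y coordinate. Whether the matrix values are reachable in a given pdf-reader version still has to be checked against the gem's source.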

0 Answers