I have the same problem as discussed here: http://blog.joshsoftware.com/2014/08/13/pdf-to-plain-text-processing-using-docsplit/ However, the solution they propose for Docsplit doesn't work for me:
Docsplit.extract_text(filepath, {:pdf_opts => '-layout', output: 'tmp_text_file'})
The :pdf_opts => '-layout' option doesn't seem to have any effect, and I can't find any documentation for options like it, so I still get a single word per line in the output text file.
Does anyone know how to get an accurate text file?
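To show the kind of output I'm after, here is a minimal sketch that bypasses Docsplit and calls pdftotext directly with its -layout flag. It assumes the pdftotext binary (from Poppler or Xpdf, which as far as I can tell is what Docsplit shells out to for PDF text) is installed and on the PATH. The helper names pdftotext_command and extract_text_with_layout are my own, not part of Docsplit.

```ruby
require "open3"

# Build the argument list for pdftotext.
# -layout asks pdftotext to keep the original physical layout of the page.
def pdftotext_command(pdf_path, txt_path, layout: true)
  cmd = ["pdftotext"]
  cmd << "-layout" if layout
  cmd + [pdf_path, txt_path]
end

# Run pdftotext and return the path of the written text file.
# Raises if the command exits with a non-zero status.
def extract_text_with_layout(pdf_path, txt_path)
  output, status = Open3.capture2e(*pdftotext_command(pdf_path, txt_path))
  raise "pdftotext failed: #{output}" unless status.success?
  txt_path
end
```

Calling extract_text_with_layout(filepath, 'tmp_text_file') would then write the text file without going through Docsplit at all, which is only a workaround and not an explanation of why :pdf_opts is ignored.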
Thank you