Questions tagged [docsplit]

Docsplit is a command-line utility and Ruby library that converts documents into PDFs and breaks them apart into images and text.

For more information, see the library documentation page and the GitHub project repository.

21 questions
22
votes
4 answers

An efficient way to convert document to pdf format

I have been trying to find an efficient way to convert documents (e.g. doc, docx, ppt, pptx) to PDF. So far I have tried docsplit and oowriter, but both took more than 10 seconds to complete the job on a 1.7 MB pptx file. Can anyone suggest a…
Aamir Rind
  • 38,793
  • 23
  • 126
  • 164
7
votes
2 answers

Docsplit Ruby on Rails

I'm trying to get docsplit to work with my rails app. Right now I'm just trying to get it to run locally. I installed the gem and all of the dependencies. All of the basic examples work in the command line and I was able to get the…
Bcos
  • 135
  • 3
  • 8
6
votes
1 answer

Getting remove_entry_secure error while using ruby application

I am trying to split PDF files into images using docsplit. But it appears I have issues with my ruby installation. I keep getting the following error every time: /usr/lib/ruby/1.8/fileutils.rb:694:in `remove_entry_secure': parent directory is world…
Frankline
  • 40,277
  • 8
  • 44
  • 75
6
votes
1 answer

How to Upload a multipage PDF and convert it to JPEG with Paperclip?

Does anyone know how to upload a multi-page PDF with Paperclip and convert each page into a JPEG? So far, every time I upload a PDF, it only allows me to see the first page of the PDF as a JPEG. But I would like to be able to upload and convert…
Serge Pedroza
  • 2,160
  • 3
  • 28
  • 41
2
votes
2 answers

Extract text from document in memory using docsplit

With the docsplit gem I can extract the text from a PDF or any other file type. For example, with the line: Docsplit.extract_pages('doc.pdf') I can have the text content of a PDF file. I'm currently using Rails, and the PDF is sent through a…
fotanus
  • 19,618
  • 13
  • 77
  • 111
1
vote
0 answers

What would cause RSpec to print out wrong array size until after to_yaml?

RSpec seems to not output the right size of an array, almost like it does not process everything until it is asked to output something. What might be causing this? Here is a portion of the spec code: puts…
Justin Giboney
  • 3,271
  • 2
  • 19
  • 18
1
vote
0 answers

How to Upload a multipage PDF and convert each page to a JPEG with Paperclip?

Does anyone know how to upload a multi-page PDF with Paperclip and convert each page into a JPEG? So far, every time I upload a PDF, it only allows me to see the first page of the PDF as a JPEG. But I would like to be able to upload and convert…
Serge Pedroza
  • 2,160
  • 3
  • 28
  • 41
1
vote
0 answers

No such file or directory @ rb_sysopen (Errno::ENOENT) - DOCSPLIT

I'm trying to extract images from a Microsoft Office Word document with Docsplit, and it returns this error: /home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit/transparent_pdfs.rb:22:in `initialize': No such file or directory @…
Kerm1t
  • 31
  • 1
  • 9
1
vote
2 answers

Plone and document-viewer

I'm working on Plone. I've successfully installed document-viewer, and now I have a very nice preview of uploaded PDFs. The problem is with previews of Word and Excel files. As the requirements say, I have to install OpenOffice or LibreOffice in order to…
Massimo Variolo
  • 4,669
  • 6
  • 38
  • 64
1
vote
1 answer

Convert PPT into Images in Rails

I am using the docsplit gem to convert PPT files into images: Docsplit.extract_images(uploaded.path.to_s, :size => '550x', :format => [:jpg], :output => "#{Rails.root}/public/images"). It converts successfully, but it takes a long time. Can I convert any other…
Ravendra Kumar
  • 1,072
  • 10
  • 29
1
vote
0 answers

parse checkboxes from a pdf in rails

I need to parse checkboxes from a PDF. I'm using the docsplit Ruby gem. The problem is that when a PDF is parsed, only its text gets parsed and there is no sign that the checkboxes exist. Here is a screenshot of my PDF checkbox (a box marked with a cross (X)…
Sachin Prasad
  • 5,365
  • 12
  • 54
  • 101
1
vote
2 answers

Unable to convert openoffice documents using docsplit resulting in java.lang.NoClassDefFoundError

I have installed the docsplit gem and have been able to convert PDF documents. However, when it comes to splitting OpenOffice documents such as PowerPoint and Word files, I get the following error: Exception: Command /usr/local/bin/docsplit pdf…
Frankline
  • 40,277
  • 8
  • 44
  • 75
0
votes
1 answer

How to get a proper filepath of Tempfile in Rails when seeing "Getting Errno::ENOENT ' no such file or directory @ rb_sysopen'"

I want to show a preview of presentation files on my website. I am trying to make a tempfile which reads from a Microsoft PowerPoint Open XML (.pptx) file stored in active storage. I am using Docsplit.extract_images on the tempfile to convert the…
0
votes
1 answer

Counting PDF pages in ROR with Docsplit

I need to get the count of pages in PDF files stored in Ruby on Rails 5.2.3 ActiveStorage using Docsplit. I'm uploading PDF documents using Ruby on Rails ActiveStorage. I understand these documents are stored as a blob. I was hoping I could pass the…
Clint Laskowski
  • 149
  • 1
  • 3
  • 13
0
votes
0 answers

NoMethodError in CollectionsController#create undefined method `file' for nil:NilClass

I'm trying to upload a PDF file to Dropbox through Rails, convert the pages of the PDF into JPEG images, and store those images in the same Dropbox folder. I tried using the Docsplit gem, but I don't know how to address the file in the Dropbox in…