We are developing a website that needs to convert PDF files into HTML, because some of the PDFs contain forms (not necessarily fillable PDFs; these are forms meant to be printed and filled in by hand).

We want users to fill them in through our website instead of printing the files and filling them in with a pen. We are going paperless.

DocuSign provides this: you can upload a PDF and then customize it with text boxes and checkboxes. We're using DocuSign as a reference, but we still haven't figured out how they do it (almost perfect conversion from PDF to HTML and back).

So far I've tried several third-party tools for converting PDF to HTML: XPDF, Poppler, and ImageMagick.

ImageMagick converts a PDF to an image, which is not suitable because these images produce a large file when converted back to a PDF for printing.

From my research, Poppler is a fork of XPDF. I tried it after XPDF to see if it's better. It basically does what XPDF does, but the HTML it generates uses larger pixel values in the CSS. That's fine, but it loses the font family.

XPDF converts PDF to HTML, but the pixel dimensions are smaller, so when I convert it back to PDF it does not fill the whole page, and I still have to manually adjust all the CSS to make it fit.

After using these third-party tools, I convert the HTML files back into PDF using mPDF, and the converted files have many inconsistencies. Text is not aligned properly. The result is basically not the same as the original PDF.

Any help will be appreciated, thanks!

fsnight

1 Answer

What you are trying to do is not as straightforward as it may seem. I have worked with Adobe Sign, formerly known as EchoSign, for years and I have a pretty good idea of how these services work. With that being said, I strongly suggest looking into one of these eSign services instead of trying to roll your own. It will save you a lot of time.

This is how it all works:

  1. The PDF itself must contain a form with named fields. In other words, if you open the PDF in Adobe Reader or Chrome, you should be able to fill in the fields. If your PDF does not have a PDF form, you will need additional software such as Acrobat Pro to create one.
  2. You must convert the PDF into a flat image that can be rendered in the browser.
  3. You will need a tool to extract the PDF form information, such as the field names, types, dimensions, and coordinates.
  4. With all this information you can then render the PDF image(s) in the browser. Place absolutely positioned HTML form elements over the image using the field type, dimensions, and coordinates from the previous step. Each HTML element needs to reference a PDF form field by name.
  5. Once you have collected the data from your HTML widget as a map like field_name => field_value, you will need additional software to programmatically fill in the form in the original PDF. PDF form data is often stored in an FDF or XFDF file.
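
As a rough illustration of step 4, here is a minimal Python sketch of the coordinate conversion. PDF field rectangles are expressed in points (1/72 inch) with the origin at the bottom-left of the page, while CSS positions are measured from the top-left of the page image. The `FieldRect` type, the function names, and the sample values are hypothetical; the sketch also assumes an unrotated page whose origin is at (0, 0) and a page image rendered at a known DPI.

```python
import html
from dataclasses import dataclass


@dataclass
class FieldRect:
    """A form field rectangle in PDF points, origin at the bottom-left (hypothetical type)."""
    name: str
    llx: float  # lower-left x
    lly: float  # lower-left y
    urx: float  # upper-right x
    ury: float  # upper-right y


def rect_to_css(rect: FieldRect, page_height_pt: float, dpi: float = 96.0) -> dict:
    """Map a PDF rectangle to pixel offsets over a page image rendered at `dpi`."""
    scale = dpi / 72.0  # PDF points are 1/72 inch
    return {
        "left": round(rect.llx * scale, 2),
        # Flip the y axis: CSS measures from the top edge of the page image.
        "top": round((page_height_pt - rect.ury) * scale, 2),
        "width": round((rect.urx - rect.llx) * scale, 2),
        "height": round((rect.ury - rect.lly) * scale, 2),
    }


def to_input_html(rect: FieldRect, page_height_pt: float, dpi: float = 96.0) -> str:
    """Render an absolutely positioned text input named after the PDF field."""
    css = rect_to_css(rect, page_height_pt, dpi)
    style = "position:absolute;" + "".join(f"{k}:{v}px;" for k, v in css.items())
    name = html.escape(rect.name, quote=True)
    return f'<input type="text" name="{name}" style="{style}">'


# Example: a field on a US Letter page (792 pt tall), image rendered at 144 DPI.
field = FieldRect("first_name", 72, 700, 272, 720)
print(to_input_html(field, page_height_pt=792, dpi=144))
```

The inputs would sit inside a relatively positioned container that holds the page image, so the offsets are relative to the image rather than to the browser window.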

I don't know of a single tool that will help you with everything outlined above, at least not in PHP. However, I can offer a couple of suggestions that may be helpful:

  • PDFtk Server - Can help you both extract the PDF form field information and fill in the form from an XFDF file. Unfortunately, the form field information that you can extract with this tool does not include dimensions and coordinates.
  • iText - A library available for .NET and Java that can be used to extract detailed information about the PDF form, including the dimensions and coordinates of the fields. You can create a microservice with this toolkit that communicates with PHP.
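
To make the XFDF part more concrete, here is a minimal Python sketch that turns a field_name => field_value map into an XFDF document using only the standard library. The `build_xfdf` function and the sample field names are hypothetical, and the sketch assumes flat (non-hierarchical) field names; check the field names that your extraction tool reports before relying on it.

```python
import xml.etree.ElementTree as ET

XFDF_NS = "http://ns.adobe.com/xfdf/"


def build_xfdf(values: dict) -> str:
    """Build an XFDF document from a {field_name: field_value} map."""
    # Emit the XFDF namespace as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", XFDF_NS)
    root = ET.Element(f"{{{XFDF_NS}}}xfdf")
    fields = ET.SubElement(root, f"{{{XFDF_NS}}}fields")
    for name, value in values.items():
        # Assumption: each name is a flat field name, one <field> element per entry.
        field = ET.SubElement(fields, f"{{{XFDF_NS}}}field", {"name": name})
        ET.SubElement(field, f"{{{XFDF_NS}}}value").text = str(value)
    body = ET.tostring(root, encoding="unicode")  # special characters are escaped here
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example: values collected from the HTML form (sample data).
xfdf_text = build_xfdf({"first_name": "Ada", "notes": "A & B"})
print(xfdf_text)
```

Once the result is saved to a file, PDFtk can merge it into the original form with something like `pdftk original.pdf fill_form data.xfdf output filled.pdf flatten`, where the file names are placeholders.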

There are definitely a lot more tools out there for the job. Hopefully, this information will guide you in the right direction or help you make a decision on how to move forward with your project.

Pablo
  • That's exactly what we're doing right now, absolutely positioning HTML form elements, but instead of placing them on an image converted from the PDF, the elements were created automatically when the PDF was converted using XPDF. Our first approach was HTML form elements over the image, but after going through different PDFs we ran into bad PDF-to-image conversions where the image came out all black. That's why I switched to XPDF instead of ImageMagick. – fsnight Sep 13 '19 at 02:29
  • So I guess we'll stick to converting the PDF into an image, then overlaying the HTML form elements, getting their values, and putting them into the actual form inside the PDF. My last question: is there a library for PHP that can make a PDF fillable? Thanks – fsnight Sep 13 '19 at 02:37
  • The task of adding an Acrobat form, or making the PDF fillable, will most likely need to be a manual step. Doing this manually is not too bad, since you only need to do it once per PDF application. You will probably need Acrobat Pro to do so. If you're going to look into PDFtk to extract the PDF form information, avoid using `Adobe LiveCycle` to edit the PDF :) – Pablo Sep 13 '19 at 02:50
  • Thanks for the fast response. I went back to our first option, converting the PDFs to images with ImageMagick. It seems the problem we encountered only occurred when converting to JPG files; we tried PNG and it looks great. So now we'll just overlay the HTML elements on the image and collect their data on submit to fill in the PDF form. I really appreciate your help. – fsnight Sep 13 '19 at 03:10
  • @fsnight, could you please provide more details on how you solved this problem with ImageMagick? Specifically, what you did after you converted your PDFs to an image? – Dave Apr 13 '20 at 19:42
  • @Dave ... I have been working on a project based on an article I read here: https://www.codeproject.com/Articles/466362/Blend-PDF-with-HTML5 Demo: http://www.hanray.com/sites/BlendPDFWithHTML5/#/pdf/f1040ezt He used PDF.js. If you use a form-fillable PDF, it will render an image of the PDF, then overlay HTML form fields on top. I currently have it saving the data as it is entered. It can just be submitted as a standard form. I'm working on serializing and storing the data in a table, then reading it back, inserting it into the PDF, and flattening it. – WGS May 14 '20 at 12:48