search a specific string in a pdf document stored on Google Drive

Question

My project is about sending an email with a PDF attachment. All my PDFs are in a folder on Google Drive, and I need to find the specific PDF associated with a specific customer. Each PDF contains only text, including the customer number.

So I need a script that extracts the text from a PDF into a string and then checks whether that string contains the customer number.

For now I use this:

function myFunction() {
  // Gets the PDF by its ID and logs its raw content as a string.
  // (DocsList has since been retired; DriveApp.getFileById() replaced it.)
  var file = DocsList.getFileById('my pdf file id here');
  Logger.log(file.getContentAsString());
}

But the log output is unreadable, as if there were an encoding issue:

m��:�B�C-�BݣXaP�{�� u�hu@��(�="��j�=��%C��g(r{��j��/��=��Ev��3�=��P��>��̓�e(r{��yX�Pd�PޗEv�j�@�ݣ2�Eq��b��h�="�(�{�,v��GE�O�_��q�o�v�)��p��u�\9�[�G��

Does anyone know how to extract the text from a PDF into a string?

Have a look at my answer on Web Applications: http://webapps.stackexchange.com/a/61069/29140 (Jacob Jan Tuinstra, May 30 '14 at 06:24)

score 0 · Answer 1 · edited May 23 '17 at 12:05

The pdfToText() utility from "Get pdf-attachments from Gmail as text" uses the Advanced Drive Service and DocumentApp to convert the PDF to a Google Doc, and the Google Doc to text. You can get the OCR'd text this way, or save it directly to a .txt file in any folder on your Drive.
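The linked utility is not reproduced here, but the PDF to Google Doc to text route it describes can be sketched in a few lines. This assumes the Advanced Drive Service is enabled in the script project at version v2, where Drive.Files.insert() accepts an ocr option (the call differs in v3). pdfToTextSketch is a hypothetical name, not the linked utility's API.

```javascript
/**
 * Minimal sketch: convert a PDF blob to text through a temporary Google Doc.
 * Assumes the Advanced Drive Service (Drive API v2) is enabled.
 * pdfToTextSketch is a hypothetical helper, not the linked pdfToText().
 */
function pdfToTextSketch(pdfBlob) {
  // Ask Drive to OCR the PDF and store the result as a Google Doc.
  var resource = {
    title: pdfBlob.getName() + ' (temp OCR copy)',
    mimeType: MimeType.GOOGLE_DOCS
  };
  var gdoc = Drive.Files.insert(resource, pdfBlob, { ocr: true });

  try {
    // Read the recognized text back out of the Doc.
    return DocumentApp.openById(gdoc.id).getBody().getText();
  } finally {
    // Trash the temporary Doc so no residual files are left behind.
    DriveApp.getFileById(gdoc.id).setTrashed(true);
  }
}
```

The try/finally makes sure the temporary Doc is trashed even if reading it fails.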

// Start with a Blob object. getFilesByName() returns a FileIterator, not an array.
var blob = DriveApp.getFilesByName("my.pdf").next().getBlob();

// filetext will contain text from pdf file, no residual files are saved:
var filetext = pdfToText( blob, {keepTextfile: false} );
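The question needs the one PDF in a folder that belongs to a given customer, so the conversion has to be applied to each file in turn. A minimal sketch, assuming the PDFs sit directly in one folder whose ID you know. findPdfInFolder, extractText and isMatch are hypothetical names: extractText can wrap whichever PDF-to-text function you use, and isMatch is the search you run on the text.

```javascript
/**
 * Sketch: return the first PDF in a folder whose extracted text satisfies
 * isMatch, or null if none does. All names here are hypothetical.
 *   folderId    - ID of the Drive folder holding the PDFs (assumed)
 *   extractText - function(blob) -> string, e.g. a wrapper around pdfToText
 *   isMatch     - function(text) -> boolean, e.g. a customer-number check
 */
function findPdfInFolder(folderId, extractText, isMatch) {
  var files = DriveApp.getFolderById(folderId).getFilesByType(MimeType.PDF);
  while (files.hasNext()) {
    var file = files.next();
    // Each call converts one PDF, so large folders can be slow.
    var text = extractText(file.getBlob());
    if (isMatch(text)) {
      return file;
    }
  }
  return null;
}
```

For example, findPdfInFolder('your folder id', function (b) { return pdfToText(b, {keepTextfile: false}); }, function (t) { return t.indexOf('12345') !== -1; }). Because every PDF is converted on each call, a large folder may hit Apps Script's execution time limit.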

Once you have the text, a search for keywords becomes dead easy!

if (filetext.indexOf( keyword ) !== -1) {
  // Found keyword...
}
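One caveat with a plain indexOf() check: customer number 1234 is also "found" inside 12345 or 91234. If customer numbers can appear as part of longer numbers in the PDF text, a stricter sketch is to reject matches that touch other digits. containsCustomerNumber is a hypothetical helper, and it assumes customer numbers are compared as exact character strings.

```javascript
/**
 * Sketch: true if customerNumber appears in text and is not part of a longer
 * run of digits (so "1234" does not match inside "12345" or "91234").
 */
function containsCustomerNumber(text, customerNumber) {
  // Escape regex metacharacters in case the number contains e.g. "." or "+".
  var escaped = String(customerNumber).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // (^|\D) and (?!\d) stop the match from touching other digits.
  var pattern = new RegExp('(^|\\D)' + escaped + '(?!\\d)');
  return pattern.test(text);
}
```

It drops in where the indexOf() check was, for example if (containsCustomerNumber(filetext, '1234')) { ... }.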
