My project sends an email with a PDF attachment. All my PDFs are in a folder on Google Drive, and I need to find the specific PDF associated with a given customer. Each PDF contains only text, including the customer number.

So I need a script that extracts the text from a PDF into a string and then checks whether that string contains the customer number.

For now I use this:

function myFunction() {
  // Fetches an existing PDF by ID and logs its raw content as a string
  var file = DocsList.getFileById('my pdf file id here');
  Logger.log(file.getContentAsString()); // logs the binary PDF data, not readable text
}

But the log shows an encoding issue:

m��:�B�C-�BݣXaP�{�� u�hu@���(�="���j�=��%C���g(r{����j��/��=��Ev���3�=���P���>��̓�e(r{��yX�Pd�PޗEv�j�@�ݣ2�Eq��b����h�="�(�{�,v���GE�O�_����������q�o�v�)��p���u�\9�[�G��

Does anyone know how to extract the text from a PDF into a string?

Jacob Jan Tuinstra
user68137

1 Answer

The pdfToText() utility from "Get pdf-attachments from Gmail as text" uses the advanced Drive service and DocumentApp to convert a PDF to a Google Doc and then to text. The advanced Drive service must be enabled for the script. You can get the OCR'd text as a string this way, or save it directly to a txt file in any folder on your Drive.

// Start with a Blob object (getFilesByName returns an iterator, so take the first match)
var blob = DriveApp.getFilesByName("my.pdf").next().getBlob();

// filetext will contain text from pdf file, no residual files are saved:
var filetext = pdfToText( blob, {keepTextfile: false} );
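For readers curious about what happens inside, here is a minimal sketch of the PDF to Google Doc to text conversion. It is not the actual pdfToText() source. It assumes version 2 of the advanced Drive service, where `Drive.Files.insert` accepts an `ocr` option, and the name `pdfBlobToText` is hypothetical. The two services are passed in as parameters only so the function can be tried outside Apps Script with stand-ins.

```javascript
// Sketch only: NOT the actual pdfToText() implementation.
// Assumptions: `drive` is the advanced Drive service (v2) and `documentApp`
// is the built-in DocumentApp. The function name is hypothetical.
function pdfBlobToText(pdfBlob, drive, documentApp) {
  // Upload the PDF with OCR enabled; Drive stores the result as a Google Doc.
  var resource = {
    title: pdfBlob.getName(),
    mimeType: pdfBlob.getContentType()
  };
  var docFile = drive.Files.insert(resource, pdfBlob, { ocr: true });
  try {
    // Open the converted document and read its body as plain text.
    return documentApp.openById(docFile.id).getBody().getText();
  } finally {
    // Remove the temporary Google Doc so no residual file is left behind.
    drive.Files.remove(docFile.id);
  }
}
```

Inside Apps Script, the call would look like `var text = pdfBlobToText(blob, Drive, DocumentApp);`.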

Once you have the text, a search for keywords becomes dead easy!

if (filetext.indexOf( keyword ) !== -1) {
  // Found keyword...
}
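One caveat with a plain indexOf check: customer number 12345 would also be found inside a longer number such as 123456. Here is a minimal sketch of a stricter test, assuming customer numbers consist of digits only. The helper name `containsCustomerNumber` is hypothetical.

```javascript
// Hypothetical helper: returns true when `customerNumber` occurs in `text`
// as a whole number, not as part of a longer run of digits.
// Assumption: customer numbers consist of digits only.
function containsCustomerNumber(text, customerNumber) {
  var digits = String(customerNumber);
  if (!/^\d+$/.test(digits)) {
    throw new Error('customer number must consist of digits only');
  }
  // (^|\D) and (\D|$) require a non-digit, or the string edge, on both sides.
  var pattern = new RegExp('(^|\\D)' + digits + '(\\D|$)');
  return pattern.test(text);
}

containsCustomerNumber('Customer no. 12345, invoice 7', 12345); // true
containsCustomerNumber('Reference 123456', 12345);              // false
```

Calling it with the extracted text, for example `containsCustomerNumber(filetext, 12345)`, can replace the indexOf condition when partial matches are a concern.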
Community
Mogsdad