Does anybody know how to use Google Apps Script (GAS) to get around the maximum file size limit that prevents the .pdfToText() function @Mogsdad introduced here from converting a PDF of, say, more than 10 pages at a time?

The limitation is inherent to the Drive.Files.insert() method documented here. This method is called on line 58 of the pdfToText.js file located here.
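
For context, the conversion step in question can be sketched roughly as follows. This is a minimal sketch and not the actual pdfToText.js code: it assumes the Advanced Drive Service (v2) is enabled for the script project, and the function name `pdfBlobToText` and the default OCR language are illustrative choices.

```javascript
/**
 * Sketch: convert a PDF blob to plain text by letting Drive OCR it into a
 * temporary Google Doc (Advanced Drive Service v2), then reading the Doc body.
 * pdfBlobToText is a hypothetical name, not part of pdfToText.js.
 */
function pdfBlobToText(pdfBlob, ocrLanguage) {
  var resource = {
    title: pdfBlob.getName(),
    mimeType: pdfBlob.getContentType()
  };
  // ocr: true asks Drive to run OCR and store the result as a Google Doc.
  var options = { ocr: true, ocrLanguage: ocrLanguage || 'en' };

  // This is the call where the conversion limit is hit.
  var docFile = Drive.Files.insert(resource, pdfBlob, options);
  try {
    return DocumentApp.openById(docFile.id).getBody().getText();
  } finally {
    // Remove the temporary Google Doc whether or not reading it succeeded.
    Drive.Files.remove(docFile.id);
  }
}
```

A call such as `pdfBlobToText(DriveApp.getFileById(fileId).getBlob())` would return the extracted text, where `fileId` is the ID of a PDF already stored in Drive.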

Let Me Tink About It
  • Waitasec... the file size limit for insert() is 5 Terabytes - no 10 page PDF comes close. Is the limit somewhere else? – Mogsdad Dec 04 '14 at 21:38
  • Ok, yes - since the conversion is being stored as a Google Doc, the limit that applies is documented [here](https://support.google.com/drive/answer/37603), basically 1M characters / 50MB. Still surprising that a 10pg doc would hit it. That step (line 58 converts PDF to a DOC on Drive) is where the real magic occurs, so it can't be skipped. – Mogsdad Dec 04 '14 at 21:44
  • Maybe the web API from [here](https://cloudconvert.com/pdf-to-txt) could be scripted? – Mogsdad Dec 04 '14 at 21:47
  • Check out [this tool](https://www.idolondemand.com/developer/apis/extracttext#overview)... I was having the same problem today at 10 pages (even commented on @Mogsdad's git about it before finding this post)... The API above solved my problem and is easily called by UrlFetchApp in Google Apps Script. – mulliweht Sep 04 '15 at 21:38

1 Answer

Summary from comments:

Per @mulliweht

Check out [this tool](https://www.idolondemand.com/developer/apis/extracttext#overview)... I was having the same problem today at 10 pages (even commented on @Mogsdad's git about it before finding this post)... The API above solved my problem and is easily called by UrlFetchApp in Google Apps Script.
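
Calling an external text-extraction service from Apps Script might look something like the sketch below. It is only an illustration under stated assumptions: the endpoint URL is passed in by the caller, and the request field names (`apikey`, `file`) and the JSON response field (`text`) are hypothetical, so check the documentation of whichever service you use for the real ones. `UrlFetchApp.fetch()` itself is the standard Apps Script call.

```javascript
/**
 * Sketch: send a PDF blob to an external text-extraction HTTP API.
 * The request fields (apikey, file) and the response field (text) are
 * assumptions for illustration, not a documented contract of any service.
 */
function extractTextViaApi(pdfBlob, endpointUrl, apiKey) {
  var response = UrlFetchApp.fetch(endpointUrl, {
    method: 'post',
    // A payload object containing a blob is sent as multipart/form-data.
    payload: { apikey: apiKey, file: pdfBlob },
    // Return error responses instead of throwing, so the status can be checked.
    muteHttpExceptions: true
  });

  var status = response.getResponseCode();
  if (status !== 200) {
    throw new Error('Text extraction failed: HTTP ' + status);
  }

  // Assumes the service replies with JSON containing a "text" field.
  var result = JSON.parse(response.getContentText());
  return result.text;
}
```

Because the OCR happens outside Drive, this route avoids the Google Doc conversion step altogether, at the cost of sending the file to a third party.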

Let Me Tink About It