I am reading all the PDF files on my system and converting each one to a text file, "output.txt", with the command-line utility "pdftotext". When it reads files that are not properly structured (such as PDFs made up of images, among many others), it prints errors like:

/home/vikrantsingh/Downloads/ARRAYS_NEW.pdf
/home/vikrantsingh/Downloads/GPOS_casestudy_solution_v2.pdf
/home/vikrantsingh/Downloads/Tutorial.pdf
/home/vikrantsingh/Downloads/The_C_Programming_Language.pdf
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (27972): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (41087): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (51900): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (62716): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (65450): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (68463): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'

What I want is for it to move on to the next file as soon as it encounters the first error, instead of continuing to read the same file. I am using Python 2.7. My code is:

    import os
    import sys
    import re
    import subprocess
    root = '/home'
    targetpath = ""
    path = os.path.join(root, targetpath)
    filepath = []
    count = 0
    filesize = 0
    for r,subdir,f in os.walk(path):
        ultimate_path = os.path.join(path,r)
        for file in f:
            if file.find(".pdf")!=-1:
                print os.path.join(ultimate_path,file)
                filesize = os.path.getsize(os.path.join(ultimate_path,file))+filesize
                subprocess.call(['pdftotext', os.path.join(ultimate_path,file), 'output.txt'])
        #print file

        count = count+1
        print count
        print filesize/(1048576.0)

This is sample code for converting PDF files with "pdftotext". I want to catch the error so that I can move on to the next PDF.

I have seen one post regarding this. Thanks

vaibhav1312

1 Answer

These error messages are generated by pdftotext itself. They are not Python exceptions, so they cannot be caught with try..except.

You can run pdftotext -q to silence the error messages:

    subprocess.call(['pdftotext', '-q', os.path.join(ultimate_path,file), 'output.txt'])
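
Note that -q only hides the messages; pdftotext still processes the whole file. If the goal is to abandon a file at its first error, one possible approach is to watch the child's stderr and kill the process as soon as anything appears there. The following is only a rough sketch: the helper names run_until_first_error and convert_all are made up for illustration, and it assumes pdftotext writes these messages to stderr, one per line. It should run on Python 2.7 as well as Python 3.

```python
import subprocess


def run_until_first_error(cmd):
    """Run cmd and kill it as soon as it writes a line to stderr.

    Hypothetical helper, not part of pdftotext or the standard library.
    Returns (ok, first_error_line).
    """
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    # readline() blocks until the child prints one line on stderr,
    # or returns an empty string when the child closes stderr (exits).
    first_line = proc.stderr.readline()
    if first_line:
        # Something was reported: stop working on this file right away.
        proc.kill()
    proc.stderr.close()
    returncode = proc.wait()
    ok = (not first_line) and returncode == 0
    return ok, first_line.decode('utf-8', 'replace').strip()


def convert_all(pdf_paths, out_path='output.txt'):
    """Convert each PDF in turn; return the paths that were abandoned."""
    skipped = []
    for pdf_path in pdf_paths:
        # Assumption: pdftotext is installed and on the PATH.
        ok, err = run_until_first_error(['pdftotext', pdf_path, out_path])
        if not ok:
            skipped.append((pdf_path, err))
    return skipped
```

In the loop from the question, the subprocess.call line would be replaced by a call to run_until_first_error with the same argument list. Keep in mind that a killed pdftotext may leave a partially written output.txt behind, and that every run overwrites the same output file.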
unutbu