I am converting every PDF file on my system to text with the command-line utility "pdftotext", writing the result to "output.txt". When it reads files that are not properly structured (such as PDFs made up of images, among others), it prints errors like these:
/home/vikrantsingh/Downloads/ARRAYS_NEW.pdf
/home/vikrantsingh/Downloads/GPOS_casestudy_solution_v2.pdf
/home/vikrantsingh/Downloads/Tutorial.pdf
/home/vikrantsingh/Downloads/The_C_Programming_Language.pdf
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (27972): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (41087): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (51900): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (62716): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (65450): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (68463): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
What I want is for it to move on to the next file as soon as it hits the first error, instead of continuing to read the same file. I am using Python 2.7. My code is:
import os
import sys
import re
import subprocess
root = '/home'
targetpath = ""
path = os.path.join(root, targetpath)
filepath = []
count = 0
filesize = 0
for r,subdir,f in os.walk(path):
    ultimate_path = os.path.join(path,r)
    for file in f:
        if file.find(".pdf")!=-1:
            print os.path.join(ultimate_path,file)
            filesize = os.path.getsize(os.path.join(ultimate_path,file))+filesize
            subprocess.call(['pdftotext', os.path.join(ultimate_path,file), 'output.txt'])
            #print file
            count = count+1
print count
print filesize/(1048576.0)
This is sample code for converting PDF files with "pdftotext". I want to catch the error so that I can move on to the next PDF.
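A minimal sketch of the behaviour I am after, assuming pdftotext writes its "Error: ..." messages to stderr (the name convert_or_skip is hypothetical, made up for illustration). It starts the command with subprocess.Popen, waits for the first line on stderr, and kills the process as soon as one arrives, so the caller can move on to the next file:

```python
import subprocess


def convert_or_skip(cmd):
    """Run cmd and abandon it at the first line it writes to stderr.

    Returns True if the command exited with status 0 and wrote nothing
    to stderr, False otherwise. Assumption: the tool reports its errors
    on stderr, one per line.
    """
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    # readline() blocks until the child writes a line to stderr, or
    # returns an empty string once the child closes stderr (on exit).
    first_error = proc.stderr.readline()
    if first_error:
        try:
            proc.kill()  # stop working on this file right away
        except OSError:
            pass  # the process had already gone away
        proc.stderr.close()
        proc.wait()  # reap the child so it does not linger as a zombie
        return False
    proc.stderr.close()
    return proc.wait() == 0
```

In the loop above, the subprocess.call([...]) line could then be swapped for convert_or_skip(['pdftotext', os.path.join(ultimate_path,file), 'output.txt']), with a False result meaning the file was skipped. Note that this treats any stderr output as fatal, including messages the tool might otherwise recover from.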
I have seen one post regarding this. Thanks.