
I have the following code (the comments explain what is occurring):


import os
from PyPDF2 import PdfFileReader

# Path to the directory containing the PDF files
pdf_dir = '/path/to/pdf/files'

# Iterate over the files in the directory
for filename in os.listdir(pdf_dir):
  # Check if the file is a PDF file
  if filename.endswith('.pdf'):
    # Construct the full path to the file
    filepath = os.path.join(pdf_dir, filename)

    # Open the PDF file and read its contents
    with open(filepath, 'rb') as f:
      pdf = PdfFileReader(f)

      # Extract the text from the PDF file
      text = ''
      for page in pdf.pages:
        text += page.extractText()

    # Construct the name of the output text file
    txt_filename = filename[:-4] + '.txt'

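    # Write the text to the output file (explicitly as UTF-8, so non-ASCII
    # text does not fail on platforms whose default encoding is not UTF-8)
    with open(txt_filename, 'w', encoding='utf-8') as f: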
      f.write(text)
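A restructured sketch of the same loop, offered under some assumptions: it swaps in the `pypdf` package (the maintained successor to PyPDF2, where the reader class is `PdfReader` and the method is `extract_text()`), and `convert_pdfs` and `pypdf_extract` are hypothetical names used only for illustration. Passing the extractor in as a parameter keeps the directory walk free of any PDF library, so it can be exercised on its own. It also writes each `.txt` next to its PDF instead of into the current working directory, uses `os.path.splitext` instead of slicing off four characters, and writes UTF-8 explicitly.

```python
import os


def convert_pdfs(pdf_dir, extract_text):
    """Write a .txt file next to each .pdf file in pdf_dir.

    extract_text is any callable that takes a file path and returns a str
    (a hypothetical parameter so the PDF library can be swapped out).
    Returns the list of .txt paths that were written.
    """
    written = []
    for filename in sorted(os.listdir(pdf_dir)):
        root, ext = os.path.splitext(filename)
        # Compare the extension case-insensitively so 'REPORT.PDF' is included.
        if ext.lower() != ".pdf":
            continue
        pdf_path = os.path.join(pdf_dir, filename)
        txt_path = os.path.join(pdf_dir, root + ".txt")
        text = extract_text(pdf_path)
        # Write as UTF-8 so non-ASCII characters do not raise on platforms
        # whose default encoding is not UTF-8.
        with open(txt_path, "w", encoding="utf-8") as out:
            out.write(text)
        written.append(txt_path)
    return written


def pypdf_extract(path):
    # Assumes the 'pypdf' package is installed; imported here so that
    # convert_pdfs itself has no third-party dependency.
    from pypdf import PdfReader

    # strict=False (pypdf's default): report fewer problems and recover from
    # correctable structural errors rather than raising.
    reader = PdfReader(path, strict=False)
    # 'or ""' is a defensive guard; pages with no text layer typically
    # yield an empty string.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Usage would be `convert_pdfs('/path/to/pdf/files', pypdf_extract)`. To keep the original behavior of writing into the working directory, set `txt_path` to `root + ".txt"` instead.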

When I run my code, it produces the warning "Xref table not zero-indexed. ID numbers for objects will be corrected." It is not a hard error, but it makes me wonder whether I should be doing this differently.
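The warning refers to the PDF's cross-reference (xref) table. In a classic table, the keyword `xref` is followed by subsection headers of the form `<first object number> <count>`, and the first entry normally describes object 0, which is always a free entry. "Not zero-indexed" means the first subsection starts at some other object number. The sketch below is a hypothetical, standard-library-only helper named `first_xref_subsection` that reads the header of the table the file's final `startxref` points at, so a given file can be checked directly. It assumes a classic (non-stream) xref table and a correct `startxref` offset; it ignores cross-reference streams and earlier sections reached through `/Prev`.

```python
import re


def first_xref_subsection(data):
    """Return (first_object_number, count) for the first subsection of the
    xref table that the last 'startxref' points at, or None if that cannot
    be determined (no startxref, or a cross-reference stream instead of a
    classic 'xref' table)."""
    # The last 'startxref' keyword is followed by the byte offset of the
    # most recent cross-reference section.
    pos = data.rfind(b"startxref")
    if pos == -1:
        return None
    m = re.match(rb"startxref\s+(\d+)", data[pos:])
    if m is None:
        return None
    offset = int(m.group(1))
    # A classic table starts with the keyword 'xref' followed by the first
    # subsection header "<first object number> <count>".
    m = re.match(rb"xref\s+(\d+)\s+(\d+)", data[offset:])
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# Example use on a real file (path is a placeholder):
# with open("some.pdf", "rb") as f:
#     print(first_xref_subsection(f.read()))
```

A result such as `(1, 2)` instead of `(0, ...)` would indicate the file itself is the source of the warning. Since the message says the IDs will be corrected, the text is usually still extracted; in PyPDF2 versions where this check is tied to strict mode, constructing the reader with `strict=False` should silence it, but check the documentation for the version in use.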

Thanks for any suggestions.

ahhhgetit