
I'm using a for loop to work through an entire folder of PDFs, converting each one to a CSV file.

import tabula
import os
import pandas as pd
files_in_directory = os.listdir()

filtered_files = [file for file in files_in_directory if file.endswith(".pdf")]
print(range(len(filtered_files)))
for file in range(len(filtered_files)):
    print(file-1)
    print(range(len(filtered_files)))

    print(file)
    print(filtered_files[file-1])
    df = tabula.read_pdf(filtered_files[file-1])
    csv_name = filtered_files[file-1] + '.csv'
    df[file-1].to_csv(csv_name, encoding='utf-8')

Here is my log:

Traceback (most recent call last):
  File "/Users/braydenyates/Documents/Band PDFS/csv_converter.py", line 16, in <module>
    df[file-1].to_csv(csv_name, encoding='utf-8')
IndexError: list index out of range

The code appears to process two of the sixty-three files in the folder, then stops with this error. Thank you for your help!

1 Answer


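The number of PDF files you have is not necessarily equal to the number of dataframes tabula manages to extract from any one PDF. The loop variable file is the position of the Nth file, while df is actually a list of dataframes, one per table tabula found in that PDF. Using the file counter to index that list therefore makes no sense: on the third pass, file-1 is 1, so the code asks for the second table of the second PDF, and if tabula found only one table there, df[1] raises IndexError: list index out of range. That fits the two files you saw converted before the crash. Loop through the dataframes instead and save them individually, or do whatever else is intended.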
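To see why the original indexing breaks, here is a minimal sketch that needs neither tabula nor any PDFs. It assumes a hypothetical fake_read_pdf stand-in that, like tabula.read_pdf, returns a list with one entry per table found; plain lists stand in for DataFrames, and the table counts per file are made up for illustration.

```python
# Hypothetical table counts per PDF (made up for illustration).
TABLES_PER_FILE = {"a.pdf": 3, "b.pdf": 1, "c.pdf": 1}


def fake_read_pdf(name):
    """Hypothetical stand-in for tabula.read_pdf: one 'table' per table found."""
    return [[f"{name} table {i}"] for i in range(TABLES_PER_FILE[name])]


def original_loop(files):
    """Mimics the question's loop: indexes the table list with the file counter."""
    processed = []
    for file in range(len(files)):
        tables = fake_read_pdf(files[file - 1])
        # Fails once file - 1 is past the last table this particular PDF produced.
        processed.append(tables[file - 1])
    return processed


def fixed_loop(files):
    """Iterates over the files, then over every table each file produced."""
    processed = []
    for name in files:
        for nth, table in enumerate(fake_read_pdf(name), start=1):
            processed.append((name, nth, table))
    return processed


files = ["a.pdf", "b.pdf", "c.pdf"]
try:
    original_loop(files)
except IndexError as exc:
    print("original loop:", exc)  # original loop: list index out of range
print(len(fixed_loop(files)), "tables written")  # 5 tables written
```

The original loop handles c.pdf (index -1) and a.pdf (index 0), then fails on b.pdf because it asks for table index 1 when only one table exists, which mirrors the "two files, then IndexError" behaviour from the question.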

Here is a simpler, more Pythonic solution:

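import os

import tabula

# Only keep the PDFs in the current working directory.
files_in_directory = os.listdir()
filtered_files = [file for file in files_in_directory if file.endswith(".pdf")]

for file in filtered_files:
    # read_pdf returns a list of DataFrames, one per table it finds.
    dfs = tabula.read_pdf(file)

    # Save each table separately: report.pdf -> report.pdf_1.csv, report.pdf_2.csv, ...
    for nth_frame, df in enumerate(dfs, start=1):
        csv_name = f'{file}_{nth_frame}.csv'
        df.to_csv(csv_name, encoding='utf-8')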
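If you would rather not get names like report.pdf_1.csv, one option is to strip the extension with pathlib before appending the table number. This is a sketch under a few assumptions: tabula-py is installed, and its read_pdf reads only the first page unless you pass pages (pages="all" scans every page; double-check this against your installed version). The helpers csv_name_for and convert_folder are hypothetical names, not part of tabula.

```python
from pathlib import Path


def csv_name_for(pdf_path, nth_table):
    """Hypothetical helper: 'report.pdf' + table 1 -> 'report_1.csv' (same folder)."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(f"{pdf_path.stem}_{nth_table}.csv")


def convert_folder(folder="."):
    """Hypothetical helper: write every table found in every PDF in folder to its own CSV."""
    import tabula  # imported here so csv_name_for works without tabula installed

    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # Assumption: pages="all" makes tabula scan every page, not just the first.
        tables = tabula.read_pdf(str(pdf_path), pages="all")
        if not tables:
            print(f"No tables found in {pdf_path.name}")
            continue
        for nth_table, df in enumerate(tables, start=1):
            # index=False drops pandas' row-number column from the CSV.
            df.to_csv(csv_name_for(pdf_path, nth_table), encoding="utf-8", index=False)
```

You would call it as convert_folder("Band PDFS"), or with no argument from inside that folder. Skipping PDFs with no tables keeps one bad file from stopping the whole batch; drop index=False if you want the same CSV layout as the answer's code.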
miksus
That makes a lot more sense. I need to brush up on my for loops! I think my mistake in thinking is that I thought that 'file' was an integer that worked through the enumerated 'filtered_files'. I didn't realize when you ran a for loop through an array, that it was actually using the values themselves. Thanks for your help and I'll be sure to review and learn from it! – Jul 17 '21 at 13:16
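The point in that comment is worth spelling out: iterating over a list gives you its elements, range(len(...)) gives you positions, and enumerate gives you both. A small illustration with made-up file names:

```python
filtered_files = ["intro.pdf", "march.pdf", "finale.pdf"]  # made-up names

# Iterating a list yields the elements themselves, not their positions.
names = [file for file in filtered_files]

# range(len(...)) yields positions, which you then have to index with.
positions = [i for i in range(len(filtered_files))]

# enumerate yields (position, element) pairs, so no manual indexing is needed.
pairs = list(enumerate(filtered_files))

print(names)      # ['intro.pdf', 'march.pdf', 'finale.pdf']
print(positions)  # [0, 1, 2]
print(pairs)      # [(0, 'intro.pdf'), (1, 'march.pdf'), (2, 'finale.pdf')]

# In the question's loop the first index is 0 - 1 == -1, which Python reads
# as "last element", so the last PDF in the listing is processed first.
print(filtered_files[0 - 1])  # finale.pdf
```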