
I am trying to convert all PDFs in a folder into Excel files. To do so, I am using the following code, but I am receiving this error:

FileNotFoundError: [Errno 2] No such file or directory: 'filepath.pdf'

Here is the non-functioning code:

# import packages needed
import glob
!pip install tabula-py
import tabula

# set up working directory
my_dir = 'C:/Users/myfolderwithpdfs'

# transform the pdfs into excel files
for filepath in glob.iglob('my_dir/*.pdf'):
    tabula.convert_into("filepath.pdf","filepath.xlsx", output_format="xlsx")

When I either use only the for loop to print the list of my files (as follows)

for filepath in glob.iglob('my_dir/*.pdf'):
    print(filepath)

or transform a single file

tabula.convert_into("myfilename.pdf", "myfilename.xlsx", output_format="xlsx")

I encounter no problems or errors with my code.

Matilde
  • When you put `my_dir` in the string, it is literally looking for a directory called `my_dir`. Try changing it to `glob.iglob(my_dir + '/*.pdf')`. You also don't use the `filepath` variable inside the for loop. Try changing it to `tabula.convert_into(filepath, 'filepath.xlsx', output_format='xlsx')` – aravk33 Dec 11 '19 at 14:50
  • Thank you! The first tip works, but removing the quotation marks from tabula.convert_into does not really work. Instead of creating Excel files, it creates an empty text editor file. It tries to upload it for every file, without any success. Moreover, after running the code my PDFs are damaged and unreadable (I cannot open them). – Matilde Dec 12 '19 at 08:50
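The literal-string pitfall described in the first comment can be reproduced with the standard library alone. The sketch below assumes only `glob`, `os` and `tempfile`; `pdf_paths` and `output_path_for` are hypothetical helper names, not part of tabula. It shows that the pattern written as the literal text `'my_dir/*.pdf'` only matches a folder actually named `my_dir` relative to the current working directory, while building the pattern from the variable's value finds the real files. It also derives a distinct output name from each matched `filepath` instead of reusing one fixed string.

```python
import glob
import os
import tempfile


def pdf_paths(folder):
    """Return the PDF files in `folder` (hypothetical helper)."""
    # os.path.join inserts the folder's actual path; "*.pdf" is the wildcard part
    return sorted(glob.glob(os.path.join(folder, "*.pdf")))


def output_path_for(pdf_path, extension):
    """Map e.g. .../report.pdf to .../report.csv (hypothetical helper)."""
    root, _ = os.path.splitext(pdf_path)
    return root + extension


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as my_dir:
        # Create two empty placeholder files so glob has something to match
        for name in ("a.pdf", "b.pdf"):
            open(os.path.join(my_dir, name), "w").close()

        # Literal pattern: empty unless a folder named "my_dir" exists in the cwd
        print(glob.glob("my_dir/*.pdf"))

        # Pattern built from the variable: finds the real files
        for filepath in pdf_paths(my_dir):
            print(filepath, "->", output_path_for(filepath, ".csv"))
```

Passing `output_path_for(filepath, ".csv")` as the second argument of `tabula.convert_into` gives each PDF its own output file rather than overwriting a single `filepath.xlsx`.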

1 Answer


You should correct my_dir in the loop, because the pattern is looking for a directory literally called "my_dir"; replace it with the actual directory. Also, you should use only the filepath reference created in the loop; there is no need for a literal string.

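# import packages needed
import glob
import os
import tabula

# tabula-py cannot write .xlsx, so convert each PDF to a CSV file next to it
for filepath in glob.iglob('C:/Users/myfolderwithpdfs/*.pdf'):
    # e.g. .../report.pdf -> .../report.csv (convert_into needs an output path)
    output_path = os.path.splitext(filepath)[0] + ".csv"
    tabula.convert_into(filepath, output_path, output_format="csv")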
Rafael Neves
  • Thank you, Rafael. Regarding the path change, I had tried that as well. The weird thing (maybe weird only because I am new to Python) is that the following code actually works: `for filepath in glob.iglob('my_dir/*.pdf'): print(filepath)`, even though I refer to my directory as my_dir. The suggested change to tabula.convert_into does not really work, since the function requires three arguments. I have also tried what aravk33 suggested above, without any luck. – Matilde Dec 12 '19 at 08:53
  • The thing is that tabula does not support converting directly to .xlsx. Follow the instructions here: https://pypi.org/project/tabula-py/ – Rafael Neves Dec 12 '19 at 13:17
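Because `tabula.convert_into` only writes CSV, TSV or JSON, one common workaround is to read the tables into pandas DataFrames with `tabula.read_pdf` and save them with pandas' Excel writer. This is a hedged sketch, not necessarily what the linked page describes: it assumes tabula-py (which needs a Java runtime), pandas and openpyxl are installed; `excel_path_for` and `pdf_folder_to_excel` are hypothetical names; and putting each detected table on its own sheet (`table_1`, `table_2`, and so on) is an arbitrary choice.

```python
import glob
import os


def excel_path_for(pdf_path):
    """Map .../report.pdf to .../report.xlsx in the same folder (hypothetical helper)."""
    return os.path.splitext(pdf_path)[0] + ".xlsx"


def pdf_folder_to_excel(folder):
    """Extract tables from every PDF in `folder` and save one .xlsx per PDF.

    Assumes tabula-py (plus Java), pandas and openpyxl are installed.
    """
    # Imported here so excel_path_for() can be used without these packages
    import pandas as pd
    import tabula

    written = []
    for pdf_path in sorted(glob.glob(os.path.join(folder, "*.pdf"))):
        # multiple_tables=True returns a list of DataFrames, one per detected table
        tables = tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)
        if not tables:
            continue  # no table detected; an Excel file needs at least one sheet
        out_path = excel_path_for(pdf_path)
        with pd.ExcelWriter(out_path) as writer:
            for i, df in enumerate(tables, start=1):
                df.to_excel(writer, sheet_name=f"table_{i}", index=False)
        written.append(out_path)
    return written
```

Called as `pdf_folder_to_excel('C:/Users/myfolderwithpdfs')`, it returns the list of .xlsx files it wrote; the original PDFs are only read, never overwritten.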