I worked through the Haystack beginner tutorial and it runs fine. Now I am trying to use a local PDF on my PC instead of the Game of Thrones Wikipedia articles, and I always get an error.
This is the code:
from haystack.nodes import PDFToTextConverter
from pathlib import Path

def haystack():
    converter = PDFToTextConverter(
        remove_numeric_tables=True,
        valid_languages=["de"]
    )
    docs = converter.convert(file_path=Path("C:/Users/Franzi/Documents/myPDF.pdf"), meta=None)

if __name__ == '__main__':
    haystack()
And this is the error I get:
Traceback (most recent call last):
  File "C:\Users\Franzi\PycharmProjects\pythonProject2\main.py", line 15, in <module>
    haystack()
  File "C:\Users\Franzi\PycharmProjects\pythonProject2\main.py", line 11, in haystack
    docs = converter.convert(file_path=Path("C:/Users/Franzi/Documents/myPDF.pdf"), meta=None)
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\site-packages\haystack\nodes\file_converter\pdf.py", line 171, in convert
    pages = self._read_pdf(
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\site-packages\haystack\nodes\file_converter\pdf.py", line 301, in _read_pdf
    for page in results:
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
TypeError: getText() got an unexpected keyword argument 'textpage'
I am using Python 3.8 and PyCharm 2023.2. I have tried different PDFs, and I also tried:
from haystack.utils import convert_files_to_docs
convert_files_to_docs()
but it gives me the same error. Any ideas what I am doing wrong here?
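In case the installed package versions matter here: the `getText()` call in the last traceback line does not come from my own code. It looks like it belongs to the PDF backend that Haystack calls (PyMuPDF, as far as I can tell, but that is an assumption). Below is a minimal sketch for printing the versions in question. The distribution names `farm-haystack` and `PyMuPDF` are assumptions about how the packages were installed, and `installed_version` is just a hypothetical helper name.

```python
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust them if the packages were installed differently.
    for name in ("farm-haystack", "PyMuPDF"):
        print(name, installed_version(name))
```

A value of `None` only means that no distribution with that exact name was found in the current interpreter's environment, which can also happen if PyCharm runs the script with a different interpreter than the one used for `pip install`.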