I've written a method (part of a class I created) that accepts the name of a PDF file and extracts its contents. Inside that method I set up a logger that writes to a file and to the console, to show which PDF is currently being processed and whether any errors occur. The method is called in a for loop, and on each iteration the logger prints its message one more time than on the previous iteration: once for the first PDF, twice for the second, three times for the third, and so on:
08-Aug-23 13:37:08 - Extract_Data - INFO - The data for pdf1 is now being extracted
08-Aug-23 13:37:32 - Extract_Data - INFO - The data for pdf2 is now being extracted
08-Aug-23 13:37:32 - Extract_Data - INFO - The data for pdf2 is now being extracted
08-Aug-23 13:37:41 - Extract_Data - INFO - The data for pdf3 is now being extracted
08-Aug-23 13:37:41 - Extract_Data - INFO - The data for pdf3 is now being extracted
08-Aug-23 13:37:41 - Extract_Data - INFO - The data for pdf3 is now being extracted
I looked up solutions for duplicate logger messages, which led me to try logger.propagate = False, but that doesn't seem to help. I'm trying to figure out whether I've constructed my logger incorrectly or whether the for loop is causing the issue. Any help would be much appreciated.
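A minimal sketch that reproduces the pattern in isolation may help narrow this down. It assumes the cause is that logging.getLogger() returns the same logger object for a given name on every call, so each call to the function attaches one more handler to it. The logger name Duplicate_Demo and the function log_once_more are made up for the illustration, and an in-memory buffer stands in for the log file.

```python
import io
import logging


def log_once_more(stream):
    # Same pattern as extract_data: fetch the named logger (the same object
    # on every call) and attach a brand-new handler to it each time.
    logger = logging.getLogger("Duplicate_Demo")  # hypothetical name
    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler(stream))
    logger.info("extracting")
    return len(logger.handlers)


buffer = io.StringIO()  # stands in for the log file
handler_counts = [log_once_more(buffer) for _ in range(3)]
lines_written = buffer.getvalue().count("\n")

print(handler_counts)  # [1, 2, 3]
print(lines_written)   # 6, i.e. 1 + 2 + 3
```

If this is what is happening, propagate = False would not be expected to change anything, since the duplicates come from handlers on the logger itself rather than from a parent logger.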
Edits:
I've let the program run as far as it could and encountered the following error:
OSError: [Errno 24] Too many open files: 'pdf2021.pdf'
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\JetBrains\PyCharm Community Edition 2023.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1500, in _exec
  File "C:\Users\user\AppData\Local\JetBrains\PyCharm Community Edition 2023.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
  File "C:\Users\user\PycharmProjects\pythonProject\Scrapper\Scrapper 8-4-23 v2.py", line 2167, in <module>
  File "C:\Users\user\PycharmProjects\pythonProject\Scrapper\Scrapper 8-4-23 v2.py", line 956, in extract_data
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1181, in __init__
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1213, in _open
OSError: [Errno 24] Too many open files: 'C:\Users\user\PycharmProjects\pythonProject\Scrapper\Extract_Data 2023-08-10.log'
After looking through the logging __init__.py file, I found that the stream isn't closed automatically when the function ends. I added f_handler.close() and c_handler.close() at the end of the function, but nothing changed.
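One possible reason close() makes no difference is that closing a handler does not detach it from the logger; Logger.removeHandler() does that. The sketch below checks this with a StreamHandler writing to an in-memory buffer. It is an assumption that the same reasoning carries over to the FileHandler in my code, which the sketch does not exercise, and the logger name Close_Demo is made up.

```python
import io
import logging

logger = logging.getLogger("Close_Demo")  # hypothetical name
logger.propagate = False
logger.setLevel(logging.DEBUG)

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
logger.addHandler(handler)

# close() tidies up the handler but leaves it in logger.handlers.
handler.close()
still_attached = handler in logger.handlers
logger.info("written after close()")
written_after_close = buffer.getvalue().count("\n")

# removeHandler() is what actually detaches it from the logger.
logger.removeHandler(handler)
attached_after_remove = handler in logger.handlers
logger.info("not written after removeHandler()")
written_after_remove = buffer.getvalue().count("\n")

print(still_attached, written_after_close)          # True 1
print(attached_after_remove, written_after_remove)  # False 1
```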
Current code:
def extract_data(self, pdfname):
    os.chdir(path)
    logger = logging.getLogger("Extract_Data")
    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    # Create the FileHandler() and StreamHandler() handlers
    f_handler = logging.FileHandler('Extract_Data ' + str(datetime.datetime.today().date()) + '.log')
    f_handler.setLevel(logging.DEBUG)
    c_handler = logging.StreamHandler()
    c_handler.setLevel(logging.INFO)
    # Create formatting for the handlers
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s', datefmt='%d-%b-%y %H:%M:%S')
    # Set the formatter for each handler
    f_handler.setFormatter(formatter)
    c_handler.setFormatter(formatter)
    logger.addHandler(f_handler)
    logger.addHandler(c_handler)
    pdfname = pdfname
    logger.info(f'The data for {pdfname} is now being extracted')
    try:
        # --------------
        # PDF extraction code in here
        # --------------
        pass
    except re.error as ree:
        pass  # logger.exception()
    except:
        pass  # logger.exception()
    else:
        f_handler.close()
        c_handler.close()

for pdf in obj.pdf_generator():
    obj.extract_data(pdf)
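For comparison, here is a rough sketch of configuring the logger only once and reusing it on every iteration. It is an illustration under assumptions, not a confirmed fix: the helper get_extract_logger, the logger name Extract_Data_Sketch, the temporary log path and the hard-coded PDF names are all made up, and checking logger.handlers is just one way to guard against attaching handlers twice.

```python
import logging
import os
import tempfile


def get_extract_logger(log_path):
    """Return the shared logger, attaching handlers only on the first call."""
    logger = logging.getLogger("Extract_Data_Sketch")  # hypothetical name
    if logger.handlers:
        return logger  # already configured by an earlier call

    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        datefmt="%d-%b-%y %H:%M:%S",
    )
    f_handler = logging.FileHandler(log_path)
    f_handler.setLevel(logging.DEBUG)
    c_handler = logging.StreamHandler()
    c_handler.setLevel(logging.INFO)
    for handler in (f_handler, c_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Hypothetical stand-ins for the real log file and obj.pdf_generator().
log_path = os.path.join(tempfile.mkdtemp(), "sketch.log")
for pdf in ("pdf1", "pdf2", "pdf3"):
    logger = get_extract_logger(log_path)
    logger.info(f"The data for {pdf} is now being extracted")

handler_count = len(logger.handlers)

# Detach and close the handlers once, after the whole loop has finished.
for handler in list(logger.handlers):
    logger.removeHandler(handler)
    handler.close()

with open(log_path) as log_file:
    logged_lines = log_file.read().splitlines()

print(handler_count)      # 2
print(len(logged_lines))  # 3
```

With this arrangement only one file handle is opened for the whole run, which would also be consistent with the "Too many open files" error going away, though I have not verified that against the real script.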