
I've created a function (a method of a class I wrote) that accepts the name of a PDF file and extracts its contents. Inside that function I've set up a logger that writes to both a file and the console, reporting which PDF is currently being processed and any errors that occur. However, on each iteration of the for loop the message is printed one more time than on the previous iteration, so the number of copies matches the number of iterations so far:

    08-Aug-23 13:37:08 - Extract_Data - INFO - The data for pdf1 is now being extracted
    08-Aug-23 13:37:32 - Extract_Data - INFO - The data for pdf2 is now being extracted
    08-Aug-23 13:37:32 - Extract_Data - INFO - The data for pdf2 is now being extracted
    08-Aug-23 13:37:41 - Extract_Data - INFO - The data for pdf3 is now being extracted
    08-Aug-23 13:37:41 - Extract_Data - INFO - The data for pdf3 is now being extracted
    08-Aug-23 13:37:41 - Extract_Data - INFO - The data for pdf3 is now being extracted
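A minimal sketch that reproduces the same 1, 2, 3 pattern outside the scraper, assuming the duplication comes from the function attaching a new handler on every call. `logging.getLogger` returns the same logger object every time it is called with the same name, so handlers added inside the function pile up. The logger name `Extract_Data_demo` and the shared in-memory buffer are stand-ins for illustration only:

```python
import io
import logging

# Shared in-memory stream so every line written by any handler can be counted.
buf = io.StringIO()

def extract_data(pdfname):
    # getLogger returns the SAME object for this name on every call,
    # so each call attaches one more handler to it.
    logger = logging.getLogger("Extract_Data_demo")  # hypothetical demo name
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(buf)
    handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    logger.info(f"The data for {pdfname} is now being extracted")

for pdf in ["pdf1", "pdf2", "pdf3"]:
    extract_data(pdf)

# pdf1 appears once, pdf2 twice, pdf3 three times: one copy per attached handler.
print(buf.getvalue())
```

Checking `len(logging.getLogger("Extract_Data_demo").handlers)` after the loop shows how many handlers have accumulated.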

I've looked up solutions for duplicate logger messages, which led me to try `logger.propagate = False`, but that doesn't seem to work. I'm trying to figure out whether I've set up my logger incorrectly or whether the for loop is causing the issue. Any help would be much appreciated.
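For reference, a sketch of what `propagate = False` does and does not affect: it stops a record from being passed on to the handlers of ancestor loggers (here the root logger), but every handler attached directly to the logger itself still fires. The logger name `propagate_demo` is hypothetical:

```python
import io
import logging

root_buf = io.StringIO()
own_buf = io.StringIO()

# A handler on the root logger, the ancestor of every named logger.
root_handler = logging.StreamHandler(root_buf)
logging.getLogger().addHandler(root_handler)

logger = logging.getLogger("propagate_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # records no longer reach the root logger's handlers

# Attach two handlers directly to the logger, as two calls to a setup function would.
for _ in range(2):
    logger.addHandler(logging.StreamHandler(own_buf))

logger.info("hello")

print(repr(root_buf.getvalue()))          # '' : the root handler saw nothing
print(own_buf.getvalue().count("hello"))  # 2  : both of the logger's own handlers fired

# Clean up the demo handler on the root logger.
logging.getLogger().removeHandler(root_handler)
```

So `propagate = False` only removes duplicates caused by ancestor handlers; it cannot remove duplicates caused by the same logger holding several handlers.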

Edits:

I've let the program run as far as it could and encountered the following error:

    OSError: [Errno 24] Too many open files: 'pdf2021.pdf'

    Traceback (most recent call last):
      File "C:\Users\user\AppData\Local\JetBrains\PyCharm Community Edition 2023.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1500, in _exec
      File "C:\Users\user\AppData\Local\JetBrains\PyCharm Community Edition 2023.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
      File "C:\Users\user\PycharmProjects\pythonProject\Scrapper\Scrapper 8-4-23 v2.py", line 2167, in <module>
      File "C:\Users\user\PycharmProjects\pythonProject\Scrapper\Scrapper 8-4-23 v2.py", line 956, in extract_data
      File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1181, in __init__
      File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1213, in _open
    OSError: [Errno 24] Too many open files: 'C:\Users\user\PycharmProjects\pythonProject\Scrapper\Extract_Data 2023-08-10.log'

After looking through the logging module's `__init__.py`, I found that the handler's stream isn't closed automatically when the function ends. I've added `f_handler.close()` and `c_handler.close()` at the end, but nothing changed.
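A small sketch of the difference between closing a handler and detaching it, using a throwaway temp file in place of the dated log file (an assumption for the demo). Constructing a `FileHandler` opens its file immediately unless `delay=True` is passed, which matches the traceback ending in `__init__` and `_open`. `close()` releases that file, but the handler stays in `logger.handlers`; only `removeHandler()` detaches it. The logger name `close_demo` is hypothetical:

```python
import logging
import os
import tempfile

logger = logging.getLogger("close_demo")  # hypothetical logger name
logger.propagate = False

log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
f_handler = logging.FileHandler(log_path)  # the file is opened here, at construction
logger.addHandler(f_handler)

# close() releases the file, but the handler is still attached to the logger.
f_handler.close()
still_attached = f_handler in logger.handlers
print(still_attached)  # True

# removeHandler() is what actually detaches it from the logger.
logger.removeHandler(f_handler)
print(f_handler in logger.handlers)  # False
```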

Current code:

    import datetime
    import logging
    import os
    import re

    # Method of my scraper class (class definition omitted)
    def extract_data(self, pdfname):
        os.chdir(path)  # `path` is defined elsewhere in the script
        logger = logging.getLogger("Extract_Data")
        logger.propagate = False
        logger.setLevel(logging.DEBUG)
        # Create the FileHandler() and StreamHandler() handlers
        f_handler = logging.FileHandler('Extract_Data ' + str(datetime.datetime.today().date()) + '.log')
        f_handler.setLevel(logging.DEBUG)
        c_handler = logging.StreamHandler()
        c_handler.setLevel(logging.INFO)
        # Create formatting for the handlers
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s', datefmt='%d-%b-%y %H:%M:%S')
        # Set the formatter for each handler and attach both to the logger
        f_handler.setFormatter(formatter)
        c_handler.setFormatter(formatter)
        logger.addHandler(f_handler)
        logger.addHandler(c_handler)

        logger.info(f'The data for {pdfname} is now being extracted')

        try:
            pass  # PDF extraction code in here
        except re.error:
            logger.exception(f'Regex error while extracting {pdfname}')
        except Exception:
            logger.exception(f'Error while extracting {pdfname}')
        else:
            f_handler.close()
            c_handler.close()

    # Module level: process every PDF
    for pdf in obj.pdf_generator():
        obj.extract_data(pdf)
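One way to test the "logger set up incorrectly" hypothesis, sketched under assumptions: move the handler setup into a hypothetical helper, `get_extract_logger`, that attaches handlers only the first time it sees the logger. Only the console handler is shown; the `FileHandler` would go inside the same `if` block. The optional `stream` parameter exists only so the demo can capture output (`None` means `sys.stderr`):

```python
import io
import logging

def get_extract_logger(stream=None):
    """Hypothetical helper: return the 'Extract_Data' logger, adding handlers only once."""
    logger = logging.getLogger("Extract_Data")
    if not logger.handlers:  # first call: configure; later calls reuse the same handlers
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
        c_handler = logging.StreamHandler(stream)
        c_handler.setLevel(logging.INFO)
        c_handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            datefmt='%d-%b-%y %H:%M:%S'))
        logger.addHandler(c_handler)
    return logger

buf = io.StringIO()
for pdf in ["pdf1", "pdf2", "pdf3"]:
    get_extract_logger(buf).info(f'The data for {pdf} is now being extracted')

# One line per PDF instead of 1 + 2 + 3.
print(len(buf.getvalue().splitlines()))  # 3
```

The same effect could also come from configuring the logger once at module level (or in the class's `__init__`) and only calling `logging.getLogger("Extract_Data")` inside `extract_data`.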
