
I need to use the Python logging module to log a pandas DataFrame. I need the entire DataFrame (all rows) indented equally.

Below is the simple desired output:

Test Dataframe Output Below:

       col1  col2
    0     1     3
    1     2     4

However, I am getting the following output, where the indentation is applied only to the first line (the header row) of the DataFrame:

Test Dataframe Output Below:

       col1  col2
0     1     3
1     2     4

Sample code I am running is:

import pandas as pd
import logging

# sample dataframe
test_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# logging set up
logging.basicConfig(level=logging.INFO)
logging.getLogger().handlers.clear()
c_handler = logging.StreamHandler()
c_handler.setFormatter(logging.Formatter('%(message)s'))
logging.getLogger().addHandler(c_handler)

# log the pandas dataframe to console
logging.info(f'\tTest Dataframe Output Below:')
logging.info(f'\n\t\t{test_df}')
logging.info(f'{test_df}')

Any help will be greatly appreciated!

user32147

2 Answers

logging.info('\t' + test_df.to_string().replace('\n', '\n\t'))
Ke Zhang
  • 2
    Code only answers are discouraged. Please add some explanation as to how this solves the problem. [From Review](https://stackoverflow.com/review/low-quality-posts/22801789) – Nick Apr 20 '19 at 05:42
  • 1
    Thanks, that's a very good point. However, in this case the code is pretty self-explanatory. The pandas and Python developers did excellent work with user-oriented function naming, so we can avoid verbosity and conformism. – Ke Zhang Apr 20 '19 at 19:25
  • 1
    @KeZhang works! I assume that when to_string executes, the rows become a single string with the lines separated by '\n'; since we replace each '\n' with '\n\t', rows 2 to the end are indented as well – user32147 Apr 20 '19 at 20:25
  • 5
    @KeZhang agreed, but you could just say something like, "to indent all the data, simply replace newlines with a newline and a tab character" so it's obvious what the intent of your code is. – Nick Apr 20 '19 at 22:47
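
The newline-replacement idea from this answer can be sketched as a small runnable example. This is only an illustration: the helper name `indent_frame` and the logger name `indent_demo` are made up here, and `textwrap.indent` from the standard library is swapped in for the manual `replace` call (for a rendering with no empty lines it gives the same result). An in-memory stream is used so the logged text can be inspected.

```python
import io
import logging
import textwrap

import pandas as pd


def indent_frame(df: pd.DataFrame, prefix: str = "\t") -> str:
    """Hypothetical helper: render all rows of df with prefix on every line."""
    # to_string() renders the whole frame as one multi-line string;
    # textwrap.indent() then adds the prefix to each non-empty line.
    return textwrap.indent(df.to_string(), prefix)


test_df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Log to an in-memory stream so the output is easy to check.
stream = io.StringIO()
demo_logger = logging.getLogger("indent_demo")  # assumed name, for illustration
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
demo_logger.addHandler(handler)

demo_logger.info("Test Dataframe Output Below:")
# The leading newline starts the table on its own line.
demo_logger.info("\n%s", indent_frame(test_df))

logged = stream.getvalue()
print(logged)
```

Because the prefix is added to the rendered string rather than to the log call, the header and every row shift by the same amount.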

Assuming you have a setup like this (basically copied from https://www.toptal.com/python/in-depth-python-logging):

# my_logger.py
import logging
import sys
from logging.handlers import TimedRotatingFileHandler

def get_console_handler(formatter=False):
    console_handler = logging.StreamHandler(sys.stdout)
    if formatter:
        formatter = logging.Formatter("%(asctime)s — %(name)s — %(levelname)s — %(message)s")
        console_handler.setFormatter(formatter)
    return console_handler


def get_file_handler(log_file, formatter=False):
    file_handler = TimedRotatingFileHandler(log_file, when='midnight')
    if formatter:
        formatter = logging.Formatter("%(asctime)s — %(name)s — %(levelname)s — %(message)s")
        file_handler.setFormatter(formatter)
    return file_handler


def get_logger(logger_name, log_file, use_formatter=False):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG) # better to have too much log than not enough
    logger.addHandler(get_console_handler(use_formatter))
    logger.addHandler(get_file_handler(log_file, use_formatter))
    # with this pattern, it's rarely necessary to propagate the error up to parent
    logger.propagate = False
    return logger

... you could import that in other files:

from my_logger import get_logger

# note: the logs/ directory must already exist, or the file handler raises FileNotFoundError
logger = get_logger(__name__, 'logs/debug.log', use_formatter=True)
df_logger = get_logger(__name__ + '_dfs', 'logs/debug.log', use_formatter=False)

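Now, when you want to log a DataFrame, you can use the following. Since the handlers of df_logger have no formatter set, logging falls back to its default format, which is just the message. The DataFrame's string representation is therefore written to the log file as-is, with no prefix pushing the header out of alignment.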

df_logger.debug(df)

... and for everything else you can use:

logger.debug(some_message)

Maybe you could even combine the two logger instances in a new class.
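
One way that last suggestion could look, as a rough sketch rather than a tested design: `CombinedLogger`, `build_logger`, `debug_frame`, and the logger names below are all hypothetical, and an in-memory stream stands in for the console and file handlers from `my_logger.py` so the example runs on its own.

```python
import io
import logging

import pandas as pd


def build_logger(name, stream, fmt=None):
    """Hypothetical stand-in for get_logger() that writes to a given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.StreamHandler(stream)
        if fmt:
            handler.setFormatter(logging.Formatter(fmt))
        logger.addHandler(handler)
    return logger


class CombinedLogger:
    """Hypothetical wrapper: a formatted logger for messages, a bare one for frames."""

    def __init__(self, message_logger, frame_logger):
        self.message_logger = message_logger
        self.frame_logger = frame_logger

    def debug(self, message):
        self.message_logger.debug(message)

    def debug_frame(self, df):
        # to_string() renders every row; the bare logger adds no prefix.
        self.frame_logger.debug(df.to_string())


stream = io.StringIO()
combined = CombinedLogger(
    build_logger("combo_demo", stream, "%(name)s - %(levelname)s - %(message)s"),
    build_logger("combo_demo_dfs", stream),
)

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
combined.debug("loaded data")
combined.debug_frame(df)

output = stream.getvalue()
print(output)
```

Callers then deal with a single object, and the choice of which underlying logger handles a DataFrame stays in one place.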

NiMa