
When you print a pandas DataFrame, which calls DataFrame.to_string, it normally inserts a minimum of 2 spaces between the columns. For example, this code

import pandas as pd

df = pd.DataFrame( {
    "c1" : ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2" : (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5]
} )
print(df)

outputs

       c1  c2  a3235235235
0       a  11            1
1      bb  22            2
2     ccc  33            3
3    dddd  44            4
4  eeeeee  55            5

which has a minimum of 2 spaces between each column.

I am copying DataFrames printed on the console and pasting them into documents, and I have received feedback that they are hard to read: people would like more spaces between the columns.

Is there a standard way to do that?

I see no option in either DataFrame.to_string or pandas.set_option.

I have done a web search and not found an answer. One existing question asks how to remove those 2 spaces, while another asks why sometimes only 1 space appears between columns instead of 2 (I have also seen this bug and hope someone answers that question).

My hack solution is to define a function that converts a DataFrame's columns to type str, and then prepends each element with a string of the specified number of spaces.

This code (added to the code above)

def prependSpacesToColumns(df: pd.DataFrame, n: int = 3):
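    spaces = ' ' * n
    df = df.copy()  # work on a copy so the caller's column labels are left untouched
    
    # ensure every column name has the leading spaces:
    if isinstance(df.columns, pd.MultiIndex):
        for i in range(df.columns.nlevels):
            levelNew = [spaces + str(s) for s in df.columns.levels[i]]
            # set_levels returns a new MultiIndex; inplace= was removed in pandas 2.0
            df.columns = df.columns.set_levels(levelNew, level = i)
    else:
        # astype(str) so that non-string labels can also be prefixed
        df.columns = spaces + df.columns.astype(str)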
    
    # ensure every element has the leading spaces:
    df = df.astype(str)
    df = spaces + df
    
    return df

dfSp = prependSpacesToColumns(df, 3)
print(dfSp)

outputs

          c1     c2    a3235235235
0          a     11              1
1         bb     22              2
2        ccc     33              3
3       dddd     44              4
4     eeeeee     55              5

which is the desired effect.

But I think that pandas surely must have some builtin simple standard way to do this. Did I miss how?

Also, the solution needs to handle a DataFrame whose columns are a MultiIndex. To continue the code example, consider this modification:

idx = (("Outer", "Inner1"), ("Outer", "Inner2"), ("Outer", "a3235235235"))
df.columns = pd.MultiIndex.from_tuples(idx)
HaroldFinch

1 Answer


You can accomplish this through the formatters argument of DataFrame.to_string; it takes a bit of code to create the dictionary {'col_name': formatting_function}. For each column, find the max character length of its values or the length of the column header, whichever is greater, add some padding, and use the result as the width of a right-justifying formatter.

Use partial from functools, because each formatter must be a one-parameter function, yet we need to specify a different width for each column.
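
To see the role of partial in isolation, here is a minimal sketch that does not involve pandas at all. The names pad_left, fmt_narrow and fmt_wide are made up for illustration; the only point is that binding the width in advance leaves a function of exactly one argument, which is the shape a formatter needs.

```python
from functools import partial

def pad_left(value, width):
    """Right-justify str(value) in a field of `width` characters."""
    return f"{value!s:>{width}}"

# Bind a different width for each (hypothetical) column. Every result is a
# callable that takes exactly one argument: the cell value.
fmt_narrow = partial(pad_left, width=4)
fmt_wide = partial(pad_left, width=8)

print(repr(fmt_narrow(22)))    # '  22'
print(repr(fmt_wide("bb")))    # '      bb'
```

A lambda that closes over the loop variable would see only the last width once the loop has finished, which is one reason partial is convenient when the formatters are built in a loop.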

Sample Data

import pandas as pd
df = pd.DataFrame({"c1": ("a", "bb", "ccc", "dddd", 'eeeeee'),
                   "c2": (11, 22, 33, 44, 55),
                   "a3235235235": [1,2,3,4,5]})

Code

from functools import partial

# Formatting string 
def get_fmt_str(x, fill):
    return '{message: >{fill}}'.format(message=x, fill=fill)

# Max character length per column
s = df.astype(str).agg(lambda x: x.str.len()).max() 

pad = 6  # Extra width added to every column to spread the columns apart
fmts = {}
for idx, c_len in s.items():  # Series.iteritems() was removed in pandas 2.0
    # Deal with MultiIndex tuples or simple string labels.
    if isinstance(idx, tuple):
        lab_len = max([len(str(x)) for x in idx])
    else:
        lab_len = len(str(idx))

    fill = max(lab_len, c_len) + pad - 1
    fmts[idx] = partial(get_fmt_str, fill=fill)

print(df.to_string(formatters=fmts))

            c1      c2      a3235235235
0            a      11                1
1           bb      22                2
2          ccc      33                3
3         dddd      44                4
4       eeeeee      55                5

# MultiIndex Output
         Outer                             
        Inner1      Inner2      a3235235235
0            a          11                1
1           bb          22                2
2          ccc          33                3
3         dddd          44                4
4       eeeeee          55                5
ALollz
  • Thanks! I am marking your response as the answer, because you show a deep low level way to achieve ANY desired formatting effect, including my need for column spacing. I still think that pandas needs to add some kind of option to DataFrame.to_string... – HaroldFinch Feb 26 '21 at 20:32
  • Compared to my original 3 line hack solution, your code addresses one defect: the column names also need to be considered. I have edited my question's hack code to address that, as well as to use your modified DataFrame (which includes a long column name). – HaroldFinch Feb 26 '21 at 20:33
  • I note, however, that your solution has this small issue: the spacing between columns is not constant! I count 8 spaces between the index column and c1, then 7 spaces between the remaining two columns. My modified hack code, in contrast, produces the same number of spaces between all columns. – HaroldFinch Feb 26 '21 at 20:33
  • @HaroldFinch it's difficult to tell but that's likely an issue due to the formatting of the index (with an extra space) so it gets lumped in with what looks like the first column. I.e. index pads to the right, columns pad to the left, so that first column looks weird. AFAIK, you can't format the index so you could make it a column, format that and then to_string(index=False). But it does get complicated. – ALollz Feb 26 '21 at 20:56
  • I just discovered that both of our original codes do not handle a DataFrame whose columns are of type pandas.MultiIndex. My original code crashes, while yours puts in way too many spaces. I will edit my question to handle MultiIndex. – HaroldFinch Feb 27 '21 at 03:09
  • @HaroldFinch yeah makes sense, because I had just converted the tuple to a string. I've updated the answer to deal with a MultiIndex appropriately, but still won't fix the extra space issue on the Index. – ALollz Mar 02 '21 at 15:34
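
The uneven gap next to the index discussed in the comments can be avoided by not relying on to_string's own padding at all. The following is only a sketch under that assumption, not a pandas built-in: render_with_gap is a hypothetical helper that converts every cell with str, right-justifies each column (the index included) to its widest entry, and joins the columns with a fixed number of spaces. It handles MultiIndex columns by printing one header line per level, repeating outer labels rather than sparsifying them as pandas does, and it does no float or NaN formatting.

```python
import pandas as pd

def render_with_gap(df: pd.DataFrame, gap: int = 4) -> str:
    """Hypothetical helper: render df as text with exactly `gap` spaces
    between right-justified columns, the index column included."""
    nlevels = df.columns.nlevels
    # One tuple of header strings per column (one entry per column level).
    if nlevels > 1:
        headers = [tuple(str(x) for x in col) for col in df.columns]
    else:
        headers = [(str(col),) for col in df.columns]

    # Build text columns: the index first (blank header cells), then the data.
    columns = [[""] * nlevels + [str(i) for i in df.index]]
    for pos, header in enumerate(headers):
        columns.append(list(header) + [str(v) for v in df.iloc[:, pos]])

    # Right-justify every column to the width of its longest cell.
    justified = []
    for cells in columns:
        width = max(len(c) for c in cells)
        justified.append([c.rjust(width) for c in cells])

    sep = " " * gap
    return "\n".join(sep.join(row) for row in zip(*justified))

df = pd.DataFrame({"c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
                   "c2": (11, 22, 33, 44, 55),
                   "a3235235235": [1, 2, 3, 4, 5]})
out = render_with_gap(df, gap=4)
print(out)
```

Because the separator is added only between columns, the gap between the index and c1 is the same as the gap between any other pair of neighbouring columns.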