In Pandas, I've been using custom objects as column labels because they let me attach rich, column-specific information and methods to each column. For example, you can give each column its own `fmt_fn` to format its values (this is just an example; my actual column label objects are more complex):
```
# Assumed setup (not shown in the original session):
# import numpy as np
# import pandas as pd
# from datetime import timedelta

In [100]: class Col:
     ...:     def __init__(self, name, fmt_fn):
     ...:         self.name = name
     ...:         self.fmt_fn = fmt_fn
     ...:     def __str__(self):
     ...:         return self.name
     ...:

In [101]: sec_col = Col('time', lambda val: str(timedelta(seconds=val)).split('.')[0])

In [102]: dollar_col = Col('money', lambda val: '${:.2f}'.format(val))

In [103]: foo = pd.DataFrame(np.random.random((3, 2)) * 1000, columns=[sec_col, dollar_col])

In [104]: print(foo)  # ugly
         time       money
0  773.181402  720.997051
1   33.779925  317.957813
2  590.750129  416.293245

In [105]: print(foo.to_string(formatters=[col.fmt_fn for col in foo.columns]))  # pretty
      time    money
0  0:12:53  $721.00
1  0:00:33  $317.96
2  0:09:50  $416.29
```
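One practical consequence of this design, shown as a minimal sketch: because `Col` as written defines no `__eq__` or `__hash__`, pandas compares these labels by object identity. You select a column with the `Col` instance itself, and the plain string `'time'` is not a label. The `by_name` dict below is a hypothetical helper, not part of pandas.

```python
from datetime import timedelta

import numpy as np
import pandas as pd


class Col:
    """Column label carrying a per-column formatter (as in the question)."""

    def __init__(self, name, fmt_fn):
        self.name = name
        self.fmt_fn = fmt_fn

    def __str__(self):
        return self.name


sec_col = Col("time", lambda val: str(timedelta(seconds=val)).split(".")[0])
dollar_col = Col("money", lambda val: "${:.2f}".format(val))
foo = pd.DataFrame(np.random.random((3, 2)) * 1000, columns=[sec_col, dollar_col])

# Col defines no __eq__/__hash__, so labels compare by identity:
# select with the Col object itself, not with its name.
time_values = foo[sec_col]
name_is_label = "time" in foo.columns  # False: the string is not a label

# Hypothetical helper: map names to Col objects to recover name-based access.
by_name = {str(col): col for col in foo.columns}
money_values = foo[by_name["money"]]
```

If name-based selection matters, this identity behavior is worth keeping in mind alongside the HDF5 limitation.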
Okay, so I've been happily doing this for a while, but I recently ran into one part of Pandas that doesn't support it: `to_hdf`/`read_hdf` fail on DataFrames with custom column labels. This isn't a dealbreaker for me; I can use pickle instead of HDF5 at some loss of efficiency.
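A minimal sketch of the pickle fallback, with one caveat: the standard `pickle` module stores functions by their qualified name, so it cannot serialize lambdas like the `fmt_fn` values in the session above. This sketch therefore swaps in module-level formatter functions; the names `fmt_seconds` and `fmt_dollars` are hypothetical. `DataFrame.to_pickle`/`pd.read_pickle` use pickle too, so the same restriction should apply to them.

```python
import pickle
from datetime import timedelta

import numpy as np
import pandas as pd


class Col:
    def __init__(self, name, fmt_fn):
        self.name = name
        self.fmt_fn = fmt_fn

    def __str__(self):
        return self.name


# Module-level functions pickle by reference; lambdas would not.
def fmt_seconds(val):
    return str(timedelta(seconds=val)).split(".")[0]


def fmt_dollars(val):
    return "${:.2f}".format(val)


foo = pd.DataFrame(
    np.random.random((3, 2)) * 1000,
    columns=[Col("time", fmt_seconds), Col("money", fmt_dollars)],
)

# In-memory round trip through pickle.
restored = pickle.loads(pickle.dumps(foo))

restored_names = [str(col) for col in restored.columns]
pretty = restored.to_string(formatters=[col.fmt_fn for col in restored.columns])
```

Note that the restored labels are new `Col` instances, so with identity-based equality the original `Col` objects will not select columns in `restored`.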
But the bigger question is: does Pandas support custom objects as column labels in general? In other words, should I keep using Pandas this way, or is it likely to break in other parts of Pandas (besides HDF5) in the future and cause me pain later?
PS: If you don't use custom objects as column labels, I'd also welcome your take on how you handle column-specific info such as the `fmt_fn` in the example above.
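For comparison, here is a minimal sketch of one common alternative. It is offered as an assumption, not as the asker's or pandas' prescribed approach. Keep plain string column labels and store per-column metadata in an ordinary dict keyed by those labels. `DataFrame.to_string` accepts `formatters` as a dict keyed by column name, so the lookup no longer depends on column order. The name `col_formatters` is hypothetical.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# Hypothetical layout: plain string labels, with per-column metadata kept
# in an ordinary dict keyed by those labels.
col_formatters = {
    "time": lambda val: str(timedelta(seconds=val)).split(".")[0],
    "money": lambda val: "${:.2f}".format(val),
}

foo = pd.DataFrame(np.random.random((3, 2)) * 1000, columns=["time", "money"])

# to_string accepts a dict of formatters keyed by column name.
pretty = foo.to_string(formatters=col_formatters)

# String labels keep ordinary name-based selection working.
money = foo["money"]
```

The trade-off is that the metadata travels separately from the DataFrame, so you must keep the dict in sync when renaming or dropping columns. pandas also has a `DataFrame.attrs` dict for attaching metadata, but it is documented as experimental.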