What does the subset argument do in pandas.io.formats.style.Styler.format?

Question

The public documentation for pandas.io.formats.style.Styler.format says

subset : IndexSlice
An argument to DataFrame.loc that restricts which elements formatter is applied to.

But looking at the code, that's not quite true... what is this _non_reducing_slice stuff?

    if subset is None:
        row_locs = range(len(self.data))
        col_locs = range(len(self.data.columns))
    else:
        subset = _non_reducing_slice(subset)
        if len(subset) == 1:
            subset = subset, self.data.columns

        sub_df = self.data.loc[subset]

Use case: I want to format a particular row, but I get an error when I naively follow the documentation with something that works fine with .loc[]:

>>> import pandas as pd
>>>
>>> df = pd.DataFrame([dict(a=1,b=2,c=3),dict(a=3,b=5,c=4)])
>>> df = df.set_index('a')
>>> print df
   b  c
a
1  2  3
3  5  4
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J, subset=[3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 841, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 189, in _has_valid_tuple
    if not self._has_valid_type(k, i):
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: 'None of [[3]] are in the [columns]'
>>> df.loc[3]
b    5
c    4
Name: 3, dtype: int64
>>> df.loc[[3]]
   b  c
a
3  5  4

OK, I tried using IndexSlice and it seems flaky -- works in some cases, doesn't work in others, at least in Pandas 0.20.3:

Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:34:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> idx = pd.IndexSlice
>>> r = np.arange(16).astype(int)
>>> colors = 'red green blue yellow'.split()
>>> df = pd.DataFrame(dict(a=[colors[i] for i in r//4], b=r%4, c=r*100)).set_index(['a','b'])
>>> print df
             c
a      b
red    0     0
       1   100
       2   200
       3   300
green  0   400
       1   500
       2   600
       3   700
blue   0   800
       1   900
       2  1000
       3  1100
yellow 0  1200
       1  1300
       2  1400
       3  1500
>>> df.loc[idx['yellow']]
      c
b
0  1200
1  1300
2  1400
3  1500
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 836, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 948, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1023, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: "None of [['yellow']] are in the [columns]"
>>> pd.__version__
u'0.20.3'

In pandas 0.24.2 I get a similar, though slightly different, error:

>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 401, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 868, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 969, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1048, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1902, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: u"None of [Index([u'yellow'], dtype='object')] are in the [columns]"
>>> pd.__version__
u'0.24.2'

Oh wait -- I wasn't specifying enough index information; this works:

df.style.format(J,idx['yellow',:])
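A minimal sketch of why the extra `:` matters, assuming Python 3 and a recent pandas rather than the 0.20.3/0.24.2 sessions above. `pd.IndexSlice` simply hands back whatever is put in the brackets, so `idx['yellow']` is just the string `'yellow'` (which the tracebacks above show being looked up in the columns), while `idx['yellow', :]` is a (rows, columns) tuple. The sketch only uses `.loc`, not the Styler itself.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
r = np.arange(16)
colors = "red green blue yellow".split()
df = pd.DataFrame(
    dict(a=[colors[i] for i in r // 4], b=r % 4, c=r * 100)
).set_index(["a", "b"])

# IndexSlice returns its key unchanged: a bare string carries no axis information.
bare = idx["yellow"]

# With the trailing colon, the key is a (row selector, column selector) tuple.
both_axes = idx["yellow", :]

# Wrapping the label in a list keeps the result a DataFrame with both index levels,
# which is the kind of non-reduced selection the Styler works with.
yellow_rows = df.loc[idx[["yellow"], :]]
print(yellow_rows)
```

The list form `idx[['yellow'], :]` is used here only to keep the selection two-dimensional; it is an illustration, not a claim about the exact key the Styler builds internally.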

I have used this argument to apply formatting to some, but not all, cells in a dataframe. Is there a case where its behavior surprises you? For example, a case where calling `format(..., subset=s)` gives different results than `df.loc[s]`? I think that would qualify as a bug. – NicholasM, Dec 05 '19 at 20:53
It says right there that `[3]` is not one of your columns. What exactly are you trying to do? – Quang Hoang, Dec 05 '19 at 20:59
I'm trying to apply my formatter to a portion of the data frame, in particular one row, that is selected by `.loc[]`, which is exactly what `3` or `[3]` does. – Jason S, Dec 05 '19 at 21:00

score 1 · Answer 1 · answered Dec 05 '19 at 20:53

It does indeed do what it is supposed to do.

df = pd.DataFrame(np.arange(16).reshape(4,4))

df.style.background_gradient(subset=[0,1])

df.style.background_gradient()

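give, respectively, a gradient applied only to columns 0 and 1 and a gradient applied to every column (the rendered images are not preserved here).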

answered Dec 05 '19 at 20:53

Quang Hoang


In your case, it does. In general, it does not. – Jason S Dec 05 '19 at 20:58

NicholasM · Accepted Answer · 2019-12-05T21:18:07.150


I agree that the behavior you showed is not ideal.

>>> import pandas
>>> df = (pandas.DataFrame([dict(a=1,b=2,c=3),
...                         dict(a=3,b=5,c=4)])
...         .set_index('a'))
>>> df.loc[[3]]
   b  c
a
3  5  4
>>> df.style.format('{:.2f}', subset=[3])
Traceback (most recent call last):
...
KeyError: "None of [Int64Index([3], dtype='int64')] are in the [columns]"

You can work around this issue by passing a fully-formed pandas.IndexSlice as the subset argument:

>>> df.style.format('{:.2f}', subset=pandas.IndexSlice[[3], :])

Since you asked what _non_reducing_slice() is doing: its goal is reasonable (ensure that a subset does not reduce dimensionality to a Series), but its implementation treats a bare list as a sequence of column names:

From pandas/core/indexing.py:

def _non_reducing_slice(slice_):
    """
    Ensurse that a slice doesn't reduce to a Series or Scalar.

    Any user-paseed `subset` should have this called on it
    to make sure we're always working with DataFrames.
    """
    # default to column slice, like DataFrame
    # ['A', 'B'] -> IndexSlices[:, ['A', 'B']]
    kinds = (ABCSeries, np.ndarray, Index, list, str)
    if isinstance(slice_, kinds):
        slice_ = IndexSlice[:, slice_] 
    ...

I wonder if the documentation could be improved: in this case, the exception raised with subset=[3] matches the behavior of df[[3]] rather than df.loc[[3]].
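The difference can be checked with plain indexing, without involving the Styler. This is a minimal sketch assuming Python 3 and a recent pandas; the variable names are only illustrative. It shows that a bare `[3]` fails as a column lookup in the same way `df[[3]]` does, while spelling out both axes selects the row.

```python
import pandas as pd

df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index("a")

# What the quoted helper turns subset=[3] into: all rows, columns named 3.
as_columns = pd.IndexSlice[:, [3]]
try:
    df.loc[as_columns]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# Plain df[[3]] is also a column lookup, so it fails the same way.
try:
    df[[3]]
    getitem_failed = False
except KeyError:
    getitem_failed = True

# Naming both axes explicitly selects the row with index label 3.
as_rows = pd.IndexSlice[[3], :]
row_subset = df.loc[as_rows]
print(row_subset)
```

Passing `as_rows` as `subset` is the workaround shown earlier in this answer.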

edited Dec 05 '19 at 21:18

answered Dec 05 '19 at 21:01

NicholasM


OK. It would be great if the documentation matched the code. (such an example as you give would be really helpful, especially since selecting one or more rows is a rather common thing) – Jason S Dec 05 '19 at 21:01
so.... you answered the question of what I was trying to do, but not the question I asked... what exactly is `subset` really doing in the code? – Jason S Dec 05 '19 at 21:03
But the documentation does say that `subset` is an *IndexSlice*. – Quang Hoang Dec 05 '19 at 21:03
GAH! How did I miss that? :( – Jason S Dec 05 '19 at 21:04
now I just have to figure out what IndexSlice does :/ – Jason S Dec 05 '19 at 21:06
@JasonS, you're right that an example in the documentation would be helpful. It seems like the behavior of subset might actually be reflecting `df[[3]]` rather than `df.loc[[3]]`. – NicholasM Dec 05 '19 at 21:13
Grr. `IndexSlice` doesn't solve the problem, either, at least not in some cases. (see my latest edit to the question) – Jason S Dec 05 '19 at 21:35
oh, never mind, I wasn't handling multi-indices correctly – Jason S Dec 05 '19 at 21:43
