
Say I have this dataframe

import pandas as pd

d = {     'Path'   : ['abc', 'abc', 'ghi','ghi', 'jkl','jkl'],
          'Detail' : ['foo', 'bar', 'bar','foo','foo','foo'],
          'Program': ['prog1','prog1','prog1','prog2','prog3','prog3'],
          'Value'  : [30, 20, 10, 40, 40, 50],
          'Field'  : [50, 70, 10, 20, 30, 30] }

df = pd.DataFrame(d)
df.set_index(['Path', 'Detail'], inplace=True)
df

               Field Program  Value
Path Detail                      
abc  foo        50   prog1     30
     bar        70   prog1     20
ghi  bar        10   prog1     10
     foo        20   prog2     40
jkl  foo        30   prog3     40
     foo        30   prog3     50

I can aggregate it without any problem (by the way, if there's a better way to do this, I'd like to know!)

df_count = df.groupby('Program')[['Value']].count().sort_values('Value', ascending=False)
df_count

Program   Value
prog1    3
prog3    2
prog2    1

df_mean = df.groupby('Program')[['Value']].mean().sort_values('Value', ascending=False)
df_mean

Program  Value
prog3    45
prog2    40
prog1    20
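On the side question of a better way to aggregate: a minimal sketch, assuming pandas 0.25 or later (named aggregation) and using `sort_values`, since the old `DataFrame.sort` method no longer exists in current pandas. It computes the count and the mean of Value per Program in a single groupby, and also shows `value_counts()` as a shortcut for the counts alone. The names `summary`, `by_count`, `by_mean` and `counts` are illustrative only.

```python
import pandas as pd

# Same data as in the question, with Path/Detail moved into the index
df = pd.DataFrame({
    'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
    'Field':   [50, 70, 10, 20, 30, 30],
}).set_index(['Path', 'Detail'])

# One groupby, two named aggregations on the Value column
summary = df.groupby('Program')['Value'].agg(count='count', mean='mean')

# Sort each view independently, like df_count and df_mean above
by_count = summary.sort_values('count', ascending=False)
by_mean = summary.sort_values('mean', ascending=False)

# Counting rows per Program only: value_counts() already sorts descending
counts = df['Program'].value_counts()

print(by_count)
print(by_mean)
```

Keeping both statistics in one frame avoids running the groupby twice; each sorted view is still just a `sort_values` call away.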

I can plot it from Pandas no problem...

df_mean.plot(kind='bar')

But why do I get this error when I try it in seaborn?

sns.factorplot('Program',data=df_mean)
    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-23c2921627ec> in <module>()
----> 1 sns.factorplot('Program',data=df_mean)

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in factorplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, size, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
   2673     # facets to ensure representation of all data in the final plot
   2674     p = _CategoricalPlotter()
-> 2675     p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
   2676     order = p.group_names
   2677     hue_order = p.hue_names

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
    143                 if isinstance(input, string_types):
    144                     err = "Could not interperet input '{}'".format(input)
--> 145                     raise ValueError(err)
    146 
    147             # Figure out the plotting orientation

ValueError: Could not interperet input 'Program'
marshallbanana
    I got this error when incorrectly using `sns.FacetGrid.map` instead of `sns.FacetGrid.map_dataframe`. – filups21 Sep 23 '22 at 21:14

1 Answer


The reason for the exception you are getting is that Program becomes the index of the dataframes df_mean and df_count after your groupby operation, so it is no longer a column that factorplot can look up by name.
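To see this concretely, here is a small pandas-only check (a sketch assuming a recent pandas, hence `sort_values` instead of `sort`). After the groupby, the only remaining column is Value, while Program survives only as the index name. In the seaborn version shown in the traceback, a string passed as x appears to be looked up among the columns of `data`, which is why 'Program' cannot be interpreted.

```python
import pandas as pd

df = pd.DataFrame({
    'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
    'Field':   [50, 70, 10, 20, 30, 30],
}).set_index(['Path', 'Detail'])

# Aggregate as in the question
df_mean = df.groupby('Program')[['Value']].mean().sort_values('Value', ascending=False)

# 'Program' is no longer a column: it is now the name of the index
print(df_mean.columns.tolist())  # ['Value']
print(df_mean.index.name)        # Program
```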

If you want to get the factorplot from df_mean, an easy solution is to add the index as a column:

In [7]:

df_mean['Program'] = df_mean.index

In [8]:

%matplotlib inline
import seaborn as sns
sns.factorplot(x='Program', y='Value', data=df_mean)
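An alternative to copying the index is `DataFrame.reset_index()`, which moves Program back into an ordinary column and gives the frame a fresh integer index, so nothing is duplicated. The sketch below is pandas-only; the resulting `plot_data` (an illustrative name) could then be passed as `data=` together with `x='Program', y='Value'`. As an aside not taken from the original answer: seaborn 0.9 renamed factorplot to catplot, whose default kind is 'strip' rather than 'point', and recent seaborn releases no longer provide factorplot.

```python
import pandas as pd

df = pd.DataFrame({
    'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
    'Field':   [50, 70, 10, 20, 30, 30],
}).set_index(['Path', 'Detail'])

df_mean = df.groupby('Program')[['Value']].mean().sort_values('Value', ascending=False)

# Turn the 'Program' index into a regular column; the new index is 0..n-1
plot_data = df_mean.reset_index()
print(plot_data)
```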

However, you could more simply let factorplot do the calculation for you, since its default estimator is the mean:

sns.factorplot(x='Program', y='Value', data=df)

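You'll obtain the same mean values, although the categories appear in their order of first appearance in df rather than sorted by mean, and error bars are computed from the raw rows.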
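A pandas-only sketch of what the raw-data plot should show, assuming the mean estimator and seaborn's usual ordering of string categories (order of first appearance). It computes the per-Program means that would be plotted and an explicit ordering that reproduces the sorted df_mean layout. `order_raw` and `order_by_mean` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
    'Field':   [50, 70, 10, 20, 30, 30],
}).set_index(['Path', 'Detail'])

# Mean of the raw Value rows per Program: the heights a mean estimator plots
raw_means = df.groupby('Program')['Value'].mean()

# String categories taken in order of first appearance in the data
order_raw = df['Program'].unique().tolist()

# Explicit order sorted by mean, matching the layout of df_mean
order_by_mean = raw_means.sort_values(ascending=False).index.tolist()

print(order_raw)      # ['prog1', 'prog2', 'prog3']
print(order_by_mean)  # ['prog3', 'prog2', 'prog1']
```

The `order` parameter listed in the factorplot signature in the traceback could then take `order_by_mean`, for example `sns.factorplot(x='Program', y='Value', data=df, order=order_by_mean)`.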

EDIT after comments

Indeed, you make a very good point about the as_index parameter: it defaults to True, and in that case Program becomes the index, as in your question.

In [14]:

df_mean = df.groupby('Program', as_index=True)[['Value']].mean().sort_values('Value', ascending=False)
df_mean

Out[14]:
        Value
Program 
prog3   45
prog2   40
prog1   20

Just to be clear: this way Program is no longer a column; it becomes the index. The trick df_mean['Program'] = df_mean.index keeps the index as it is and adds a new column holding the index values, so Program is now duplicated.

In [15]:

df_mean['Program'] = df_mean.index
df_mean

Out[15]:
        Value   Program
Program     
prog3   45  prog3
prog2   40  prog2
prog1   20  prog1
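One practical consequence of that duplication, shown in a short pandas sketch (assuming current pandas behaviour): with 'Program' as both the index name and a column label, `reset_index()` would need to insert a second 'Program' column and raises `ValueError`, while `reset_index(drop=True)` simply discards the old index. The names `reset_failed` and `clean` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
    'Field':   [50, 70, 10, 20, 30, 30],
}).set_index(['Path', 'Detail'])

df_mean = df.groupby('Program')[['Value']].mean().sort_values('Value', ascending=False)
df_mean['Program'] = df_mean.index  # the trick from the answer

# 'Program' is now both the index name and a column label
print(df_mean.index.name, df_mean.columns.tolist())  # Program ['Value', 'Program']

# reset_index() tries to insert the index as a 'Program' column, which already exists
reset_failed = False
try:
    df_mean.reset_index()
except ValueError as exc:
    reset_failed = True
    print('reset_index() raised:', exc)

# Dropping the old index leaves a single 'Program' column
clean = df_mean.reset_index(drop=True)
```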

However, if you set as_index to False, you get Program as a column, plus a new default integer index:

In [16]:

df_mean = df.groupby('Program', as_index=False)['Value'].mean().sort_values('Value', ascending=False)[['Program', 'Value']]
df_mean

Out[16]:
    Program Value
2   prog3   45
1   prog2   40
0   prog1   20

This way you can feed it directly to seaborn. Still, you could also pass df itself and get the same mean values.
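A quick pandas sketch, under the same assumptions as above, confirming that `as_index=False` produces the same frame as the default grouping followed by `reset_index()`, as long as the reset happens before sorting so both keep the original group positions (2, 1, 0) as index labels. `via_as_index` and `via_reset` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
    'Field':   [50, 70, 10, 20, 30, 30],
}).set_index(['Path', 'Detail'])

# SQL-style output: Program stays a column, groups get positions 0..n-1
via_as_index = (df.groupby('Program', as_index=False)['Value'].mean()
                  .sort_values('Value', ascending=False))

# Default grouping (Program as index), then moved back into a column before sorting
via_reset = (df.groupby('Program')['Value'].mean()
               .reset_index()
               .sort_values('Value', ascending=False))

# Raises AssertionError if the two frames differ in values, labels or dtypes
pd.testing.assert_frame_equal(via_as_index, via_reset)
print(via_as_index)
```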

user
lrnzcig
  • Thanks a lot for the response. I thought it was an indexing issue at first. But according to [the documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html), the `as_index` parameter is True by default, so the group label (i.e. `Program`) is already the index: `df_mean.index` returns `Index(['prog3', 'prog2', 'prog1'], dtype='object', name='Program')`. I tried the second method and I receive the same error as well. – marshallbanana Oct 02 '15 at 13:59
  • I'm not sure we understood each other. Anyway you make a good point about the `as_index` parameter and I'm updating the answer. Hope it is more clear now. – lrnzcig Oct 02 '15 at 14:07
  • Sorry - I just realized that we're saying the same thing about the index. I figured that factorplot would be able to use the index for the x-axis by default. So I'm puzzled that your second solution returns the same error – marshallbanana Oct 02 '15 at 14:09
  • Sorry, I made a typo. The second solution is `sns.factorplot(x='Program', y='Value', data=df)`, meaning that you could use `df` directly. Hope it makes more sense now. – lrnzcig Oct 02 '15 at 14:16
  • Thank you so much. I see my error is that the x value needs to be a column, not an index. – marshallbanana Oct 02 '15 at 16:20