
I am trying to use plotnine to build graphs, and I keep running into the same KeyError when I plot only an x variable. See the traceback below. A sample of my data:

       WORD  TAG  TOPIC  Value
0       hey   aa      1    234
1   working   bb      1    123
2   lullaby   cc      2     32
3     Doggy   cc      2     63
4  document   aa      3     84

A sample of my code:

from plotnine import *
import pandas as pd

inFile = 'infile.csv'
df = pd.read_csv(inFile, names = ['WORD', 'TAG','TOPIC','VALUE'], header=0,sep='\t')
# sort_values returns a new DataFrame, so keep the result; the column is named 'VALUE'
df = df.sort_values('VALUE', ascending=False)
sortedDf = df[:5]

plot1 = ggplot(sortedDf) + aes(x='TOPIC') + geom_histogram(binwidth=3)

The final goal is to plot the count of each topic in a histogram. I am not sure what missing data is raising the KeyError below: no weight should be needed, since I am only interested in the count of that one variable, i.e. topic 1 = 2, topic 2 = 2, topic 3 = 1.
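For reference, the counts I expect can be reproduced with pandas alone. This is only a minimal sketch, assuming the five sample rows above are the whole data set (they are typed in by hand here rather than read from infile.csv):

```python
import pandas as pd

# The five sample rows shown above, entered by hand (assumption: this is all the data).
df = pd.DataFrame({
    'WORD': ['hey', 'working', 'lullaby', 'Doggy', 'document'],
    'TAG': ['aa', 'bb', 'cc', 'cc', 'aa'],
    'TOPIC': [1, 1, 2, 2, 3],
    'VALUE': [234, 123, 32, 63, 84],
})

# Number of rows per topic, ordered by topic id.
topic_counts = df['TOPIC'].value_counts().sort_index()
print(topic_counts)
```

These are the per-topic counts I would like the histogram to show.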

Does anyone have a link to more detailed plotnine documentation, or experience with the library, that could help me understand what I am missing?

Traceback Error:


    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-112-71707b4cf21a> in <module>()
          1 plot2 = ggplot(sortedDf) + aes(x='TOPIC') + geom_histogram(binwidth=3)
    ----> 2 print plot2

    /Users/anaconda/lib/python2.7/site-packages/plotnine/ggplot.pyc in __repr__(self)
         82         Print/show the plot
         83         """
    ---> 84         self.draw()
         85         plt.show()
         86         return '<ggplot: (%d)>' % self.__hash__()

    /Users/anaconda/lib/python2.7/site-packages/plotnine/ggplot.pyc in draw(self)
        139         # assign a default theme
        140         self = deepcopy(self)
    --> 141         self._build()
        142 
        143         # If no theme we use the default

    /Users/anaconda/lib/python2.7/site-packages/plotnine/ggplot.pyc in _build(self)
        235 
        236         # Apply and map statistics
    --> 237         layers.compute_statistic(layout)
        238         layers.map_statistic(self)
        239 

    /Users/anaconda/lib/python2.7/site-packages/plotnine/layer.pyc in compute_statistic(self, layout)
         92     def compute_statistic(self, layout):
         93         for l in self:
    ---> 94             l.compute_statistic(layout)
         95 
         96     def map_statistic(self, plot):

    /Users/anaconda/lib/python2.7/site-packages/plotnine/layer.pyc in compute_statistic(self, layout)
        369         data = self.stat.use_defaults(data)
        370         data = self.stat.setup_data(data)
    --> 371         data = self.stat.compute_layer(data, params, layout)
        372         self.data = data
        373 

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat.pyc in compute_layer(cls, data, params, layout)
        194             return cls.compute_panel(pdata, pscales, **params)
        195 
    --> 196         return groupby_apply(data, 'PANEL', fn)
        197 
        198     @classmethod

    /Users/anaconda/lib/python2.7/site-packages/plotnine/utils.pyc in groupby_apply(df, cols, func, *args, **kwargs)
        615         # do not mark d as a slice of df i.e no SettingWithCopyWarning
        616         d.is_copy = None
    --> 617         lst.append(func(d, *args, **kwargs))
        618     return pd.concat(lst, axis=axis, ignore_index=True)
        619 

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat.pyc in fn(pdata)
        192                 return pdata
        193             pscales = layout.get_scales(pdata['PANEL'].iat[0])
    --> 194             return cls.compute_panel(pdata, pscales, **params)
        195 
        196         return groupby_apply(data, 'PANEL', fn)

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat.pyc in compute_panel(cls, data, scales, **params)
        221         for _, old in data.groupby('group'):
        222             old.is_copy = None
    --> 223             new = cls.compute_group(old, scales, **params)
        224             unique = uniquecols(old)
        225             missing = unique.columns.difference(new.columns)

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat_bin.pyc in compute_group(cls, data, scales, **params)
        107         new_data = assign_bins(
        108             data['x'], breaks, data.get('weight'),
    --> 109             params['pad'], params['closed'])
        110         return new_data

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/binning.pyc in assign_bins(x, breaks, weight, pad, closed)
        163     df = pd.DataFrame({'bin_idx': bin_idx, 'weight': weight})
        164     wftable = df.pivot_table(
    --> 165         'weight', index=['bin_idx'], aggfunc=np.sum)['weight']
        166 
        167     # Empty bins get no value in the computed frequency table.

    /Users/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
        601             result = self.index.get_value(self, key)
        602 
    --> 603             if not is_scalar(result):
        604                 if is_list_like(result) and not isinstance(result, Series):
        605 

    /Users/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_value(self, series, key)

    pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3557)()

    pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3240)()

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4363)()

    KeyError: 'weight'
gcamargo
owwoow14

1 Answer


Passing aes as an argument to ggplot, as is done in R, may solve your issue:

plot1 = ggplot(sortedDf, aes(x='TOPIC')) + geom_histogram(binwidth=3)
gcamargo
    No explanation is provided with the answer. It may or may not solve the issue, but adding an explanation might help the OP understand how this could solve it. – Gerhard Nov 29 '17 at 05:26