The following example raises an error. Is a discrete (rather than continuous) scale for the x-axis not supported by ggplot in Python?

import pandas as pd
import ggplot

df = pd.DataFrame.from_dict({'a':['a','b','c'],
                   'percentage':[.1,.2,.3]})

p = ggplot.ggplot(data=df,
                  aesthetics=ggplot.aes(x='a',
                                        y='percentage'))\
    + ggplot.geom_point()

print(p)

Printing the plot raises:

Traceback (most recent call last):
  File "/Users/me/Library/Preferences/PyCharm2016.1/scratches/scratch_1.py", line 30, in <module>
    print(p)
  File "/Users/me/lib/python3.5/site-packages/ggplot/ggplot.py", line 116, in __repr__
    self.make()
  File "/Users/me/lib/python3.5/site-packages/ggplot/ggplot.py", line 627, in make
    layer.plot(ax, facetgroup, self._aes, **kwargs)
  File "/Users/me/lib/python3.5/site-packages/ggplot/geoms/geom_point.py", line 60, in plot
    ax.scatter(x, y, **params)
  File "/Users/me/lib/python3.5/site-packages/matplotlib/__init__.py", line 1819, in inner
    return func(ax, *args, **kwargs)
  File "/Users/me/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 3838, in scatter
    x, y, s, c = cbook.delete_masked_points(x, y, s, c)
  File "/Users/me/lib/python3.5/site-packages/matplotlib/cbook.py", line 1848, in delete_masked_points
    raise ValueError("First argument must be a sequence")
ValueError: First argument must be a sequence

Are there any workarounds for drawing a scatter plot with ggplot on a discrete scale?

canary_in_the_data_mine

2 Answers

One option is to map the categories to a continuous numeric series, plot against that, and use the original values as tick labels. It works, but it is a painful workaround.

import pandas as pd
import ggplot

# 'a' holds the numeric positions; 'a_name' holds the labels shown on the axis
df = pd.DataFrame.from_dict({'a': [0, 1, 2],
                             'a_name': ['a', 'b', 'c'],
                             'percentage': [.1, .2, .3]})

p = ggplot.ggplot(data=df,
                  aesthetics=ggplot.aes(x='a',
                                        y='percentage'))\
    + ggplot.geom_point()\
    + ggplot.scale_x_continuous(breaks=list(df['a']),
                                labels=list(df['a_name']))

print(p)
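
If the categories are not already numbered, the numeric column does not have to be written by hand. The following is a minimal sketch, assuming pandas' `factorize` is acceptable for deriving the positions; the extra fourth row and the names `breaks` and `labels` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a_name': ['a', 'b', 'c', 'a'],
                   'percentage': [.1, .2, .3, .4]})

# factorize gives each distinct label an integer code,
# numbered in order of first appearance
codes, uniques = pd.factorize(df['a_name'])
df['a'] = codes

breaks = list(range(len(uniques)))  # one tick position per distinct label
labels = list(uniques)              # the original strings, in code order

print(df)
print(breaks, labels)
```

`breaks` and `labels` can then be passed to `scale_x_continuous` in place of the hand-written lists, with `x='a'` as the aesthetic.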

canary_in_the_data_mine

I was getting the same error when trying to plot 2 columns of a dataframe. I was reading the data from a csv file and converting it into a dataframe.

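import csv
import pandas as pd

readdata = csv.reader(open(filename), delimiter="\t")
header = next(readdata)  # assumption: the first row holds the column names
data = list(readdata)    # csv.reader yields every field as a string
df = pd.DataFrame(data, columns=header)
df.columns = ["pulseVoltage", "dutVoltage", "dutCurrent", "leakageCurrent"]
print(df.dtypes)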

When I checked the data types, the columns were shown as object instead of the float I expected. (I am a newbie, so this may be trivial: csv.reader returns every field as a string, so nothing in the DataFrame was numeric.) I therefore converted the columns explicitly to float.

df["dutVoltage"] = df["dutVoltage"].astype("float")
df["dutCurrent"] = df["dutCurrent"].astype("float")
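
The dtype problem described in this answer can be reproduced without the original file. The sketch below is only an illustration: the two-column tab-separated sample in `raw` is made up, and `dtypes_before` and `df2` are names introduced here. It shows the string columns, the conversion with `astype(float)`, and, as an alternative, `pd.read_csv`, which infers numeric dtypes on its own.

```python
import csv
import io

import pandas as pd

# Made-up tab-separated sample standing in for the real file
raw = "dutVoltage\tdutCurrent\n0.5\t0.01\n1.0\t0.02\n"

rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))
df = pd.DataFrame(rows[1:], columns=rows[0])
dtypes_before = df.dtypes.copy()
print(dtypes_before)  # string data, not float: every field is a str

# Convert all columns to float in one call
df = df.astype(float)
print(df.dtypes)

# Alternative: let pandas parse the text and infer the dtypes itself
df2 = pd.read_csv(io.StringIO(raw), sep="\t")
print(df2.dtypes)
```

Reading the file with `pd.read_csv` from the start would avoid the manual conversion, provided every value in a column parses as a number.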

Now I can use ggplot to plot the data without any error.

# print() as a function works on both Python 2 and Python 3
print(ggplot(df, aes('dutVoltage', 'dutCurrent')) +
      geom_point())

beeprogrammer