
I am trying to use pandas in python to plot the following higher-dimensional data: https://i.stack.imgur.com/34nbR.jpg

Here is my code:

import matplotlib.pyplot as plt
import pandas
from pandas.plotting import parallel_coordinates  # pandas < 0.20: pandas.tools.plotting

data = pandas.read_csv('ParaCoords.csv')
parallel_coordinates(data, 'Name')
plt.show()

The code fails to plot the data, and the traceback ends with:

KeyError: 'Name'

What is the second argument in parallel_coordinates supposed to say/do? How can I successfully plot the data?

ysearka
C. Stucki
  • I think the second argument must be the name of the column you want to use for your plot. That's why in `iris.data` they use 'Name'. – ysearka Jun 29 '16 at 15:28
  • Any string I use in the second argument's position (e.g. 'column-name') results in a function error. – C. Stucki Jun 29 '16 at 15:37

2 Answers


The second argument is supposed to be the name of the column that defines each row's class. Think of a column like ['dog', 'dog', 'cat', 'bird', 'cat', 'dog'].

In the online example they use 'Name' as the second argument because that column holds the name of each iris species.
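As a minimal sketch (assuming pandas 0.20 or newer, where the function is importable from `pandas.plotting`; the DataFrame and its column names `x1`, `x2`, `x3`, and `animal` are made up for illustration), the class column is just an ordinary column of labels. Each row is drawn as one line colored by its label, and asking for a column that does not exist raises the same `KeyError` as in the question:

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical data: three numeric features plus a class column "animal".
df = pd.DataFrame({
    "x1": [1.0, 1.2, 3.1, 0.5, 2.9, 1.1],
    "x2": [4.0, 3.8, 2.2, 5.1, 2.5, 4.2],
    "x3": [0.3, 0.4, 1.9, 0.1, 2.0, 0.2],
    "animal": ["dog", "dog", "cat", "bird", "cat", "dog"],
})

fig, ax = plt.subplots()
# Second argument: the column whose values give each row its class (and color).
# All remaining columns become the vertical axes of the plot.
ax = parallel_coordinates(df, "animal", ax=ax)

# pandas adds one legend entry per distinct class.
labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(labels)

# Naming a column that is not in the DataFrame reproduces the question's error.
error = None
try:
    parallel_coordinates(df, "Name")
except KeyError as err:
    error = err
print(repr(error))
```

When running this as a script rather than in a notebook, call `plt.show()` at the end to display the figure.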

Doc

Signature: parallel_coordinates(*args, **kwargs)
Docstring:
Parallel coordinates plotting.

Parameters
----------
frame: DataFrame
class_column: str
    Column name containing class names
cols: list, optional
    A list of column names to use
ax: matplotlib.axis, optional
    matplotlib axis object
color: list or tuple, optional
    Colors to use for the different classes
use_columns: bool, optional
    If true, columns will be used as xticks
xticks: list or tuple, optional
    A list of values to use for xticks
colormap: str or matplotlib colormap, default None
    Colormap to use for line colors.
axvlines: bool, optional
    If true, vertical lines will be added at each xtick
axvlines_kwds: keywords, optional
    Options to be passed to axvline method for vertical lines
kwds: keywords
    Options to pass to matplotlib plotting method
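A sketch exercising several of the optional parameters above, again on hypothetical data and assuming pandas 0.20 or newer. `cols` picks and orders the axes, `color` supplies one color per class, `axvlines=False` drops the vertical line at each xtick, and extra keywords such as `linewidth` are forwarded to matplotlib's plotting method:

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical data: four numeric columns plus a class column "animal".
df = pd.DataFrame({
    "x1": [1.0, 1.2, 3.1, 0.5, 2.9, 1.1],
    "x2": [4.0, 3.8, 2.2, 5.1, 2.5, 4.2],
    "x3": [0.3, 0.4, 1.9, 0.1, 2.0, 0.2],
    "x4": [7.0, 6.5, 2.0, 8.1, 2.4, 6.9],
    "animal": ["dog", "dog", "cat", "bird", "cat", "dog"],
})

fig, ax = plt.subplots()
ax = parallel_coordinates(
    df,
    "animal",                                      # class_column
    cols=["x1", "x3", "x4"],                       # plot only these columns, in this order
    ax=ax,                                         # draw on an existing matplotlib Axes
    color=["tab:blue", "tab:orange", "tab:green"], # one color per class
    axvlines=False,                                # no vertical line at each xtick
    linewidth=2,                                   # forwarded to matplotlib's plot()
)
```

Because `axvlines=False`, the Axes holds exactly one line per DataFrame row, each with one point per selected column.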
piRSquared
  • I see! So y is my dependent variable, and x1, x2, x3, and x4 are my independent variables. The second argument should be 'y', or it could be 'x1', 'x2', etc. – C. Stucki Jun 29 '16 at 15:42
  • Hi! Are there any options for customising the legend of a parallel_coordinates plot? – cucurbit Jul 11 '17 at 16:12
  • @cucurbit you can customize it the same way you would for any `matplotlib` plot. Essentially, assign the return value of the plot to a variable. It will be a matplotlib `Axes` object, which you can then manipulate. You'll want to search for **matplotlib legend**. – piRSquared Jul 11 '17 at 16:16
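Following that suggestion, a minimal sketch on hypothetical data (pandas 0.20 or newer assumed): keep the `Axes` that `parallel_coordinates` returns and call its `legend` method again, which replaces the legend pandas created with one that has a title and sits outside the plotting area. The title text and placement values are arbitrary choices:

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical data with a class column named "animal".
df = pd.DataFrame({
    "x1": [1.0, 1.2, 3.1, 0.5],
    "x2": [4.0, 3.8, 2.2, 5.1],
    "animal": ["dog", "dog", "cat", "bird"],
})

plt.figure()
# parallel_coordinates returns the matplotlib Axes it drew on.
ax = parallel_coordinates(df, "animal")

# Calling legend() again replaces the default legend; here it gets a title
# and is anchored just to the right of the axes, without a frame.
ax.legend(title="Animal", loc="upper left", bbox_to_anchor=(1.02, 1.0), frameon=False)
```

If you save the figure, `plt.savefig("plot.png", bbox_inches="tight")` keeps a legend placed outside the axes from being clipped.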

The iris.data file that you download from UCI does not have headers. To make the pandas example work, you have to assign the headers explicitly as column names:

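import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates  # pandas < 0.20: pandas.tools.plotting

# The iris.data file from UCI does not have headers, so read it with header=None
# (otherwise the first data row becomes the header) and assign the column names explicitly.
data = pd.read_csv("data-iris-for-pandas/iris.data", header=None,
                   names=["x1", "x2", "x3", "x4", "Name"])
plt.figure()
parallel_coordinates(data, "Name")
plt.show()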

[Figure: Pandas Parallel Coordinates Example]

In short, the pandas documentation example is incomplete: it uses a DataFrame whose column names were already set, without showing that step.
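The header row matters because `pd.read_csv` treats the first line of the file as column names by default. A minimal sketch, using three rows in the same header-less format as `iris.data` held in memory instead of the real file, shows that the default turns the first observation into column labels, while `header=None` with `names=` keeps every row and creates the `Name` column:

```python
import io

import pandas as pd

# Three sample rows in the same comma-separated, header-less format as UCI's iris.data.
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# Default header="infer": the first data row is consumed as column names.
bad = pd.read_csv(io.StringIO(raw))
print(list(bad.columns))  # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
print(len(bad))           # 2 -> one observation lost, and there is no "Name" column

# header=None plus names=[...]: every row is kept and "Name" exists.
good = pd.read_csv(io.StringIO(raw), header=None,
                   names=["x1", "x2", "x3", "x4", "Name"])
print(len(good))          # 3
```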

user5920660