68

I am trying out Seaborn to make my plots look better than plain matplotlib. My dataset has a 'Year' column that I want on the x-axis, and four columns (say A, B, C, D) that I want on the y-axis as differently coloured lines. I tried sns.lineplot, but it only accepts one variable for the x-axis and one for the y-axis. I tried this:

# x and y passed as keywords; recent seaborn releases no longer accept them positionally
sns.lineplot(x=data_preproc['Year'], y=data_preproc['A'], err_style=None)
sns.lineplot(x=data_preproc['Year'], y=data_preproc['B'], err_style=None)
sns.lineplot(x=data_preproc['Year'], y=data_preproc['C'], err_style=None)
sns.lineplot(x=data_preproc['Year'], y=data_preproc['D'], err_style=None)

But this way I don't get a legend showing which coloured line corresponds to which column. I checked the documentation but couldn't find a proper way to do this.

Trenton McKinney
SSPdude
  • This is more simply accomplished by directly plotting the DataFrame, as demonstrated in [How to plot multiple pandas columns](https://stackoverflow.com/q/47775220/7758804). `data_preproc.plot(x='Year', xticks=data_preproc.Year, figsize=(10, 6))` and [plot](https://i.stack.imgur.com/Bjn1y.png) – Trenton McKinney Jan 24 '23 at 22:27

3 Answers

99

Seaborn favors the "long format" as input. The key ingredient to convert your DataFrame from its "wide format" (one column per measurement type) into long format (one column for all measurement values, one column to indicate the type) is pandas.melt. Given a data_preproc structured like yours, filled with random values:

import numpy as np
import pandas as pd
import seaborn as sns

# Random-walk stand-in data with the same layout as the question
num_rows = 20
years = list(range(1990, 1990 + num_rows))
data_preproc = pd.DataFrame({
    'Year': years,
    'A': np.random.randn(num_rows).cumsum(),
    'B': np.random.randn(num_rows).cumsum(),
    'C': np.random.randn(num_rows).cumsum(),
    'D': np.random.randn(num_rows).cumsum()})

A single plot with four lines, one per measurement type, is obtained with

sns.lineplot(x='Year', y='value', hue='variable', 
             data=pd.melt(data_preproc, ['Year']))


(Note that 'value' and 'variable' are the default column names returned by melt, and can be adapted to your liking.)

Trenton McKinney
dnswlt
23

This:

sns.lineplot(data=data_preproc)

will do what you want.

WolvhLorien
  • You are right, but this should be a comment on @dnswlt's answer, mentioning that there is no need to change the DataFrame – kyriakosSt Jun 18 '20 at 14:33
  • 4
    This would be a better answer if you explained how the code you provided answers the question. – pppery Jun 19 '20 at 00:53
  • The point of using `seaborn` is to get better functionality, isn't it? And this method is the simplest and does the job exactly. Thanks :) – theProcrastinator Mar 22 '22 at 09:34
  • 1
    This is not a better answer because there is a `'Year'` column, which should be the x-axis, and 4 data columns to plot against the y-axis. This answer will also plot `'Year'` on the y-axis. As such, given the DataFrame structured as in the OP, **this answer is not correct**. See [code and plot](https://i.stack.imgur.com/uk4kZ.png) – Trenton McKinney Jan 24 '23 at 22:18
18

See the documentation:

sns.lineplot(x="Year", y="signal", hue="label", data=data_preproc)

You probably need to re-organize your dataframe in a suitable way so that there is one column for the x data, one for the y data, and one which holds the label for the data point.

You can also just use matplotlib.pyplot. Since seaborn 0.8, importing seaborn no longer restyles matplotlib on its own; once you call sns.set_theme() (or sns.set() in older versions), much of the improved design is also used for "regular" matplotlib plots. Seaborn is really "just" a collection of methods which conveniently feed data and plot parameters to matplotlib.
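A rough sketch of the plain-matplotlib route, assuming seaborn 0.11 or later (where sns.set_theme() exists) and using random stand-in data in place of the question's DataFrame:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Apply seaborn's default styling to all subsequent matplotlib plots.
sns.set_theme()

# Stand-in data with the same layout as the question (values are random).
rng = np.random.default_rng(0)
num_rows = 20
data_preproc = pd.DataFrame({
    'Year': list(range(1990, 1990 + num_rows)),
    'A': rng.standard_normal(num_rows).cumsum(),
    'B': rng.standard_normal(num_rows).cumsum(),
    'C': rng.standard_normal(num_rows).cumsum(),
    'D': rng.standard_normal(num_rows).cumsum(),
})

fig, ax = plt.subplots()
for col in ['A', 'B', 'C', 'D']:
    # label= is what ax.legend() picks up for each line.
    ax.plot(data_preproc['Year'], data_preproc[col], label=col)
ax.set_xlabel('Year')
ax.legend()
```

Here the legend comes from the label passed to each plot call, so no reshaping of the DataFrame is needed.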

IonicSolutions