I have a dataset of four years' worth of ACT participation percentages by state entitled 'part_ACT'. Here's a snippet of it:

Index  State        ACT17  ACT18  ACT19  ACT20
0      Alabama        100    100    100    100
1      Alaska          65     33     38     33
2      Arizona         62     66     73     71
3      Arkansas       100    100    100    100
4      California      31     27     23     19
5      Colorado       100     30     27     25
6      Connecticut     31     26     22     19

I'm trying to produce a line graph for each state, with the four column headings (ACT17 to ACT20) on the x-axis and their values on the y-axis (1-100). I would prefer to display all of these lines in a single figure.

What's the easiest way to do this? I'm fine with Pandas, Matplotlib, Seaborn, or whatever. Thanks much!

Trenton McKinney
CJM
  • Does this answer your question? [How do I create a multiline plot using seaborn?](https://stackoverflow.com/questions/52308749/how-do-i-create-a-multiline-plot-using-seaborn) – semblable Apr 03 '21 at 15:22

2 Answers

One solution is to melt the DataFrame from wide to long format and plot it with hue='State', so that each state gets its own line:

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'State': ['A', 'B', 'C', 'D'],
    'x18': sorted(np.random.randint(0, 100, 4)),
    'x19': sorted(np.random.randint(0, 100, 4)),
    'x20': sorted(np.random.randint(0, 100, 4)),
    'x21': sorted(np.random.randint(0, 100, 4)),
})

df_melt = df.melt(id_vars='State', var_name='year')

sns.relplot(
    kind='line',
    data=df_melt,
    x='year', y='value',
    hue='State'
)

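Applied to the sample rows from the question instead of random numbers, the same wide-to-long reshape might look like the sketch below. The column name 'participation' passed as value_name is an illustrative choice, not something from the question.

```python
import pandas as pd

# the seven sample rows shown in the question
part_ACT = pd.DataFrame({
    'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut'],
    'ACT17': [100, 65, 62, 100, 31, 100, 31],
    'ACT18': [100, 33, 66, 100, 27, 30, 26],
    'ACT19': [100, 38, 73, 100, 23, 27, 22],
    'ACT20': [100, 33, 71, 100, 19, 25, 19],
})

# wide -> long: one row per (State, year) pair
# 'participation' is an illustrative column name (assumption, not from the question)
df_long = part_ACT.melt(id_vars='State', var_name='year', value_name='participation')

print(df_long.head())
```

The long frame can then be passed to sns.relplot in the same way as df_melt, with y='participation' in place of y='value'.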

Max Pierini

  • Creating a plot is all about the shape of the DataFrame.
  • One way to accomplish this is to convert the DataFrame from wide to long with melt, but this isn't necessary.
  • The primary requirement is to set 'State' as the index.
  • Plots can then be generated directly from df or from df.T (.T is the transpose of the DataFrame).
  • The OP requests a line plot, but the data is discrete (one value per state per year), and discrete data is usually better visualized with a bar plot than with a line plot.
  • Tested with pandas v1.2.3, seaborn v0.11.1, and matplotlib v3.3.4
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = {'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut'],
        'ACT17': [100, 65, 62, 100, 31, 100, 31],
        'ACT18': [100, 33, 66, 100, 27, 30, 26],
        'ACT19': [100, 38, 73, 100, 23, 27, 22],
        'ACT20': [100, 33, 71, 100, 19, 25, 19]}

df = pd.DataFrame(data)

# set State as the index - this is important
df.set_index('State', inplace=True)

# display(df)
             ACT17  ACT18  ACT19  ACT20
State                                  
Alabama        100    100    100    100
Alaska          65     33     38     33
Arizona         62     66     73     71
Arkansas       100    100    100    100
California      31     27     23     19
Colorado       100     30     27     25
Connecticut     31     26     22     19

# display(df.T)
State  Alabama  Alaska  Arizona  Arkansas  California  Colorado  Connecticut
ACT17      100      65       62       100          31       100           31
ACT18      100      33       66       100          27        30           26
ACT19      100      38       73       100          23        27           22
ACT20      100      33       71       100          19        25           19

Plot 1

df.T.plot()
plt.legend(title='State', bbox_to_anchor=(1.05, 1), loc='upper left')

# optional: place ticks only at the four year labels, removing the ticks in between
plt.xticks(ticks=range(0, len(df.T)))

plt.show()

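The question also asks for a y-axis covering the percentage range. A minimal sketch of one way to do that, assuming the same df with 'State' as the index: DataFrame.plot returns a matplotlib Axes, which can be adjusted afterwards. The upper limit of 105, the markers, and the axis label text are my own choices, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut'],
        'ACT17': [100, 65, 62, 100, 31, 100, 31],
        'ACT18': [100, 33, 66, 100, 27, 30, 26],
        'ACT19': [100, 38, 73, 100, 23, 27, 22],
        'ACT20': [100, 33, 71, 100, 19, 25, 19]}

df = pd.DataFrame(data).set_index('State')

# one line per state; the transposed index (ACT17..ACT20) becomes the x-axis
ax = df.T.plot(marker='o')

# assumption: a little headroom above 100 so the lines at 100 are not clipped
ax.set_ylim(0, 105)
ax.set_xlabel('Year')
ax.set_ylabel('ACT participation (%)')

# ticks only at the four year columns
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)

ax.legend(title='State', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
```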

Plot 2 & 3

  • Use pandas.DataFrame.plot with kind='bar' or kind='barh'
  • The bar plot is much better at conveying the yearly changes in the data, and allows for an easy comparison between states.
df.plot(kind='bar')
plt.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
  • [Figure: grouped bar plot produced with kind='bar']
  • [Figure: horizontal grouped bar plot produced with kind='barh']
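Only the kind='bar' call is written out in the code above. The horizontal variant presumably differs only in the kind argument; a minimal sketch, assuming the same df with 'State' as the index (the x-axis label is my own addition):

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut'],
        'ACT17': [100, 65, 62, 100, 31, 100, 31],
        'ACT18': [100, 33, 66, 100, 27, 30, 26],
        'ACT19': [100, 38, 73, 100, 23, 27, 22],
        'ACT20': [100, 33, 71, 100, 19, 25, 19]}

df = pd.DataFrame(data).set_index('State')

# horizontal bars: one group per state, one bar per year column
ax = df.plot(kind='barh')
ax.set_xlabel('ACT participation (%)')
ax.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
```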

Plot 4

  • Use seaborn.lineplot
  • It accepts a wide DataFrame directly: each column becomes a line, and the index supplies the x-axis values, so df.T gives one line per state.
sns.lineplot(data=df.T)
plt.legend(title='State', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()


Trenton McKinney