customizing the legend in a plot derived from a pandas dataframe

Question

I'm working on a Python implementation of an agent-based model using the 'mesa' framework (available on GitHub). In the model, each "agent" on a grid plays a Prisoner's Dilemma game against its neighbors. Each agent has a strategy that determines its move in response to its opponents' moves. Strategies with higher payoffs replace strategies with lower payoffs. In addition, strategies evolve through mutations, so new and longer strategies emerge as the model runs. The app produces a pandas dataframe that gets updated after each step. For example, after 106 steps, the df might look like this:

    step strategy count  score
0      0       CC    34   2.08
1      0       DD  1143   2.18
2      0       CD  1261   2.24
3      0       DC    62   2.07
4      1       CC     6   1.88
..   ...      ...   ...    ...
485  106     DDCC    56   0.99
486  106       DD   765   1.00
487  106       DC  1665   1.31
488  106     DCDC    23   1.60
489  106     DDDD    47   0.98

Pandas/matplotlib creates a pretty good plot of this data, calling this simple plot function:

import matplotlib.pyplot as plt

def plot_counts(df):
    df1 = df.set_index('step')
    df1.groupby('strategy')['count'].plot()
    plt.ylabel('count')
    plt.xlabel('step')
    plt.title('Count of all strategies by step')
    plt.legend(loc='best')
    plt.show()

I get this plot:

Not bad, but here's what I can't figure out. The automatic legend quickly gets way too long and the low-frequency strategies are of little interest, so I want the legend to (1) include only the top 4 strategies by count in the last step of the model and (2) list those strategies in descending order of their counts in that step. Looking at the strategies in step 106 in the df, for example, I want the legend to show the top 4 strategies in the order DC, DD, DDCC, and DDDD, but not include DCDC (or any other lower-count strategies that might be active).

I have searched through tons of pandas and matplotlib plotting examples but haven't been able to find a solution to this specific problem. It's clear that these plots are extremely customizable, so I suspect there is a way to do this. Any help would be greatly appreciated.

score 0 · Answer 1 · answered Jan 21 '20 at 07:02

This post is somewhat similar to what you have asked. I suggest checking the answers on this page: Show only certain items in legend Python Matplotlib. Hope this helps!

cerebral_assassin

None of the answers there really answers the question posed there. What would work here is to rename all strategies in df1, except the 4 desired ones, so that they start with '_'. But that still doesn't give the desired order, which is by count in the last step. – JohanC Jan 21 '20 at 09:04
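The renaming idea from this comment can be sketched as follows. This is only an illustration: the list `keep` is hard-coded here rather than computed, and the data is the excerpt shown in the question. It relies on matplotlib skipping labels that begin with an underscore when it builds a legend automatically.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame(columns=['step', 'strategy', 'count', 'score'],
                  data=[[0, 'CC', 34, 2.08],
                        [0, 'DD', 1143, 2.18],
                        [0, 'CD', 1261, 2.24],
                        [0, 'DC', 62, 2.07],
                        [1, 'CC', 6, 1.88],
                        [106, 'DDCC', 56, 0.99],
                        [106, 'DD', 765, 1.00],
                        [106, 'DC', 1665, 1.31],
                        [106, 'DCDC', 23, 1.60],
                        [106, 'DDDD', 47, 0.98]])

# Assumption: the strategies to keep are already known (hard-coded here).
keep = ['DC', 'DD', 'DDCC', 'DDDD']

df1 = df.set_index('step')
# Prefix every other strategy with '_' so the automatic legend skips it.
df1['strategy'] = df1['strategy'].where(df1['strategy'].isin(keep),
                                        '_' + df1['strategy'])
df1.groupby('strategy')['count'].plot()
plt.legend(loc='best')

# Collect the labels that actually ended up in the legend.
legend_labels = [text.get_text() for text in plt.gca().get_legend().get_texts()]
```

The legend entries follow the order in which the groups were plotted, not the counts in the last step, so this hides the unwanted strategies but does not solve the ordering part.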

JohanC · Answer 2 · 2020-01-21T08:50:47.643

Here is an approach. I don't have the complete dataframe, so the test is only with the ones displayed in the question.

The pandas part of the question can be solved by assigning the last step to a variable, then querying for the strategies of that step and then getting the highest counts.

To find the handles, we ask matplotlib for all the handles and labels it generated. Then we search each of the strategies in the list of labels, taking its index to get the corresponding handle.

Please note that 'count' is an awkward name for a column. It is also the name of a pandas DataFrame method, so df.count refers to that method rather than the column, and the column has to be accessed as df['count'].
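A small sketch of the naming clash, using a made-up two-row frame; the replacement name `ncount` is just one possible choice:

```python
import pandas as pd

df = pd.DataFrame({'strategy': ['DD', 'DC'], 'count': [765, 1665]})

attr = df.count            # the DataFrame.count method, not the column
is_method = callable(attr)
col = df['count']          # bracket access returns the column

# After renaming, dot notation works for the column.
renamed = df.rename(columns={'count': 'ncount'})
via_dot = renamed.ncount
```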

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame(columns=['step', 'strategy', 'count', 'score'],
                  data=[[0, 'CC', 34, 2.08],
                        [0, 'DD', 1143, 2.18],
                        [0, 'CD', 1261, 2.24],
                        [0, 'DC', 62, 2.07],
                        [1, 'CC', 6, 1.88],
                        [106, 'DDCC', 56, 0.99],
                        [106, 'DD', 765, 1.00],
                        [106, 'DC', 1665, 1.31],
                        [106, 'DCDC', 23, 1.60],
                        [106, 'DDDD', 47, 0.98]])
last_step = df.step.max()
strategies_last_step = df.strategy[df['count'][df.step == last_step].nlargest(4).index]

df1 = df.set_index('step')
df1.groupby('strategy')['count'].plot()
plt.ylabel('count')
plt.xlabel('step')
plt.title('Count of all strategies by step')

handles, labels = plt.gca().get_legend_handles_labels()
selected_handles = [handles[labels.index(strategy)] for strategy in strategies_last_step]

legend = plt.legend(handles=selected_handles, loc='best')

plt.show()

score 0 · Answer 3 · answered Jan 21 '20 at 22:26

Thank you, JohanC, you really helped me see what was going on under the hood with this problem. (Also, good point about count as a col name. I changed it to ncount.)

I found your statement:

strategies_last_step = df.strategy[df['count'][df.step == last_step].nlargest(4).index]

wasn't working for me (nlargest got confused about dtypes) so I formulated a slightly different approach. I got a list of correctly ordered strategy names this way:

def plot_counts(df):
    # to customize plot legend, first get the last step in the df
    last_step = df.step.max()
    # next, make new df_last_step, reverse sorted by 'count' & limited to 4 items  
    df_last_step = df[df['step'] == last_step].sort_values(by='ncount', ascending=False)[0:4]
    # put selected and reordered strategies in a list
    top_strategies = list(df_last_step.strategy)

Then, after indexing and grouping my original df and adding my other plot parameters ...

    dfi = df.set_index('step')
    dfi.groupby('strategy')['ncount'].plot()
    plt.ylabel('ncount')
    plt.xlabel('step')
    plt.title('Count of all strategies by step')

I was able to pick out the right handles from the default handles list and reorder them this way:

    handles, labels = plt.gca().get_legend_handles_labels()
    # get handles for top_strategies, in order, and replace default handles
    selected_handles = []
    for i in range(len(top_strategies)):
        # get the index of the labels object that matches this strategy
        ix = labels.index(top_strategies[i])
        # get matching handle w the same index, append it to a new handles list in right order
        selected_handles.append(handles[ix])

Then plot with the new selected_handles:

    plt.legend(handles=selected_handles, loc='best')
    plt.show()

Result is exactly as intended. Here is a plot after 300+ steps. Legend is in the right order and limited to top 4 strategies:
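Regarding the dtype problem with nlargest mentioned above: the post does not show the column dtypes, so the following is only a guess at one possible cause. If the count column is stored with object dtype (which can happen when a dataframe is built up row by row), nlargest refuses to work on it, and converting the column with pd.to_numeric first avoids that. The series below is made up from the step-106 values in the question.

```python
import pandas as pd

# Assumption: counts stored as Python objects rather than as a numeric dtype.
counts = pd.Series([56, 765, 1665, 23, 47],
                   index=['DDCC', 'DD', 'DC', 'DCDC', 'DDDD'],
                   dtype=object)

error = None
try:
    counts.nlargest(4)
except TypeError as exc:
    # pandas rejects nlargest on object-dtype data
    error = exc

# Converting to a numeric dtype makes nlargest usable.
numeric = pd.to_numeric(counts)
top = list(numeric.nlargest(4).index)
```

Here `top` holds the strategy names in descending order of count, which is the same list that the sort_values approach produces.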

Note that in Python, it is highly encouraged to rewrite `for i in range(len(top_strategies)): ix = labels.index(top_strategies[i])` as `for strategy in top_strategies: ix = labels.index(strategy)` — JohanC, Jan 22 '20 at 07:45
