Changing marker shapes depending on a third string variable in matplotlib

Question

I have a pandas DataFrame called df_comunidades. Its first rows look something like this:

df_comunidades.head()

  Tormenta  Comunidad  TIEPI  Gustmax
0      ANA  ANDALUCIA  0.050    130.2
1      ANA     ARAGON  0.250     90.5
2    BRUNO  ANDALUCIA  0.012    114.0
3    BRUNO  CATALUNYA  0.023     78.2
4   KARINE     ARAGON  3.500     80.2
5      ANA   BALEARES  2.000     97.2

Every "Comunidad" has a different color in my scatter plot, but I also want every "Tormenta" to have a different marker shape. I tried several approaches, one of them similar to the method I used for the colors. I also tried a loop, for i in range(len(markers)):, where all the markers are stored in a list, markers=['o','v','<','>','1','8','s','*','x','d']. My working code is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# initialize list of lists
data = [['ANA', 'ANDALUCIA', 0.05, 130.2], ['ANA', 'ARAGON', 0.25, 90.5], ['BRUNO', 'ANDALUCIA', 0.012, 114], ['BRUNO', 'CATALUNYA', 0.023, 78.2],['KARINE', 'ARAGON', 3.5, 80.2], ['ANA', 'BALEARES', 2, 97.2]]
 
# Create the pandas DataFrame
df_comunidades = pd.DataFrame(data, columns = ['Tormenta', 'Comunidad', 'TIEPI', 'Gustmax'])

#I define every color for every "Comunidad"
colors = {'ANDALUCIA' : 'g',
          'CATALUNYA' : 'y',
          'BALEARES' : 'r', 
          'ARAGON' : 'c'}
c = [colors[comunid] for comunid in df_comunidades['Comunidad']]

plt.scatter(df_comunidades['TIEPI'], df_comunidades['Gustmax'], alpha=0.5, c=c)

ax = plt.subplot(1, 1, 1)
#code to title the axes and the plot: 
ax.set_xlim([0,3])
ax.set_xlabel("TIEPI")
ax.set_ylabel("Max Gusts in community")
plt.title("Relation between max gusts and TIEPI in autonomous communities")
plt.savefig('max_tiepi-gusts_comunid.png',dpi=300)

I got this... it seems like the square is above the rest... but every point in the scatter is supposed to be a "Tormenta" and the colour indicates "Comunidad".

With the whole data the appearance would be like this:

Edited after comments in order to be more clear

I'm also trying to add one legend for the colors and one legend for the marker shapes outside the figure. I tried ```plt.legend()```, ```ax.legend()```, ```plt.legend(colors)```, ```plt.legend(c)```, etc. — Carmeni202, Sep 06 '21 at 15:24
I think you are plotting everything len(markers) times. Instead, make an array with the respective marker for each element, as you did with the colors, or loop through the groups and plot them one by one. — Eumel, Sep 06 '21 at 15:35
I tried that too, but then I get an error like this: ```ValueError: Unrecognized marker style```. I tried the loop based on this post: https://stackoverflow.com/questions/31809947/valueerror-unrecognized-marker-style-d-when-looping-over-markers — Carmeni202, Sep 06 '21 at 15:45
Thank you for your comments, I was already editing the code. Now there are no errors like that. Thanks for your suggestions and your time! — Carmeni202, Sep 06 '21 at 17:48
Kudos, you did a fine job of offering an MRE, thank you. https://stackoverflow.com/help/minimal-reproducible-example It was easy to run your code, obtain a result, and iterate on that. — J_H, Sep 06 '21 at 18:32

J_H · Accepted Answer · 2021-09-08T00:40:38.213

I think you're just lamenting that scatter() takes a sequence of colors, yet just a single marker. So we will need to loop over the N points. (Or we could .groupby() if we wanted to make just T calls for T tormentas.)
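
The .groupby() variant mentioned in passing could look roughly like the sketch below. It reuses the sample data from the question and the same colors and markers dictionaries as the diff further down; the fig, ax = plt.subplots() setup, the Agg backend line, and the label= argument are my own assumptions rather than part of the original script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question
data = [['ANA', 'ANDALUCIA', 0.05, 130.2], ['ANA', 'ARAGON', 0.25, 90.5],
        ['BRUNO', 'ANDALUCIA', 0.012, 114], ['BRUNO', 'CATALUNYA', 0.023, 78.2],
        ['KARINE', 'ARAGON', 3.5, 80.2], ['ANA', 'BALEARES', 2, 97.2]]
df_comunidades = pd.DataFrame(
    data, columns=['Tormenta', 'Comunidad', 'TIEPI', 'Gustmax'])

colors = {'ANDALUCIA': 'g', 'CATALUNYA': 'y', 'BALEARES': 'r', 'ARAGON': 'c'}
markers = {'ANA': 'v', 'BRUNO': 'x', 'KARINE': 'd'}

fig, ax = plt.subplots()

# One scatter() call per tormenta: the marker is fixed within a call,
# while the colors still vary point by point.
for tormenta, group in df_comunidades.groupby('Tormenta'):
    ax.scatter(group['TIEPI'], group['Gustmax'], alpha=0.5,
               c=group['Comunidad'].map(colors).tolist(),
               marker=markers[tormenta], label=tormenta)

ax.set_xlabel("TIEPI")
ax.set_ylabel("Max Gusts in community")
```

With T tormentas this makes T calls instead of one per row, which probably only matters once the real dataset is much larger than the six-row sample.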

There seems to be a discrepancy between the "MIN gusts" label and the "GustMAX" column.

There are many markers to choose from. You might try 'v', 's', 'p' to go through a progression of 3-, 4-, 5-sided marks.
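
If you would rather not type the marker dictionary by hand, one way to build it is sketched below. The names tormentas and marker_cycle are made up for illustration, and the wrap-around with % is an assumption about what should happen when there are more tormentas than markers (shapes would then repeat).

```python
# Tormenta column values from the sample data, in row order
tormentas = ['ANA', 'ANA', 'BRUNO', 'BRUNO', 'KARINE', 'ANA']

# 3-, 4-, 5-sided marks: triangle, square, pentagon
marker_cycle = ['v', 's', 'p']

# dict.fromkeys() drops duplicates while keeping first-seen order
unique_tormentas = list(dict.fromkeys(tormentas))

# Assign markers in order, wrapping around if the list runs out
markers = {name: marker_cycle[i % len(marker_cycle)]
           for i, name in enumerate(unique_tormentas)}

print(markers)  # {'ANA': 'v', 'BRUNO': 's', 'KARINE': 'p'}
```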

I made these changes to produce the enclosed chart.

--- a/tmp/so_69076918_orig.py
+++ b/tmp/so_69076918.py
@@ -14,14 +14,22 @@ colors = {'ANDALUCIA' : 'g',
           'CATALUNYA' : 'y',
           'BALEARES' : 'r', 
           'ARAGON' : 'c'}
+markers = {'ANA': 'v',
+           'BRUNO': 'x',
+           'KARINE': 'd'}
 c = [colors[comunid] for comunid in df_comunidades['Comunidad']]
 
-plt.scatter(df_comunidades['TIEPI'], df_comunidades['Gustmax'], alpha=0.5, c=c)
-
 ax = plt.subplot(1, 1, 1)
 #code to title the axes and the plot: 
-ax.set_xlim([0,3])
+ax.set_xlim([-.1, 4])
 ax.set_xlabel("TIEPI")
+ax.set_ylim([0, 140])
 ax.set_ylabel("Min Gusts in community")
 plt.title("Relation between min gusts and TIEPI in autonomous communities")
+
+for row in df_comunidades.itertuples():
+    plt.scatter([row.TIEPI], [row.Gustmax], alpha=0.5,
+                c=colors[row.Comunidad],
+                marker=markers[row.Tormenta])
+
 plt.savefig('min_tiepi-gusts_comunid.png',dpi=300)
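
The diff does not cover the follow-up request in the comments for one legend for the colors and one for the markers, placed outside the figure. A possible sketch, not part of the changes above, uses proxy Line2D handles and ax.add_artist() so that the second ax.legend() call does not replace the first. The legend titles, anchor positions, and the subplots_adjust value are assumptions to tune for the real figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

colors = {'ANDALUCIA': 'g', 'CATALUNYA': 'y', 'BALEARES': 'r', 'ARAGON': 'c'}
markers = {'ANA': 'v', 'BRUNO': 'x', 'KARINE': 'd'}

fig, ax = plt.subplots()
fig.subplots_adjust(right=0.75)  # leave room on the right for the legends

# Proxy handles: empty lines that exist only to populate the legends
color_handles = [Line2D([], [], linestyle='none', marker='o', color=col, label=name)
                 for name, col in colors.items()]
marker_handles = [Line2D([], [], linestyle='none', marker=mark, color='k', label=name)
                  for name, mark in markers.items()]

# First legend; add_artist() keeps it when legend() is called again
color_legend = ax.legend(handles=color_handles, title='Comunidad',
                         loc='upper left', bbox_to_anchor=(1.02, 1.0))
ax.add_artist(color_legend)

# Second legend becomes the axes' "current" legend
marker_legend = ax.legend(handles=marker_handles, title='Tormenta',
                          loc='lower left', bbox_to_anchor=(1.02, 0.0))
```

When saving, passing bbox_inches='tight' to savefig() helps keep legends that sit outside the axes from being cropped.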

Thank you very much for your time! The discrepancy between the "MIN gusts" label and the "GustMAX" column is due to the method I'm using to see the threshold: I'm observing the minimum from all the maximum calculated before. I'm sorry if it was a bit messy because of that. You're right so I will change it in order to be more clear! — Carmeni202, Sep 07 '21 at 07:50
