I'm new to seaborn. I have the dataset below and want to recreate a graph like this one with seaborn.

[image: the plot I want to reproduce]

This is my data:

max_depth = [ 3,  3,  3,  3,  3,  5,  5,  5,  5,  5,  7,  7,  7,  7,  7, 10, 10,
       10, 10, 10, 12, 12, 12, 12, 12]
min_samples_split = [2, 5, 15, 20, 25, 2, 5, 15, 20, 25, 2, 5,
   15, 20, 25, 2, 5, 15, 20, 25, 2, 5, 15, 20, 25]
test_score = [0.85089537, 0.85089537, 0.85089537, 0.85348114, 0.85354819, 0.87357118, 0.87328475, 0.87147859, 0.87425471, 0.87402261,
       0.86355856, 0.86120602, 0.87259394, 0.87582926, 0.87943536, 0.80913078, 0.82786446, 0.86109688, 0.86773115, 0.87619951,
       0.79090683, 0.8038633 , 0.84915534, 0.86083209, 0.87192132]

import pandas as pd
results_DT = pd.DataFrame({'max_depth': max_depth, 'min_samples_split': min_samples_split, 'test_score': test_score})

And this is my attempt in seaborn:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.lineplot(x = 'max_depth', y = 'test_score', hue = 'min_samples_split', marker = 'o', data = results_DT) # need to work out how to fix this
plt.legend(loc='lower left')
plt.xlabel("Max depth")
plt.ylabel("Mean CV score")

But as you can see, the categories are incorrect:

[image: the plot produced by the code above]

When I try to convert the column to strings, I get an error:

# convert to string on a copy, so the original DataFrame is left untouched
results_DT2 = results_DT.copy()
results_DT2['min_samples_split'] = results_DT2['min_samples_split'].astype(str)

sns.lineplot(x = 'max_depth', y = 'test_score', hue = 'min_samples_split', marker = 'o', data = results_DT2) # need to work out how to fix this
plt.legend(loc='lower left')
plt.xlabel("Max depth")
plt.ylabel("Mean CV score")

AttributeError: 'str' object has no attribute 'view'

How do I fix this?

Veliko
william3031

2 Answers

You can convert the min_samples_split column to categorical:

results_DT['min_samples_split'] = pd.Categorical(results_DT['min_samples_split'])
sns.lineplot(x = 'max_depth', y = 'test_score', hue = 'min_samples_split', marker = 'o', data = results_DT)

[image: the resulting plot]

StupidWolf

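Try passing an explicit palette with one color per unique min_samples_split value, so that seaborn treats the hue values as discrete levels: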

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.lineplot(x = 'max_depth', y = 'test_score', hue = 'min_samples_split', marker = 'o', data = results_DT,
             palette = sns.color_palette("Set1", results_DT.min_samples_split.nunique()))
plt.legend(loc='lower left')
plt.xlabel("Max depth")
plt.ylabel("Mean CV score")

As mentioned and explained in this post.

Jonas Pirner