
I'm stuck on an assignment that has us use data from:

https://raw.githubusercontent.com/Geoyi/Cleaning-Titanic-Data/master/titanic_original.csv

Using matplotlib, I need to:

Create a scatterplot of the fare paid against age, with the point color differing by gender.

I am having trouble getting the points colored by gender.

This is what I have so far:

import pandas as pd
import matplotlib.pyplot as plt

titanic = pd.read_csv('https://raw.githubusercontent.com/Geoyi/Cleaning-Titanic-Data/master/titanic_original.csv')

plt.scatter(titanic['age'],titanic['fare'],alpha=0.5)
plt.show()

When I tried this:

plt.scatter(titanic['age'],titanic['fare'], alpha=0.5,c=titanic['sex'])
plt.show()

it failed with a ValueError; the traceback ends at:

raise ValueError(msg.format(c.shape, x.size, y.size))
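The error can be reproduced without downloading the CSV. The sketch below uses a tiny, made-up DataFrame (the values are invented for illustration) with the same age, fare and sex columns. Matplotlib's exact error message differs between versions, so it only captures the exception type.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the Titanic data (illustration only).
sample = pd.DataFrame({
    "age": [29.0, 2.0, 30.0, 25.0],
    "fare": [211.34, 151.55, 151.55, 26.0],
    "sex": ["female", "male", "male", "female"],
})

error = None
try:
    # 'male' and 'female' are not Matplotlib color names, so this call fails.
    plt.scatter(sample["age"], sample["fare"], alpha=0.5, c=sample["sex"])
except ValueError as exc:
    error = exc
finally:
    plt.close("all")

print(type(error).__name__)
```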

cs95

2 Answers


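You're nearly there. You cannot pass strings to c unless they are valid color names (such as 'red' or 'blue'), and 'male' and 'female' are not. You can either pass a list of valid colors, or pass integer codes obtained by factorizing the column, which Matplotlib then maps through a colormap. For example: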

plt.scatter(titanic['age'], titanic['fare'], alpha=0.5, c=pd.factorize(titanic['sex'])[0])

Or,

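# drop rows with no 'sex' value; NaN would not map to a color
titanic = titanic.dropna(subset=['sex'])

# map each gender to a valid Matplotlib color name
mapping = {'male' : 'blue', 'female' : 'red'}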
plt.scatter(titanic['age'], titanic['fare'], alpha=0.5, c=titanic['sex'].map(mapping))

plt.show()
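Passing one color per point, as above, leaves Matplotlib with no labels from which to build a legend. One alternative, not taken from this answer, is to draw one scatter call per group and give each a label. The sketch below assumes the same age, fare and sex column names; it uses a small made-up DataFrame (including an empty trailing row) so that it runs offline, and the colors dict is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the Titanic CSV; swap in pd.read_csv(...) for real data.
titanic = pd.DataFrame({
    "age": [29.0, 2.0, 30.0, 25.0, None],
    "fare": [211.34, 151.55, 151.55, 26.0, None],
    "sex": ["female", "male", "male", "female", None],
})

colors = {"male": "blue", "female": "red"}  # illustrative color choice

fig, ax = plt.subplots()
# groupby skips rows whose 'sex' is missing, so the empty last row is ignored.
for sex, group in titanic.groupby("sex"):
    ax.scatter(group["age"], group["fare"], alpha=0.5,
               color=colors[sex], label=sex)

ax.set_xlabel("Age")
ax.set_ylabel("Fare")
ax.legend(title="Sex")
# In an interactive session, call plt.show() here to display the figure.
```

Because each group gets its own label, ax.legend() can list "female" and "male" with their colors.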


cs95

You will need to remove the row of NaN values, which is the last row of this file. Then:

import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/Geoyi/Cleaning-Titanic-Data/master/titanic_original.csv"
# skipfooter=1 drops the empty last row; it requires engine='python'
titanic = pd.read_csv(url, skipfooter=1, engine='python')
colors = {'male': 'red', 'female': 'blue'}
fig2, ax2 = plt.subplots()
ax2.scatter(titanic['age'], titanic['fare'], alpha=0.5, c=titanic['sex'].apply(lambda x: colors[x]))
plt.show()
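To see why that last row matters, the sketch below reads a small, made-up CSV from memory whose final line contains only commas, imitating the empty record this answer describes. Read normally, that line comes back as a row of NaN values, and colors[x] raises KeyError because NaN is not a key of the dict. With skipfooter=1 (supported only by engine='python'), the line is never read.

```python
import io
import pandas as pd

# Made-up CSV text; the last line is all commas, like an empty trailing record.
csv_text = """age,fare,sex
29,211.34,female
2,151.55,male
,,
"""

colors = {"male": "red", "female": "blue"}

# Reading everything keeps the empty record as a row of NaN values.
with_blank = pd.read_csv(io.StringIO(csv_text))
try:
    with_blank["sex"].apply(lambda x: colors[x])
    blank_row_failed = False
except KeyError:
    blank_row_failed = True  # NaN is not a key in `colors`

# skipfooter=1 drops the last line; only the python engine supports it.
trimmed = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")
point_colors = trimmed["sex"].apply(lambda x: colors[x])

print(len(with_blank), len(trimmed), blank_row_failed)
print(point_colors.tolist())
```

Dropping the row afterwards with titanic.dropna(subset=['sex']) would have the same effect without needing the python engine.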
Toastrackenigma