I'm trying to plot a graph in an IPython notebook from a .csv file (titanic_data.csv, easily found on Kaggle). Below is a summary of the steps I have taken. First I loaded the file and dropped some columns.
import pandas as pd
titanic_data = pd.read_csv("titanic_data.csv")
titanic_data_cleaned = titanic_data.drop(['Name','Ticket','Cabin','Fare','Embarked'], axis=1)
Question: I tried to generate the percentage graph below, but I cannot get past the following error: operands could not be broadcast together with shapes (65,) (77,). If anyone can help, I would appreciate it. Thank you very much.
totals = survivors_age_group + non_survivors_age_group
# Percentage of the total in each age group (element-wise on the arrays)
data1_percentages = survivors_age_group / totals * 100
data2_percentages = non_survivors_age_group / totals * 100
tick_spacing = np.array(range(len(age_labels)))+0.4
# Stacked bar chart: percentage of survivors per age group
ax2.bar(range(len(data1_percentages)), data1_percentages, alpha=0.5, color='g')
ax2.bar(range(len(data2_percentages)), data2_percentages, bottom=data1_percentages, alpha=0.5, color='r')
plt.sca(ax2)
plt.xticks(tick_spacing, age_labels)
ax2.set_ylabel("Percentage")
ax2.set_xlabel("")
ax2.set_title("% of survivors by age group",fontsize=14)
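One possible reading of the error: the shapes (65,) and (77,) suggest that survivors_age_group and non_survivors_age_group each hold one count per distinct age value rather than one count per age group, so the two arrays have different lengths and cannot be added element-wise. The sketch below counts both groups over the same bin edges so they line up. It assumes the Kaggle column names Age and Survived. The sample rows, the bin edges in age_bins, and the labels in age_labels are made up for illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for titanic_data_cleaned (illustration only)
df = pd.DataFrame({
    "Age":      [4, 9, 15, 22, 22, 35, 41, 58, 63, 71, np.nan],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
})

# Hypothetical bin edges and labels; both groups must use the same edges
age_bins = [0, 18, 40, 60, 120]
age_labels = ["0-17", "18-39", "40-59", "60+"]

# Drop rows with a missing age, then count each group over the shared bins
known_age = df.dropna(subset=["Age"])
survivors_age_group, _ = np.histogram(
    known_age.loc[known_age["Survived"] == 1, "Age"], bins=age_bins)
non_survivors_age_group, _ = np.histogram(
    known_age.loc[known_age["Survived"] == 0, "Age"], bins=age_bins)

# Both arrays now have len(age_bins) - 1 entries, so these operations align
totals = survivors_age_group + non_survivors_age_group
data1_percentages = survivors_age_group / totals * 100
data2_percentages = non_survivors_age_group / totals * 100
```

With counts built this way, each array has one entry per label in age_labels, so the bar and xticks calls receive matching lengths. If an age group could be empty, totals would contain a zero there and the division would need a guard.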