
I have a dataset with 14 columns, of which I may only use 4 (travelling class, gender, age, and fare price), and I have split it into train and test sets. I need to create a vertical bar chart from the train set showing the distribution of passengers by travelling class (the classes are 1, 2, and 3). I am not allowed to use NumPy, Pandas, SciPy, or scikit-learn.

I am very new to Python, and I know how to plot very simple graphs, but when it comes to more complicated graphs, I get a bit lost.

This is my code (I know there is a lot wrong):

travelling_class = defaultdict(list)
for row in data:
    travelling_class[row[0]]

travelling_class = {key: len(val) for key, val in travelling_class.items()}

keys = travelling_class()
vals = [travelling_class[key] for key in keys]
ind  = range(min(travelling_class.keys()), max(travelling_class.keys()) + 1)
width = 0.6

plt.xticks([i + width/2 for i in ind], ind, ha='center')
plt.xlabel('Travelling Class')
plt.ylabel('Counts of Passengers')
plt.title('Number of Passengers per Travelling Class')
plt.ylim(0, 1000)
plt.bar(keys, vals, width)
plt.show()

import matplotlib.pyplot as plt

classes = travelling_class[1, 2, 3]

plt.hist(classes)
plt.show()

@TrakJohnson This is the original asker of the question. Sorry, I accidentally deleted my profile and had to make a new one. Thank you so much for your help. The problem is that my data set has 1045 rows, so it would be difficult to list them all by hand. Does the above seem reasonable?

  • Have you tried to code? – iparjono Aug 29 '16 at 06:00
  • Hi, yes I have :) –  Aug 29 '16 at 06:49
  • I have inserted my code into the post –  Aug 29 '16 at 07:16
  • what errors did you get? Better if you tell the desired output – iparjono Aug 29 '16 at 07:30
  • I got a type error that pointed to line 8: keys = travelling_class() saying "TypeError: 'dict' object is not callable". Sorry, I should have told the desired output. I need a graph with 3 bars (1 for each of the classes) on the x-axis plotted against the number of people (i.e. counts of records in each class) on the y-axis. –  Aug 29 '16 at 07:32
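
For reference, the TypeError comes from `travelling_class()`, which tries to call the dictionary as if it were a function; `sorted(travelling_class)` or `travelling_class.keys()` gives the keys instead. The loop in the question also never adds anything to the lists, so every count would come out as zero. Below is a minimal sketch of the counting and the three-bar chart, assuming each row is a sequence with the travelling class in column 0. The names `count_by_class` and `plot_counts` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def count_by_class(rows, column=0):
    """Count how many rows fall into each travelling class."""
    counts = defaultdict(int)
    for row in rows:
        counts[int(row[column])] += 1  # add one passenger to this class
    return dict(counts)


def plot_counts(counts):
    """Draw one vertical bar per class (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    keys = sorted(counts)                 # e.g. [1, 2, 3]
    vals = [counts[key] for key in keys]  # bar heights in the same order
    plt.bar(keys, vals, width=0.6)
    plt.xticks(keys)                      # one tick per class
    plt.xlabel('Travelling Class')
    plt.ylabel('Count of Passengers')
    plt.title('Number of Passengers per Travelling Class')
    plt.show()


# Tiny made-up sample standing in for the real training rows
sample_rows = [[1, 0, 29.0, 211.3], [3, 1, 22.0, 7.25], [3, 0, 26.0, 7.9], [2, 1, 30.0, 13.0]]
sample_counts = count_by_class(sample_rows)
```

Calling `plot_counts(count_by_class(data))` on the real training rows should then show one bar per class, provided the class really is stored in column 0.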

1 Answer


Use plt.hist, which plots a histogram (see the matplotlib documentation for more information).

Example:

import matplotlib.pyplot as plt

classes = [1, 2, 1, 1, 3, 3]

plt.hist(classes)
plt.show()

And this is the result:

[Image: histogram of the example list]
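
With the default settings, `plt.hist` splits the data range into 10 equal-width bins, so integer classes can end up as thin bars at uneven positions. One possible variation, offered as a sketch rather than as part of the original answer, is to pass explicit bin edges centred on each integer. `integer_bin_edges` and `plot_class_histogram` are hypothetical helper names.

```python
def integer_bin_edges(values):
    """Return bin edges so that every integer from min to max gets its own bin."""
    lo, hi = min(values), max(values)
    return [n - 0.5 for n in range(lo, hi + 2)]


def plot_class_histogram(values):
    """Histogram with one bar per integer class (requires matplotlib)."""
    import matplotlib.pyplot as plt

    # rwidth < 1 leaves a small gap between neighbouring bars
    plt.hist(values, bins=integer_bin_edges(values), rwidth=0.8)
    plt.xticks(list(range(min(values), max(values) + 1)))
    plt.show()


classes = [1, 2, 1, 1, 3, 3]
edges = integer_bin_edges(classes)
```

For the example list, the edges fall halfway between the class numbers, so classes 1, 2, and 3 each get one bar centred on their tick.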

TrakJohnson
  • Thank you heaps :) How can I do that for the classes in a column from a data set? Sorry, I'm a bit inexperienced with Python. –  Aug 29 '16 at 09:39
  • You're welcome :). What you have to do is replace the classes with numerical values; I don't think there is a way to keep them as strings. I don't know what type of dataset you are using, but it should be easy to convert it into a list and input it as in the example. – TrakJohnson Aug 29 '16 at 15:39
  • :) My data set is from the titanic and I have to use the column variables: travelling class (integer), gender (integer), age (float), and fare price (float). Travelling class has 3 categories (1, 2 and 3). Sorry, I'm not sure how I could give you the dataset. So I would have to create a list for the variables and then define travelling class with the three categories? Thank you :) –  Aug 30 '16 at 00:15
  • This is what you'd have to do: if you have, let's say, 3 people in class 1, 1 person in class 2 and 2 people in class 3, then this is what your list would look like (the order doesn't matter): [1, 1, 1, 2, 3, 3]. You can do the same thing for gender, age and fare price. – TrakJohnson Aug 30 '16 at 09:26
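
The conversion described in these comments can be sketched as follows. It assumes the training data is already a list of rows and that the four variables sit in columns 0 to 3 in the order class, gender, age, fare; the real column positions may differ. `extract_column` is a hypothetical helper and the rows are made up.

```python
def extract_column(rows, index, convert):
    """Collect one column from every row into a flat list, converting each value."""
    return [convert(row[index]) for row in rows]


# Made-up rows standing in for the training set: class, gender, age, fare
train_rows = [
    ["1", "0", "29.0", "211.34"],
    ["3", "1", "22.0", "7.25"],
    ["3", "0", "26.0", "7.93"],
]

classes = extract_column(train_rows, 0, int)    # travelling class as integers
genders = extract_column(train_rows, 1, int)    # gender as integers
ages = extract_column(train_rows, 2, float)     # age as floats
fares = extract_column(train_rows, 3, float)    # fare price as floats
```

The resulting `classes` list can be passed to `plt.hist(classes)` as in the example above. Because the list is built by a loop rather than typed by hand, 1045 rows are handled the same way as six.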