
I'm trying to plot multiple lines on one graph using matplotlib and for loops, but the code stops working after the first iteration. Here's the code:

import csv
import matplotlib.pyplot as plt
r = csv.reader(open('CrimeStatebyState.csv', 'rb'))
line1 = r.next()

def crime_rate(*state):
    for s in state:
        orig_dict = {}
        for n in range (1960,2006):
            orig_dict[n] = []
        for line in r:
            if line[0] == s:
                orig_dict[int(line[3])].append(int(line[4]))
        for y in orig_dict:
            orig_dict[y] = sum(orig_dict[y])
        plt.plot(orig_dict.keys(), orig_dict.values(),'r')
        print orig_dict.values()
        print s

crime_rate("Alabama", "California", "New York")

Here's what it returns:

[39920, 38105, 41112, 44636, 53550, 55131, 61838, 65527, 71285, 75090, 85399, 86919, 84047, 91389, 107314, 125497, 139573, 136995, 147389, 159950, 190511, 191834, 182701, 162361, 155691, 158513, 173807, 181751, 188261, 190573, 198604, 219400, 217889, 204274, 206859, 206188, 205962, 211188, 200065, 192819, 202159, 192835, 200331, 201572, 201664, 197071]
Alabama
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
California
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
New York
**[[[Graph of Alabama's values]]]**

Why am I getting zeroes after the loop runs once? Is this why the other two graphs aren't showing up? Is there an issue with the sum function, the "for line in r" loop, or using *state?

Sorry if that's not enough information! Thanks to anyone kind and knowledgeable enough to help.

userNaN
  • I think the problem is the way the csv reader object works. The first `for line in r` loop runs to the end of the file, so every later state starts with a reader that has nothing left to yield, hence all the zeroes. I think the simplest solution would be to drop the loop over the states: instead, loop over the file once and check whether `line[0]` is one of the state values you pass to the function, something like `if line[0] in state:...`. Hope that helps! – darthbith Sep 08 '13 at 15:04
  • 1
    Note that the accepted solution traverses the whole file once for each state, which is not efficient. – elyase Sep 08 '13 at 15:45
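
The single-pass idea from the comments above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python 3 syntax (the question's code is Python 2), it reads made-up sample rows from memory instead of CrimeStatebyState.csv, and the header names in the sample and the helper name `totals_by_state` are hypothetical. The column positions (state in column 0, year in column 3, count in column 4) are taken from the question's code.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows (not real data) in the layout the question's code
# implies: state in column 0, year in column 3, count in column 4.
SAMPLE = """State,Type,Crime,Year,Count
Alabama,Violent,Robbery,1960,10
Alabama,Property,Burglary,1960,5
California,Violent,Robbery,1960,7
New York,Violent,Robbery,1961,3
Texas,Violent,Robbery,1960,99
"""


def totals_by_state(csv_file, states):
    """Read the file once and sum the counts per year for each wanted state."""
    wanted = set(states)
    totals = {s: defaultdict(int) for s in states}
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for line in reader:
        if line[0] in wanted:  # membership test replaces the outer state loop
            totals[line[0]][int(line[3])] += int(line[4])
    return totals


totals = totals_by_state(io.StringIO(SAMPLE), ["Alabama", "California", "New York"])
print(totals["Alabama"][1960])
```

Each state's per-year totals can then be plotted in turn from `totals`, and the file is read only once no matter how many states are requested.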

2 Answers


It appears that your csv reader is exhausted once the first state has been processed, so when "for line in r:" runs for the next state there are no lines left to read. You can confirm this by putting a print statement at the top of the loop body and checking whether it fires for the later states, e.g.

for line in r:
    print "test" # Test print
    if line[0] == s:
        orig_dict[int(line[3])].append(int(line[4]))
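
The exhaustion itself can also be reproduced without the data file. The snippet below is a minimal sketch in Python 3 syntax (the code above is Python 2), using a two-row in-memory CSV that is made up for the demonstration:

```python
import csv
import io

# A csv reader is a one-shot iterator: once it reaches the end of its
# input, looping over it again yields nothing.
reader = csv.reader(io.StringIO("a,1\nb,2\n"))

first_pass = [row for row in reader]   # consumes both rows
second_pass = [row for row in reader]  # reader is already at the end

print(len(first_pass), len(second_pass))  # prints: 2 0
```

This is the same situation the second and third states run into in the original function.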

If you re-create the csv reader inside the loop over the states, every state is processed against the full file:

import csv
import matplotlib.pyplot as plt


def crime_rate(*state):
    for s in state:
        r = csv.reader(open('CrimeStatebyState.csv', 'rb'))
        line1 = r.next()
        orig_dict = {}
        for n in range (1960,2006):
            orig_dict[n] = []
        for line in r:
            if line[0] == s:
                orig_dict[int(line[3])].append(int(line[4]))
        for y in orig_dict:
            orig_dict[y] = sum(orig_dict[y])
        plt.plot(orig_dict.keys(), orig_dict.values(),'r')
        print orig_dict.values()
        print s

crime_rate("Alabama", "California", "New York")
Adam

Others have already explained the source of your error. May I suggest you use pandas for this task:

import pandas as pd

states = ["Alabama", "California", "New York"]
data = pd.read_csv('CrimeStatebyState.csv')               # import data
df = data[(1996 <= data.Year) & (data.Year <= 2005)]      # filter by year
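# one column per state, then plot; newer pandas versions call these arguments index/columns (formerly rows/cols)
pd.pivot_table(df, index='Year', columns='State', values='Count')[states].plot()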


elyase