2

I'm trying to write a program and I'm having a lot of trouble with it. Here are my instructions: For this program you are going to create a simple database from some U.S. Census data. The database will consist of a dictionary whose keys are the state names and whose values are a list of the populations in each of the years from 1900 to 1990. Once you have created the database, you will write a simple command driven program that will prompt a user for a state name and a year and then report out the population for that year in that state. Your program will do this until the user types any word beginning with a 'q' or 'Q'.

Census data is here: http://www.census.gov/population/www/censusdata/files/urpop0090.txt (I have saved it all to a flat ASCII file named "database").

Take some time to study the file. It contains some superfluous information (at least for our purposes). You will need to develop a strategy to extract precisely the information you need from the file to put into your database (dictionary).

Here are my patterns to describe the information I need:

  1. You can tell you have a line with state data on it when the line starts with 6 spaces and is followed by an upper-case letter. You can find the end of the state name when there are two spaces in a row later in that line.

  2. If you have a line that contains state data, you can find the first total population on that line by going to character 43 and then backing up until you find a single space.

  3. If you have a line that contains state data, you can find the second total population on that line by going to character 101 and then backing up until you find a single space.

  4. If you have a line that contains state data, you can find the third total population on that line by going to character 159 and then backing up until you find a single space.

This is what I have so far:

# Removes the commas from a population string and converts it to a number
def convert_string_to_number(comma_string):
    number = comma_string.replace(",", "")
    parts = number.split(".")  # check for a decimal point
    if len(parts) == 1 and parts[0].isdigit():  # we really have an integer
        number = int(parts[0])
    elif len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():  # float
        number = float(parts[0] + "." + parts[1])
    else:
        number = None
    return number


def getsub(str, endindex):
     sublist = str[:endindex].split(' ')
     substring = sublist[-1]
     return substring

def main():
    data = open('database', 'r')
    lines = data.readlines()

    for line in lines:
        # Now do the line processing.
        if line.startswith('      '):
            # Now process the state data
            firsttotalpop = getsub(line, 42)
            secondtotalpop = getsub(line, 100)
            thirdtotalpop = getsub(line, 158)

    return 0

I'm having some trouble figuring out how to actually create a dictionary with keys/values, and how to get the population values to stick to the keys of the state names. Also, I'm not positive how to take a user input and use that as a key. I'm also not sure if the code that is up there properly gets the State Name and Population information.

Any suggestions/help would be greatly appreciated!

nkjt

3 Answers

1

To create a dict you'd do something like this:

censusvalues = {}
censusvalues['CA'] = {}
censusvalues['CA']['1960'] = <1960 census value>

You can populate the dict like that, based on the data you extract:

censusvalues['CA'] = {}
censusvalues['CA']['1960'] = 456
censusvalues['CA']['1970'] = 789
>>> censusvalues
{'CA': {'1960': 456, '1970': 789}}

Your program then prompts the user for the state name and year:

state = raw_input("Enter the state: ")
year = raw_input("Enter the year: ")

and then looks up the value with something like:

 censusvalues[state][year]

to print out the output.
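As a rough sketch of how the lookup and the quit-on-"q" loop from the question might fit together: the helper names `lookup` and `run_prompt` are made up for illustration, the year keys are assumed to be strings as in the example above, and the code uses Python 3 syntax (under Python 2 you would pass `raw_input` as the reader).

```python
def lookup(censusvalues, state, year):
    """Return the population for state/year, or None if either key is missing."""
    return censusvalues.get(state, {}).get(year)


def run_prompt(censusvalues, read=input, write=print):
    """Keep prompting until the user types a word starting with q or Q."""
    while True:
        state = read("Enter the state (or q to quit): ").strip()
        if state.lower().startswith("q"):
            break
        year = read("Enter the year (or q to quit): ").strip()
        if year.lower().startswith("q"):
            break
        population = lookup(censusvalues, state, year)
        if population is None:
            write("No data for %s in %s" % (state, year))
        else:
            write("%s %s: %d" % (state, year, population))
```

Calling `run_prompt(censusvalues)` starts the interactive loop. Passing the reader and writer as parameters is only there so the loop can be exercised without a keyboard.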

I'm going to address a few issues I see in your code here (be sure to import re at the beginning after these edits):

import re

def main():
    data = open('database', 'r')
    lines = data.readlines()
    year = 0
    censusvalues = {}
    for line in lines:
        # Now do the line processing.
        # The first thing you need to do here is see which years
        # you are about to grab data from. To do this, you need to figure out
        # how to extract that from the file. Every line that has a year in it
        # is prefixed by the same number of spaces followed by a number, so
        # you can get it that way:
        if re.match('<insert number of spaces here...too lazy to count>[0-9]', line):
            year = int(line[<number of spaces>:].strip())
            continue

        if line.startswith('      '):
            # Now process the state data
            <you need to insert code here to grab the state name>

            firsttotalpop = getsub(line, 42)
            secondtotalpop = getsub(line, 100)
            thirdtotalpop = getsub(line, 158)
            censusvalues[state][year] = firsttotalpop
            censusvalues[state][year-10] = secondtotalpop
            censusvalues[state][year-20] = thirdtotalpop
    return 0

Finally, you need to account for what happens when a line has only one year present instead of three. I'll leave that as an exercise for you.

EDIT: One more thing: you also need to check that the state's dict exists before you try to add key/value pairs to it, maybe like this:

if <state> not in censusvalues:
    censusvalues[<state>] = {}
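One possible way to fold that existence check into the assignment is `dict.setdefault`. This is only a sketch: `record` is a hypothetical helper name, and the numbers are placeholders rather than real census figures.

```python
def record(censusvalues, state, year, population):
    """Store one figure, creating the state's inner dict on first use."""
    # setdefault returns the existing inner dict, or inserts {} and returns it.
    censusvalues.setdefault(state, {})[year] = population


censusvalues = {}
# Placeholder numbers for illustration only, not real census figures.
record(censusvalues, "Texas", 1990, 300)
record(censusvalues, "Texas", 1980, 200)
record(censusvalues, "Florida", 1990, 150)
print(censusvalues)
```

The explicit `if ... not in` check does the same job; `setdefault` just keeps it on one line inside the loop.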
deweyredman
  • well, the point is you can do that programmatically...for example you only need to create the censusvalue dict once, and the censusvalues['CA'] dict once. After you create the empty dict for each state you can just assign key value pairs like I did above, but you'd wrap it in some for loop that reads your data. In addition, to get the user input you'd do something like `state = raw_input("Enter the state: ") year = raw_input("Enter the year: ")` – deweyredman Mar 06 '14 at 21:42
  • you can make a function that will do that programmatically is my point. I'll edit your code to show you what I mean. – deweyredman Mar 06 '14 at 21:49
  • I answered this above in nwalshes' answer. – deweyredman Mar 06 '14 at 22:43
0

As far as creating the dict:

my_dict = {}
my_dict['Texas'] = [1, 2, 5, 10, 2000]  # etc etc
my_dict['Florida'] = [2, 3, 6, 10, 1000]  # etc etc

You can also look up a value using a variable as the key:

temp = 'Florida'
print my_dict[temp]

You can store your data however you want, but the general syntax is dict[key] = value, where the key can be an int or a string (a string in your case) and the value can be pretty much any data structure: a list, an int, a string, a list of ints, even another dict or a list of dicts.
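Since the assignment asks for a list of populations per state covering 1900 to 1990, one way to turn a year into a list position is sketched below. It assumes the list is stored in chronological order, one entry per decade; `YEARS`, `population_for` and the numbers are illustrative, not real census data.

```python
# Census years covered by the assignment: 1900, 1910, ..., 1990.
YEARS = list(range(1900, 2000, 10))


def population_for(database, state, year):
    """Look up one year's figure in a dict of state -> list of populations."""
    index = YEARS.index(year)  # raises ValueError if year is not a census year
    return database[state][index]


# Placeholder values for illustration only, oldest year first.
database = {"Texas": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}
print(population_for(database, "Texas", 1950))
```

If the lists were built newest-first instead, the index calculation would need to be reversed to match.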

drez90
  • @thunder1417 You make a program so you DON'T have to do it for every state and population. Make a string to get the state name and make integers to get the population. Then make a dictionary keyed with the state name and set it to the population. You need to place the updates to the dictionary in your loop where you read state names and populations. – nwalsh Mar 06 '14 at 21:45
  • @thunder1417 I think you are thinking too hard about your patterns, but you get the general concept and are making them more difficult than it has to be. For example, take the get 1st population. You know that it will start at character 34 by looking at the text file. You know that it will end at 44. Therefore, the population is the string between 34 and 44. Now just remove the commas by using a string method. This is extremely less complex than start at the back and move forward and replace commas. – nwalsh Mar 06 '14 at 22:12
  • @thunder1417 I will post an answer to explain it. – nwalsh Mar 06 '14 at 22:17
0

Given: we know that population 1 starts on character 34 because there is no state that has over 100 million people. We know that population 1 will end on character 44.

However, there are states that have LESS than ten million people and therefore they must start at character 35 or 36. Does this matter? No.

# where line is a line containing STATE information
def get_population_one(line):
    populationOne = line[34:44]
    populationOne = populationOne.replace(',', '')  # remove the commas
    populationOne = populationOne.replace(' ', '')  # remove the padding spaces for states with fewer than 10 million people
    return int(populationOne)  # convert the string to an integer

Then, for population two and population three, you merely change the slice indices and use the same logic as above.

This can all be accomplished in one line:

def get_population_one(line):
    return int(line[34:44].replace(',', '').strip())
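The same slicing could be generalized to all three columns, roughly as below. The 58-character spacing is an assumption inferred from the positions 43, 101 and 159 given in the question, not something checked against the file, and `get_population` is a hypothetical name. It returns None where a line has no figure in that column.

```python
FIRST_START = 34   # start of the first population field, per the answer above
FIELD_WIDTH = 10   # the answer slices line[34:44]
COLUMN_STEP = 58   # assumption: spacing between characters 43, 101 and 159


def get_population(line, which):
    """Return population number `which` (0, 1 or 2) on a state line, or None."""
    start = FIRST_START + which * COLUMN_STEP
    field = line[start:start + FIELD_WIDTH].replace(",", "").strip()
    # An empty or non-numeric slice means this column is absent on the line.
    return int(field) if field.isdigit() else None
```

Check the offsets against a few real lines before relying on them, since a wrong step would silently pick up digits from a neighbouring column.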
deweyredman
nwalsh
  • I can't seem to format the code correctly so i'm going to post my one liner as an edit to your code. – deweyredman Mar 06 '14 at 22:29
  • To use the state names as keys, you'll need to grab them from the file. Since you know that each line with a state name starts with 6 spaces, you'd do something like this on that line: `state = line[6:].split(' ')[0]` – deweyredman Mar 06 '14 at 22:39
  • @deweyredman That doesn't work for states with spaces in their name. Like South Dakota. But merely adding another space to the split will fix that problem. – nwalsh Mar 06 '14 at 22:42
  • Good catch. I need to get back to work, but I think you guys have enough to go on. – deweyredman Mar 06 '14 at 22:46
  • Something very similar to what we did for the populations should work (grab the text from character 6 to...whatever..., then strip whitespace from it) – deweyredman Mar 06 '14 at 22:47
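Putting the comments above together with pattern 1 from the question, the state-name extraction might look something like this. It is a sketch under the question's stated assumptions (six leading spaces, then an upper-case letter, with the name ending at the first run of two spaces); `get_state_name` is a made-up name.

```python
def get_state_name(line):
    """Return the state name from a state line, or None for any other line."""
    # Pattern 1: exactly six leading spaces followed by an upper-case letter.
    if not (line.startswith(" " * 6) and line[6:7].isupper()):
        return None
    # Splitting on two spaces keeps names like "South Dakota" in one piece.
    return line[6:].split("  ")[0].strip()


print(get_state_name("      South Dakota     123,456"))
```

Lines that are indented further, or that start with something other than a capital letter, come back as None and can simply be skipped in the main loop.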