I'm currently doing a project for class and I need a little advice/help. I have a csv file that I'm extracting data from. (I am not using the csv module because I'm not familiar with it and the instructor warned us that it's complicated.) I've gotten the data into lists using a function I created. It works fine if the values are just strings of digits, but if there is a percent sign or 'N/A' in the cell, then I get an error. Here is the code:

def get_values(file, index):

    '''(file object, int) -> list
    Return a list of states and corresponding values at a particular index in file.'''

    values_list = []
    for i in range(6):
        file.readline()
    for line in file:
        line_list = line.split(',')
        values_list.append(line_list[index])
    values_list = [i.rstrip('%') for i in values_list]
    values_list = [float(i) for i in values_list]
    return values_list




while True:
    try:
        file_name = input('Enter in file name: ')
        input_file = open( file_name, 'r')
        break

    except IOError:
        print('File not found.')




heart_list = get_values(input_file, 1)

input_file.close()
input_file = open('riskfactors.csv', 'r')


HIV_list = get_values(input_file, 8)

input_file.close()

I would like to strip the %, but nothing I've tried has worked so far. Any suggestions?

Blender
user2188956

1 Answer

Without seeing a complete SSCCE with sample inputs, it's hard to be sure, but I'm willing to bet the problem is this:

values_list = [i.rstrip('%') for i in values_list]

That will strip any '%' characters off the end of each value, but it won't strip any '%' characters anywhere else. And in a typical CSV file, that isn't good enough.

My guess is that you have a line like this:

foo , 10% , bar

This will split into:

['foo ', ' 10% ', ' bar\n']

So, you add the ' 10% ' to values_list, and the rstrip line will do nothing, because it doesn't end with a '%', it ends with a ' '.

Or, alternatively, it may just be this:

foo,bar,10%

So you get this:

['foo', 'bar', '10%\n']

… which has the same problem.

If this (either version) is the problem, what you want to do is something like:

values_list = [i.strip().rstrip('%') for i in values_list]
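
To see the difference, here is a small check on cells shaped like the guesses above. The sample values are made up for illustration, not taken from your file:

```python
# Cells as they might come out of line.split(','); the values are illustrative.
cells = [' 10% ', '10%\n', '10%']

# rstrip('%') alone only helps when '%' is the very last character.
rstrip_only = [c.rstrip('%') for c in cells]

# Stripping surrounding whitespace first exposes the trailing '%'.
strip_then_rstrip = [c.strip().rstrip('%') for c in cells]

print(rstrip_only)
print(strip_then_rstrip)
```

Only the last cell is cleaned by rstrip('%') on its own; after strip() all three become something float can parse.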

Meanwhile, you can make this a lot simpler by just getting rid of the list comprehension. Why try to fix every row after the fact, when you can fix the single values as you add them? For example:

for line in file:
    line_list = line.split(',')
    value = line_list[index]
    value = value.strip().rstrip('%')
    value = float(value)
    values_list.append(value)
return values_list

And now, things are simple enough that you can merge multiple lines without making it less readable.
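
As a rough sketch of what that merged version might look like (assuming the header lines have already been skipped, and taking any iterable of lines rather than only a file object so it is easy to try out):

```python
def get_values(lines, index):
    """Return the floats found in column `index` of each comma-separated line."""
    values_list = []
    for line in lines:
        # Pick the cell, drop surrounding whitespace, then the trailing '%'.
        value = line.split(',')[index]
        values_list.append(float(value.strip().rstrip('%')))
    return values_list


# Made-up sample lines, just to exercise the function.
sample_lines = ['Alabama,10%\n', 'Alaska, 7.5% ,x\n']
print(get_values(sample_lines, 1))
```

This still raises ValueError on a cell like 'N/A'.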


Of course you still need to deal with 'N/A'. The question is whether you want to treat that as 0.0, or None, or skip it over, or do something different, but whatever you decide, you might consider using try around the float instead of checking for 'N/A', to make your code more robust. For example:

value = value.strip().rstrip('%')
try:
    value = float(value)
except ValueError as e:
    # maybe log the error, or log the error only if not N/A, or...
    pass # or values_list.append(0.0), or whatever
else:
    values_list.append(value)
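
One way to package that decision is a small helper. This is only a sketch: parse_value and its default parameter are names invented here, and returning None for unparseable cells is just one of the options listed above.

```python
def parse_value(cell, default=None):
    """Convert a cell like ' 10% ' to a float, or return `default` if it can't be parsed."""
    cleaned = cell.strip().rstrip('%')
    try:
        return float(cleaned)
    except ValueError:
        # Covers 'N/A', empty cells, and anything else float() rejects.
        return default


# Made-up sample cells for illustration.
cells = ['10%', ' N/A ', '3.2\n', '']
parsed = [parse_value(c) for c in cells]
print(parsed)
```

If you would rather treat missing data as 0.0, pass default=0.0; if you want to skip such rows, filter out the None entries afterwards.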

By the way, dealing with this kind of stuff is exactly why you should use the csv module.

Here's how you use csv. Instead of this:

for line in file:
    line_list = line.split(',')

Just do this:

for line_list in csv.reader(file):

That's complicated?

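And it takes care of the trailing newline, along with quoting and escaping and all kinds of other nonsense that you'll forget to test for. (It won't strip spaces around values by default; passing skipinitialspace=True handles spaces after the delimiter, and you'd still call strip() for trailing ones.)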
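
Here is a small, self-contained comparison. It uses io.StringIO in place of a real file, and the data (including the quoted field with a comma in it) is invented for illustration, not taken from riskfactors.csv:

```python
import csv
import io

# Made-up data: one header line, one quoted field containing a comma.
text = 'state,rate\n"Washington, D.C.", 12%\nTexas,N/A\n'

# Naive splitting breaks the quoted field apart and keeps the newline.
naive = text.splitlines(keepends=True)[1].split(',')

# csv.reader keeps the quoted field whole and drops the line ending;
# skipinitialspace=True ignores spaces right after each comma.
reader = csv.reader(io.StringIO(text), skipinitialspace=True)
next(reader)  # skip the header row
rows = list(reader)

print(naive)
print(rows)
```

With split(','), the rate you wanted at index 1 ends up at index 2; with csv.reader it stays where you expect.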

In other words, most likely, if you'd used csv, besides saving one line of code, you wouldn't have had this problem in the first place—and the same would be true for 8 of the next 10 problems you're going to run into.

But if you're learning from an instructor who thinks csv is too complicated… well, it's a good thing you're motivated enough to try to figure things out for yourself and ask questions outside of class, so there's some hope…

abarnert
  • I was wondering the same thing. I thought that this was a relevant question and I worked at this code over and over again before I asked for help. – user2188956 Mar 21 '13 at 12:52
  • This project is for a programming 101 class, and I guess because the textbook doesn't go over the csv module, the instructor felt like it would be too much work if we had to go and learn the csv module on our own. Thank you again @abarnert for giving me such a detailed response, and I do see that the csv module is easier to use than what I was attempting to do. – user2188956 Mar 21 '13 at 12:56
  • Can someone please explain why this question keeps getting downvoted? – user2188956 Mar 31 '13 at 13:09
  • I'm still curious why my answer was downvoted, but it only happened once, not repeatedly, so… whatever. As for your question, my guess is that it's getting downvoted because you haven't provided an [SSCCE](http://sscce.org), and/or because you said "nothing I've tried has worked so far" but never explained what you've tried. (Also, it doesn't seem to have gotten any new down votes; it just went from +1/-2 to +0/-2.) If you need help editing the question to make it better, read the link I provided, and the FAQ for this site, and try the chat room for further help. – abarnert Apr 01 '13 at 17:52