
I am trying to read a CSV file in Python. The CSV file has 1400 rows. I opened it using the following code:

import csv
import sys
f = csv.reader(open("/Users/Brian/Desktop/timesheets_9_1to10_5small.csv", "rU"),
    dialect=csv.excel_tab)

Then I tried to loop through the file to pull the first name from each row using the following code:

for row in f:
    g = row
    s = g[0]
    end_of_first_name = s.find(",")
    first_name = s[0:end_of_first_name]

I got the following error message:

Traceback (most recent call last):
  File "", line 3, in <module>
    s=g[0]
IndexError: list index out of range

Does anyone know why I would get this error message and how I can correct it?

  • Did you try adding `print row` inside the loop to see what it thinks the rows are? One of them (maybe at the end) is empty. Incidentally, I don't understand your `end_of_first_name` logic (unless, as it's just occurred to me, there are multiple names there, and by "first" you don't mean "John" in "John Smith" but "John" in "John, Fred"). – DSM Oct 14 '12 at 19:38
  • I bet if you did `len(g)` it would return 0. It sounds like you have an empty row. – ajon Oct 14 '12 at 19:40
  • Good idea. I added `print row` and realized that the code is having trouble with a row deep in the CSV file. I will try to find out why that row is troublesome. Thank you! – user1744871 Oct 15 '12 at 02:06
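The diagnosis in these comments can be reproduced directly. Below is a minimal Python 3 sketch, assuming a small hypothetical tab-separated sample held in memory with io.StringIO: csv.reader yields an empty list for a blank line, so indexing that row with [0] raises the same IndexError.

```python
import csv
import io

# Hypothetical in-memory sample: two records separated by a blank line.
sample = io.StringIO("a\tb\n\nc\td\n", newline="")
rows = list(csv.reader(sample, dialect=csv.excel_tab))
print(rows)  # → [['a', 'b'], [], ['c', 'd']]

# The blank line became an empty list, so there is no element 0 to read.
try:
    first_field = rows[1][0]
except IndexError as exc:
    print("IndexError:", exc)  # → IndexError: list index out of range
```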

2 Answers


You should not open the file in universal newline mode (U). Open the file in binary mode instead:

f=csv.reader(open("/Users/Brian/Desktop/timesheets_9_1to10_5small.csv","rb"),
    dialect=csv.excel_tab)

The csv module does its own newline handling, including newlines inside quoted fields.
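The binary-mode advice applies to Python 2. In Python 3 the csv documentation instead asks for the file to be opened in text mode with newline='', which likewise leaves newline handling to the csv module. A minimal Python 3 sketch, assuming a hypothetical two-record tab-separated sample written to a temporary file (the helper name read_rows is also hypothetical):

```python
import csv
import os
import tempfile

# Hypothetical sample: the second record has a newline inside a quoted field.
SAMPLE = 'name\tnote\r\n"Smith, John"\t"line one\nline two"\r\n'

def read_rows(path):
    # newline='' passes raw line endings through to the csv module, so it
    # can tell a newline inside quotes apart from the end of a record.
    with open(path, newline='') as fh:
        return list(csv.reader(fh, dialect=csv.excel_tab))

with tempfile.NamedTemporaryFile('w', newline='', suffix='.csv', delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

try:
    rows = read_rows(path)
finally:
    os.remove(path)

print(rows)
```

The quoted newline stays inside its field, so the sample yields two rows rather than three.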

Next, print your rows with print repr(row) to verify that you are getting the output you expect. Using repr() instead of the regular string representation tells you much more about the objects you are handling, highlighting differences such as strings versus integers ('1' vs. 1).
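Printing a whole list already shows each element's repr(); the difference matters when you print individual fields. A small Python 3 sketch, using hypothetical field values and a hypothetical helper show():

```python
def show(value):
    """Return the str() and repr() forms of a value side by side."""
    return str(value), repr(value)

# Hypothetical field values that are hard to tell apart with plain str().
for value in ['1', 1, 'John\t', '']:
    print('str=%s  repr=%s' % show(value))
```

With str() the string '1' and the integer 1 look identical, and the trailing tab and the empty string are invisible; repr() makes all four distinguishable.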

Thirdly, if you want to select part of a string up to a delimiter such as a comma, use .split(delimiter, 1)[0] or .partition(delimiter)[0]:

>>> 'John,Jack,Jill'.partition(',')[0]
'John'
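One reason to prefer these over the question's find() slice: str.find returns -1 when the delimiter is missing, so s[0:s.find(',')] silently drops the last character of a name with no comma. A Python 3 sketch comparing the three approaches (the function names are hypothetical):

```python
def first_name_find(s):
    # The question's approach; find() returns -1 when there is no comma.
    end = s.find(',')
    return s[0:end]

def first_name_partition(s):
    # partition() returns (head, sep, tail); head is the whole string if sep is absent.
    return s.partition(',')[0]

def first_name_split(s):
    # maxsplit=1 stops splitting after the first comma.
    return s.split(',', 1)[0]

for s in ['John,Jack,Jill', 'John']:
    print(repr(s), first_name_find(s), first_name_partition(s), first_name_split(s))
```

For 'John' the find() version returns 'Joh', while partition and split both return 'John'.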
Martijn Pieters
  • Thanks for responding so quickly. I tried to open it in binary mode (rb) and got the following error message: "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?" I used the print repr(row) technique and realized that the code is having trouble on a row deep in the CSV file. I will try to find out why that row is troublesome. Thank you! – user1744871 Oct 15 '12 at 02:02
  • @user1744871: Right, there appears to be a bug in the csv module; see [Python and csv help](http://stackoverflow.com/q/2930673). Glad my `repr()` trick helped you. – Martijn Pieters Oct 15 '12 at 07:25

Here row (and therefore g) is an empty list. That doesn't necessarily mean the corresponding line in the file is empty, since the csv module may have trouble parsing it for other reasons.

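line_counter = 0
for row in f:  # f is the csv.reader from the question
    line_counter += 1
    g = row
    if len(g) == 0:
        # an empty list means csv found no fields on this line
        print "line", line_counter, "may be empty or malformed"
        continue
    # ... the non-empty row can be processed safely from here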

Or, as Martijn points out, the more Pythonic way is to use enumerate:

for line_counter, row in enumerate(f, start=1):
    g = row
    if len(g) == 0:
        print "line", line_counter, "may be empty or malformed"
        continue
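Putting the answers together, here is a minimal Python 3 sketch that skips empty rows and takes the text before the first comma with partition. Instead of a hand-maintained counter it reports reader.line_num, which counts physical lines read rather than records, so it stays meaningful when a quoted field spans several lines (an enumerate counter counts records). The sample data and the first_names helper are hypothetical.

```python
import csv
import io

# Hypothetical tab-separated sample; the third physical line is blank.
SAMPLE = "John,Jack\t8\nJill,Jane\t7\n\nRick,Ross\t6\n"

def first_names(source):
    """Return the text before the first comma of column 0, skipping empty rows."""
    reader = csv.reader(source, dialect=csv.excel_tab)
    names = []
    for row in reader:
        if not row:
            # reader.line_num is the number of physical lines consumed so far.
            print("line", reader.line_num, "may be empty or malformed")
            continue
        names.append(row[0].partition(",")[0])
    return names

print(first_names(io.StringIO(SAMPLE, newline="")))
```

With this sample the blank third line is reported and the remaining rows yield John, Jill and Rick.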
Scooter