Python - Parsing Columns and Rows

Question

I am having trouble parsing the contents of a text file into a 2D array/list. I cannot use built-in libraries, so I have taken a different approach. This is what my text file looks like, followed by my code:

1,0,4,3,6,7,4,8,3,2,1,0
2,3,6,3,2,1,7,4,3,1,1,0
5,2,1,3,4,6,4,8,9,5,2,1

def twoDArray():
    network = [[]]       

    filename = open('twoDArray.txt', 'r')
    for line in filename.readlines():
        col = line.split(line, ',')
        row = line.split(',')

    network.append(col,row)        

    print "Network = "
    print network        

if __name__ == "__main__":
    twoDArray()

I ran this code but got this error:

Traceback (most recent call last):
  File "2dArray.py", line 22, in <module>
    twoDArray()
  File "2dArray.py", line 8, in twoDArray
    col = line.split(line, ',')
TypeError: an integer is required

I am using the comma to separate both rows and columns because I am not sure how to differentiate between the two. I am also confused about why it says an integer is required when the file is made up of integers.

score 4 · Answer 1 · answered Feb 18 '11 at 16:31

Well, I can explain the error. You're using str.split() and its usage pattern is:

str.split(separator, maxsplit)

You're calling str.split(string, separator), which isn't a valid call to split: the ',' ends up in the maxsplit position, which must be an integer. Here is a direct link to the Python docs for this:

http://docs.python.org/library/stdtypes.html#str.split
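
As a minimal sketch of the difference (the sample line is made up for illustration), the calls below show the valid forms and confirm that putting the separator in the maxsplit position is what raises the TypeError:

```python
line = "1,0,4,3\n"

# Valid: call split on the string itself and pass only the separator.
fields = line.split(',')

# Also valid: an integer second argument caps the number of splits.
first_two = line.split(',', 1)

# Invalid: here ',' lands in the maxsplit position, which must be an integer.
try:
    line.split(line, ',')
    raised = False
except TypeError:
    raised = True
```

Note that the last field still carries the newline, which the other answers deal with.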

Wesley · Accepted Answer · 2011-02-18T21:24:48.817

To directly answer your question, there is a problem with the following line:

col = line.split(line, ',')

If you check the documentation for str.split, you'll find the description to be as follows:

str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).

This is not what you want. In your call, ',' is passed as maxsplit, which must be an integer, hence the TypeError. You are not trying to limit the number of splits, so pass only the separator.

Consider replacing your for loop and network.append with this:

for line in filename.readlines():
    # line is a string representing the values for this row
    row = line.split(',')
    # row is the list of number strings for this row, such as ['1', '0', '4', ...]
    cols = [int(x) for x in row]
    # cols is the list of numbers for this row, such as [1, 0, 4, ...]
    network.append(cols)
    # Put this row into network, such that network is [[1, 0, 4, ...], [...], ...]
    # (this assumes network starts out as [], not [[]])

-1 `row` will actually be `['1', '0', '4', ..., '0\n']` ... not very useful. — John Machin, Feb 18 '11 at 18:52
You're right - I'd totally forgotten to do the integer conversion. Thanks for pointing that out. — Wesley, Feb 18 '11 at 21:23

John Machin · Answer 3 · 2011-02-19T02:00:36.237

2

"""I cannot use built-in libraries""" -- do you really mean "cannot" as in you have tried to use the csv module and failed? If so, say so. Do you mean that "may not" as in you are forbidden to use a built-in module by the terms of your homework assignment? If so, say so.

Here is an answer that works. It doesn't leave a newline attached to the end of the last item in each row. It converts the numbers to int so that you can use them for whatever purpose you have. It fixes other errors that nobody else has mentioned.

def twoDArray():
    network = []       
    # filename = open('twoDArray.txt', 'r')
    # "filename" is a very weird name for a file HANDLE
    f = open('twoDArray.txt', 'r')
    # for line in filename.readlines():
    # readlines reads the whole file into memory at once.
    # That is quite unnecessary.
    for line in f: # just iterate over the file handle
        line = line.rstrip('\n') # remove the newline, if any
        # col = line.split(line, ',')
        # wrong args, as others have said.
        # In any case, only 1 split call is necessary 
        row = line.split(',')
        # now convert string to integer
        irow = [int(item) for item in row]
        # network.append(col,row)        
        # list.append expects only ONE arg
        # indentation was wrong; you need to do this once per line
        network.append(irow)

    print "Network = "
    print network        

if __name__ == "__main__":
    twoDArray()
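
A sketch of the same logic, reshaped so it can be tried without a file on disk. `parse_rows` is a hypothetical helper name, it accepts any iterable of lines, and skipping blank lines is my assumption rather than something the question asks for. It should run on both Python 2 and 3.

```python
def parse_rows(lines, sep=','):
    """Turn an iterable of delimited text lines into a list of int lists."""
    network = []
    for line in lines:
        line = line.rstrip('\n')  # remove the newline, if any
        if not line.strip():
            continue  # assumption: blank lines are ignored, not treated as errors
        # one split per line, then convert each field to an integer
        network.append([int(item) for item in line.split(sep)])
    return network


# Sample data copied from the question, held in memory instead of a file.
sample = [
    "1,0,4,3,6,7,4,8,3,2,1,0\n",
    "2,3,6,3,2,1,7,4,3,1,1,0\n",
    "5,2,1,3,4,6,4,8,9,5,2,1\n",
]
network = parse_rows(sample)
```

With a real file, passing the open file object (for example inside a `with open('twoDArray.txt') as f:` block) in place of `sample` should give the same result, and the `with` block also closes the file.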

edited Feb 19 '11 at 02:00

answered Feb 18 '11 at 18:32

John Machin


Integer conversion eats whitespace, so you can get away without `line.rstrip` – Wesley Feb 18 '11 at 21:26
@Wesley: Your suggestion is turning a side-effect of int() into a kludge. See http://stackoverflow.com/questions/1001601/why-are-there-extra-blank-lines-in-my-python-program-output/1001658#1001658 – John Machin Feb 18 '11 at 21:38
@John: `int(" 1 ")` is not a kludge. The docs explicitly say that the number can be embedded in whitespace http://docs.python.org/library/functions.html#int It works on all Python implementations I've tried and it is unlikely to change. – jfs Feb 19 '11 at 17:32
@J.F. Sebastian: The kludge consists of relying on int() to remove the unwanted `\n` from the last field. Read the link in my earlier comment. – John Machin Feb 19 '11 at 23:20
@John Machin: `int()` uses the standard `isspace()` from `ctype.h` therefore `'\n'` will be ignored as well. – jfs Feb 20 '11 at 01:22
@J.F. Sebastian: I'm well aware that int() ignores all leading and trailing whitespace. If it didn't, I would have said that @Wesley's answer didn't work, and downvoted it. My point is that using that behaviour is a kludge. I say again, read the link in my earliest comment. – John Machin Feb 20 '11 at 01:27
@John Machin: I've read the link before posting the first comment. My point is it is perfectly ok to use `int()` on strings with leading/trailing whitespace i.e., the `.rstrip()` is redundant here. Your comment about database field is irrelevant for the parsing of data from question http://stackoverflow.com/q/5043815/ – jfs Feb 20 '11 at 03:41
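
For anyone following the thread above, the behaviour being argued about is easy to check. This small sketch only shows that both approaches yield the same integers; whether relying on int() for the cleanup counts as a kludge remains a matter of taste.

```python
# int() accepts surrounding whitespace, including a trailing newline.
with_newline = int('0\n')
padded = int(' 1 ')

# Stripping the newline first gives the same value, just more explicitly.
stripped_first = int('0\n'.rstrip('\n'))
```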

score 0 · Answer 4 · answered Feb 18 '11 at 16:31

Omg...

network = []
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
    network.append(line.split(','))

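You get roughly the following, except that the items are still strings and the last one on each line keeps its trailing newline, if any: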

[
[1,0,4,3,6,7,4,8,3,2,1,0],
[2,3,6,3,2,1,7,4,3,1,1,0],
[5,2,1,3,4,6,4,8,9,5,2,1]
]

Or do you need some other structure as output? Please add what you need as output.

answered Feb 18 '11 at 16:31

seriyPS


Maybe somebody doesn't like that you use `.readlines()`, the misleading usage of `filename` name, and the absence of `int` conversion. – jfs Feb 19 '11 at 18:08

score 0 · Answer 5 · answered Feb 18 '11 at 22:52

class TwoDArray(object):
    @classmethod
    def fromFile(cls, fname, *args, **kwargs):
        splitOn = kwargs.pop('splitOn', None)
        mode    = kwargs.pop('mode',    'r')
        with open(fname, mode) as inf:
            return cls([line.strip('\r\n').split(splitOn) for line in inf], *args, **kwargs)

    def __init__(self, data=[[]], *args, **kwargs):
        dataType = kwargs.pop('dataType', lambda x:x)
        super(TwoDArray,self).__init__()
        self.data = [[dataType(i) for i in line] for line in data]

    def __str__(self, fmt=str, endrow='\n', endcol='\t'):
        return endrow.join(
            endcol.join(fmt(i) for i in row) for row in self.data
        )

def main():
    network = TwoDArray.fromFile('twoDArray.txt', splitOn=',', dataType=int)

    print("Network =")
    print(network)

if __name__ == "__main__":
    main()

score 0 · Answer 6 · edited May 23 '17 at 12:15

The input format is simple, so the solution should be simple too:

network = [map(int, line.split(',')) for line in open(filename)]
print network

The csv module doesn't provide an advantage in this case:

import csv
print [map(int, row) for row in csv.reader(open(filename, 'rb'))]

If you need float instead of int:

print list(csv.reader(open(filename, 'rb'), quoting=csv.QUOTE_NONNUMERIC))

If you are working with numpy arrays:

import numpy
print numpy.loadtxt(filename, dtype='i', delimiter=',')

See Why NumPy instead of Python lists?

All examples produce arrays equal to:

[[1 0 4 3 6 7 4 8 3 2 1 0]
 [2 3 6 3 2 1 7 4 3 1 1 0]
 [5 2 1 3 4 6 4 8 9 5 2 1]]
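
One caveat if you are on Python 3, where map returns an iterator instead of a list: a nested list comprehension gives a list of lists on either version. A minimal sketch, using an in-memory list of lines (made up for illustration) in place of the open file:

```python
# Stand-in for the lines of the file; iterating an open file yields the same kind of strings.
lines = ["1,0,4\n", "2,3,6\n", "5,2,1\n"]

# int() tolerates the trailing newline on the last field of each line.
network = [[int(x) for x in line.split(',')] for line in lines]
```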

score 0 · Answer 7 · answered Nov 14 '11 at 22:13

Read the data from the file. Here's one way:

f = open('twoDArray.txt', 'r')
buffer = f.read()
f.close()

Parse the data into a table:

table = [map(int, row.split(',')) for row in buffer.strip().split("\n")]
>>> print table
[[1, 0, 4, 3, 6, 7, 4, 8, 3, 2, 1, 0], [2, 3, 6, 3, 2, 1, 7, 4, 3, 1, 1, 0], [5, 2, 1, 3, 4, 6, 4, 8, 9, 5, 2, 1]]

Perhaps you want the transpose instead:

transpose = zip(*table)
>>> print transpose
[(1, 2, 5), (0, 3, 2), (4, 6, 1), (3, 3, 3), (6, 2, 4), (7, 1, 6), (4, 7, 4), (8, 4, 8), (3, 3, 9), (2, 1, 5), (1, 1, 2), (0, 0, 1)]
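
On Python 3, zip returns an iterator rather than a list, so wrapping it in list() is needed to get a printable list of tuples. A sketch with a cut-down table (the first three columns of the sample data):

```python
table = [
    [1, 0, 4],
    [2, 3, 6],
    [5, 2, 1],
]

# zip(*table) pairs up the i-th item of every row, i.e. it yields the columns.
transpose = list(zip(*table))
```

Each column comes back as a tuple; use `[list(col) for col in zip(*table)]` if you need lists instead.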
