
I am learning Python and I have been stuck trying to figure out why this script won't work.

I have a CSV file with a header, and I pass its path to the script as a command-line argument in the terminal.

The following script works fine; it lets me iterate through each line of my CSV file:

```python
import sys

input = open(sys.argv[1], 'r')
for line in input:
    print(line)
```

But when I try to convert my column indices and headers into a dictionary, the loop stops working:

```python
import sys
import pandas as pd

input = open(sys.argv[1], 'r')

csvfile = pd.read_csv(input)
columnheader_dict = {csvfile.columns.get_loc(i): i for i in csvfile.columns}

for line in input:
    print(line)
```

`print(line)` doesn't print anything. Why can't I iterate through each row of my CSV file anymore?

juanpa.arrivillaga
cms72
  • Does this answer your question? [pandas.read\_csv from string or package data](https://stackoverflow.com/questions/20696479/pandas-read-csv-from-string-or-package-data) – tbhaxor Dec 17 '19 at 06:04
  • Hi @GurkiratSingh, thanks for the reply. I was wondering why this line doesn't work anymore in the second script: `for line in input: print(line)`. I would have thought it should still work, since I assigned the file handle to the variable `input`. – cms72 Dec 17 '19 at 06:12
  • `open()` returns a file object, which is an iterator (not a generator); pandas uses it up. – Drey Dec 17 '19 at 06:20
  • Because the reader is an iterator. – juanpa.arrivillaga Dec 17 '19 at 06:21
  • Thanks Drey and @juanpa.arrivillaga for your tips! I guess that's one more thing to be aware of when learning to code in Python. Thank you. – cms72 Dec 17 '19 at 06:39

2 Answers


When you call `open` in Python, what you create (the variable you've named `input`) is a file object, often called a file handle. This file object keeps a cursor that records its current position in the file; the cursor starts at 0, the beginning of the file.
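A minimal sketch of that cursor, using `f.tell()` to report the position. It assumes a small hypothetical CSV written to a temporary file; the file contents and the helper name `cursor_positions` are made up for illustration. In text mode the value returned by `tell()` is an opaque position marker, so only the fact that it starts at 0 and grows is meaningful here.

```python
import os
import tempfile


def cursor_positions(path):
    """Return the cursor position at the start, after one line, and at EOF."""
    with open(path, "r") as f:
        start = f.tell()          # 0: a freshly opened file starts at the beginning
        f.readline()              # read the header line
        after_header = f.tell()   # the cursor has moved past the header
        f.read()                  # read everything that is left
        end = f.tell()            # the cursor now sits at the end of the file
    return start, after_header, end


# Hypothetical sample CSV written to a temporary file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as tmp:
    tmp.write("name,age\nalice,30\nbob,25\n")

positions = cursor_positions(path)
print(positions)   # first value is 0; the others grow as the cursor advances
os.remove(path)
```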

When you loop with `for line in input`, Python moves this cursor forward by one line each iteration (`input.readlines()` goes further and reads every remaining line in a single call). Either way, the cursor eventually reaches the end of the file and stays there. `pd.read_csv(input)` consumes the file the same way, so by the time your second loop runs, the cursor is already at the end of the file and there is nothing left to read.
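The effect can be reproduced without pandas. In this sketch `f.read()` stands in for `pd.read_csv(f)`, on the assumption (shared by the answer and the comments) that pandas leaves the handle at the end of the file; `io.StringIO` is an in-memory object that behaves like an open text file, and `consume_then_loop` is a hypothetical helper name.

```python
import io


def consume_then_loop(f):
    """Consume a file object once, then try to loop over it again."""
    first_pass = f.read()                # stand-in for pd.read_csv(f): reads to EOF
    second_pass = [line for line in f]   # the cursor is already at EOF
    return first_pass, second_pass


# io.StringIO behaves like an open text file, so no real file is needed.
handle = io.StringIO("name,age\nalice,30\nbob,25\n")
first, second = consume_then_loop(handle)
print(repr(first))   # the whole file contents
print(second)        # []
```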

If you want the second loop to work, call `input.seek(0)` before it; this moves the cursor back to the beginning of the file.
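A minimal sketch of the rewind fix. To stay self-contained it swaps pandas for the standard `csv` module: `dict(enumerate(header))` plays the role of the question's `columnheader_dict`, and the helper name `header_dict_and_lines` is hypothetical. The key step is `f.seek(0)` between the two passes.

```python
import csv
import io


def header_dict_and_lines(f):
    """Build {column index: header}, rewind the file, then collect every line."""
    header = next(csv.reader(f))             # first pass: reads the header row
    columnheader_dict = dict(enumerate(header))
    f.seek(0)                                 # move the cursor back to the start
    lines = [line for line in f]              # second pass starts from the top again
    return columnheader_dict, lines


handle = io.StringIO("name,age\nalice,30\nbob,25\n")
columnheader_dict, lines = header_dict_and_lines(handle)
print(columnheader_dict)   # {0: 'name', 1: 'age'}
print(len(lines))          # 3
```

With a real file, the same `seek(0)` call works on the object returned by `open`.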

Max L
  • I see. Thank you for taking the time to explain this. It's been doing my head in trying to understand why nothing is printing. – cms72 Dec 17 '19 at 06:32
  • @cms72 You shouldn't use `for line in input.readlines()`, that is mainly kept around for backwards compatibility. Always iterate directly over the file object `for line in input:` – juanpa.arrivillaga Dec 19 '19 at 19:53
  • Thanks @juanpa.arrivillaga - what's the difference between having it with or without the `.readlines()` method? What do you mean by backwards compatibility? – cms72 Dec 20 '19 at 04:09
  • @cms72 if you don't use `readlines()`, then it only holds one line in memory at a time. With `readlines()` it stores the whole list in memory and then iterates over it. – juanpa.arrivillaga Dec 20 '19 at 05:13
  • I see. Thanks @juanpa.arrivillaga for the tips! – cms72 Dec 21 '19 at 05:06
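The difference described in these comments can be seen directly. In this sketch `io.StringIO` stands in for a real file: a file object is its own iterator, so each `next()` call reads just one more line, while `readlines()` builds a list of all remaining lines at once.

```python
import io

handle = io.StringIO("name,age\nalice,30\nbob,25\n")

# A file object is its own iterator: iter(handle) returns the same object.
is_own_iterator = iter(handle) is handle

first = next(handle)       # reads only the next line: 'name,age\n'

# readlines() instead materializes every remaining line as a list in memory.
rest = handle.readlines()  # ['alice,30\n', 'bob,25\n']
print(is_own_iterator, repr(first), rest)
```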

To iterate over each line, you can use the `readlines()` method, which returns a list of the remaining lines:

```python
import sys

# FYI, input is a Python built-in function; shadowing it is not recommended
input = open(sys.argv[1], 'r')
for line in input.readlines():
    print(line)
```
at14
  • You can loop over `input` just fine, but it stops working when you have already looped over all the lines in the file or otherwise consumed them. – tripleee Dec 17 '19 at 06:25
  • As suggested by @Max L, you can set it back to the first line with `input.seek(0)`. – at14 Dec 17 '19 at 06:26
  • Thanks @at14 and @tripleee for answering. I thought I could just use the same variable `input`, since I assigned the output of pd.read_csv to a different variable, `input_csvfile`. I guess it's something I have to be aware of when coding in Python. Thanks again! – cms72 Dec 17 '19 at 06:34
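Storing the result of `pd.read_csv` in a different variable does not protect the file handle: the function still reads from `input` and moves its cursor. A minimal sketch of that point, where the hypothetical `count_rows` stands in for `pd.read_csv` (it returns a new object but reads the file to the end) and `io.StringIO` stands in for the opened file:

```python
import io


def count_rows(f):
    """Hypothetical stand-in for pd.read_csv: returns a new value but reads f to EOF."""
    return sum(1 for _ in f)


handle = io.StringIO("name,age\nalice,30\nbob,25\n")
n_rows = count_rows(handle)    # the result goes into a different variable...
leftover = handle.readlines()  # ...but handle's cursor was still moved to EOF
print(n_rows, leftover)        # 3 []
```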