After writing a function to generate some data, I wanted to add the ability to save it. I started with the following code, which I ran with save=True:

[in]
import csv
... (main body of code - all this works fine)
if save is True:
    print("Saving...")
    with open('dataset.csv', 'a+') as f:
        lines = f.readlines()
        for line in lines:
            linesplit = line.split(",")
            name_in_dataset = linesplit[0] 
            ...

            (... some code for the actual saving process - irrelevant)
            
            print("Data added successfully")

[out]
Saving...

I know that the dataset file contains this name, so the data should have been saved here, and I was confused about where it went wrong. I broke the code down until I reached this:

[in]
if save is True:
    print("Saving...")
    with open('dataset.csv') as f:
        lines = f.readlines()
        print(lines)

[out]
Saving...
[]

Not really sure why it can't read the lines? I thought I had used the same code previously to read the lines of this very file, so I'm really confused about why it's not working now.

I've tried adding things to the code such as f.seek(0), but this made no difference. I've also tried changing the mode in the open call to 'a' and 'r', but alas it still can't read the lines. I've searched through so many posts about .readlines() and can't find anyone experiencing this :( I feel like I've just been at work for too long and have forgotten the fundamentals of Python coding!

Thanks in advance <3

EDIT: Using the suggestions in the comments I changed the code to:

with open('(file path)/dataset.csv', 'r') as f:
     f.seek(0)
     lines = csv.reader(f)
     print(lines)

and it returned:

Saving...
<csv.reader object at 0x7f01282c7f20>
  • Try to use an absolute path because relative paths may not point to the expected location. – Michael Butscher Jul 26 '23 at 10:00
  • Is there a reason you are using `readlines()` rather than `csv.reader()`? Your usage may improve significantly if not https://stackoverflow.com/questions/22136173/python-readlines-and-append-data-to-each-line-output-to-one-line – Stitt Jul 26 '23 at 10:02
  • "a+" will seek to the end of the file and if you try to read from that position, you will get nothing. Doubting you actually did try with just plain old "r" aka read-only from the start of the file .. – rasjani Jul 26 '23 at 10:03
  • `f.seek(0)` followed by `f.readlines()` works for me if the file is not empty. – Daraan Jul 26 '23 at 10:10
  • @rasjani why do you not think I tried that, if I said I tried that? – amy-elouise Jul 26 '23 at 10:20
  • Because just randomly trying open file on read-write and seeking to the end of the file does not raise confidence. – rasjani Jul 26 '23 at 10:38
  • @rasjani as I mentioned, I used `f.seek(0)` in the initial problem-solving stages and had not limited my open function to only `'a+'` - This is just what I wanted in the end. – amy-elouise Jul 26 '23 at 15:03

2 Answers

I see a lot of people new to Python and CSVs trying to use filemode append, and they usually get themselves in some trouble because of it.
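
To make the append-mode pitfall concrete, here is a minimal sketch. The temporary file and its contents are made up for illustration; they are not the asker's dataset. It shows that a file opened with 'a+' starts positioned at the end, so an immediate readlines() returns an empty list until you seek back to the start:

```python
import os
import tempfile

# Create a small throwaway CSV (hypothetical contents, for illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "dataset.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    f.write("name,value\nalice,1\n")

with open(path, "a+", newline="", encoding="utf-8") as f:
    at_end = f.readlines()      # position starts at end of file: nothing to read
    f.seek(0)                   # rewind to the beginning
    after_seek = f.readlines()  # now the existing lines are returned

print(at_end)      # []
print(after_seek)  # ['name,value\n', 'alice,1\n']
```

If readlines() still returns an empty list with plain 'r' mode, the file being opened is probably empty or is not the file you think it is, which is why an absolute path is worth trying.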

In general, I recommend reading the source CSV, modifying the rows, then writing the modified rows to another file. Once you've verified the validity of the new file, you can decide what to do with the old file.

For reading/writing CSV, I recommend using the csv module's reader and writer.

Given the CSV:

Col1,Col2
r1c1,r1c2
r2c1,r2c2
r3c1,r3c2

Use the csv.reader(some_file) function to create a row iterator for that file:

import csv
with open('input.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    # The next() calls and the for-loop below must run here, while f is still open.

The local variable reader will yield completely decoded rows. A row can be returned one-at-a-time with next(reader):

next(reader)
# ['Col1', 'Col2']
next(reader)
# ['r1c1', 'r1c2']

A row returned by reader is just a list of strings.
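
Because every field comes back as a str, numeric columns need an explicit conversion. A small sketch of that, assuming made-up numeric data and using io.StringIO as a stand-in for an open file (neither is part of the example CSV above):

```python
import csv
import io

# Hypothetical data; io.StringIO stands in for an open file.
f = io.StringIO("name,score\nalice,10\nbob,32\n")
reader = csv.reader(f)
header = next(reader)

rows = list(reader)
total = sum(int(row[1]) for row in rows)  # convert each score explicitly

print(rows[0])  # ['alice', '10']
print(total)    # 42
```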

The iterator can also be used in a for-loop, as the documentation shows us:

for row in reader:
    print(row)

# ['r2c1', 'r2c2']
# ['r3c1', 'r3c2']

Note that the reader continued reading from where it left off with the next() calls. The reader is now exhausted: there are no more rows to decode. Trying to read from it raises a StopIteration exception:

next(reader)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# StopIteration
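
If you would rather not deal with that exception (for example when the file might be empty), the built-in next() accepts a default value as its second argument. A minimal sketch, again using io.StringIO as a stand-in for a real file:

```python
import csv
import io

# io.StringIO stands in for an open file with a header and one data row.
f = io.StringIO("Col1,Col2\nr1c1,r1c2\n")
reader = csv.reader(f)

header = next(reader, None)     # ['Col1', 'Col2']
first = next(reader, None)      # ['r1c1', 'r1c2']
exhausted = next(reader, None)  # None, instead of raising StopIteration
```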

To get all the rows and be able to loop over them any number of times, call list(reader) right after creating the reader to convert the one-shot iterator into a list of rows:

with open('input.csv',newline='',encoding='utf-8') as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

That saves the first row to its own variable, header. The rest of the rows are added to a list named rows. If a row is a list of strings, then the variable rows is a list of lists of strings.

If you want to omit the header, call next(reader) by itself (with no left-hand assignment). The reader will dutifully return the header, but it'll just go into the void.

Now you can do something with those rows:

for row in rows:
    name = row[0]
    # do something with name...
    name = name.lower()
    # before saving it back to the list
    row[0] = name

Finally, write the modified rows back to a CSV. I always create a new file, for two reasons:

  1. I don't destroy the original data. Getting the original back can be a real pain: it might take any number of steps, mean asking someone nicely to please send it again, or not be possible at all.
  2. I can compare my handiwork to the original to make sure I did the right things.

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

Once you're happy with output.csv, you can decide what to do with input.csv: leave it, trash it, or overwrite it with output.csv (os.replace('output.csv', 'input.csv'), which, unlike os.rename, also overwrites an existing file on Windows).
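
Putting the whole read, modify, write, and replace cycle together might look like the sketch below. The temporary directory, the sample rows, and the read_rows helper are assumptions made up for illustration, and the row-count comparison is only a stand-in for whatever verification suits your data:

```python
import csv
import os
import tempfile

def read_rows(path):
    """Hypothetical helper: return every row of a CSV file as a list of lists."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))

# Set up a throwaway input file (made-up data).
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "input.csv")
dst = os.path.join(work_dir, "output.csv")
with open(src, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["Col1", "Col2"], ["R1C1", "r1c2"]])

# Read, then modify the data rows in memory.
original = read_rows(src)
header, rows = original[0], original[1:]
for row in rows:
    row[0] = row[0].lower()

# Write the result to a new file, leaving the input untouched for now.
with open(dst, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# A cheap sanity check before overwriting: same number of rows as the original.
if len(read_rows(dst)) == len(original):
    os.replace(dst, src)  # unlike os.rename, this also overwrites on Windows

final_rows = read_rows(src)
```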

Good luck. :)

Zach Young

Just in case anyone happens upon the same problem, here is what ended up working for me:

if save is True:
    # Read the header and all data rows into memory.
    with open('dataset.csv', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        lines = list(reader)

    # Extend the row whose first column matches name; copy the rest unchanged.
    # additional_data must be a list so it can be concatenated to the row.
    newlines = []
    for line in lines:
        if line[0] == name:
            line = line + additional_data
        newlines.append(line)

    # newline='' stops the csv module from writing blank lines between rows on Windows.
    with open('output.csv', 'w', newline='') as g:
        writer = csv.writer(g)
        writer.writerow(header)
        writer.writerows(newlines)
    print("Data added successfully.")

Not only did this simplify the saving process as opposed to using the .readlines() method, it worked exactly as I wanted. Thanks all for your help :)