
The problem: My program reads a CSV file and produces a nested list (because there are no line breaks) and I need to 'repair' the list afterwards before I can go further. The code works, but I am struggling to find a more efficient way and would be interested in any suggestions.

The details:

My program reads a CSV file which has the following format:

hakcke39475728,fjfjalcl689920,vjgjvkv848291, ...

So each item consists of letters and numbers, with a comma as the delimiter and no newlines in between. I use csv to read the file and put the result into a list:

import csv

result = []
with open("input.csv", "r", newline="") as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    result = list(reader)

As there are no line breaks, the result is a nested list in the following format:

[['hakcke39475728', 'fjfjalcl689920', 'vjgjvkv848291', '...'], []]

After this, I need to "clean up" and perform an extra step - a for loop - to unnest the list:

output_final = [] 
for item in result[0]:
    output_final.append(item)

To finally get the output I need:

['hakcke39475728', 'fjfjalcl689920', 'vjgjvkv848291', '...']

What would be a more efficient way?

I couldn't figure out how to read the CSV in a different way so that it does not result in a nested list. AFAIK there is no way to set a comma as the EOL character (which would resolve my problem here, as I do not have line endings in between my values in the input).

Possibly related questions:

  • I found this question, but it's about writing the CSV differently, not reading it and not possible in my case.
  • This one is asking for a nested list instead of wanting to get rid of it, but there's no way to reverse-engineer the solution.
ladyfrauke
  • Couldn't you simply access your results with list(reader)[0]? – Jason Chia May 13 '20 at 07:00
  • Yes, thanks. That's what I am doing right now. I added the code snippet of my repair, which is exactly this. I am just wondering whether this actually is the most efficient way to do it, or if I am missing something here. – ladyfrauke May 13 '20 at 07:02
  • It seems like you have a file of single entries separated by commas, so `open("input.csv", "r", newline="").split(',')` might be enough. – snakecharmerb May 13 '20 at 07:02
  • You don't actually need a for loop. Simply output_final = list(reader)[0] will work. Also, snakecharmerb's alternative to the csv reader will probably be cleaner. – Jason Chia May 13 '20 at 07:04
  • JasonChia - yes, your suggestion works. Thanks. @snakecharmerb - I get an error when trying your suggestion: _AttributeError: '_io.TextIOWrapper' object has no attribute 'split'_ - will need to look into this as I couldn't figure out a fix right now. Any suggestions? – ladyfrauke May 13 '20 at 07:34
  • My bad, you need to read the file: `open("input.csv", "r", newline="").read().split(',')` – snakecharmerb May 13 '20 at 07:35
  • It still throws an exception, but the error text just says `__enter__`. However, see the answer below: using next(reader) resolved the issue! – ladyfrauke May 13 '20 at 07:50
  • One possibility (without CSV module) is to use `f.readline().strip().split(',')` if there is only one line and no formatting problems. – Aivar Paalberg May 13 '20 at 08:32
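
The plain-string approach suggested in the comments above can be sketched roughly like this. It is only a minimal sketch: it assumes the file holds a single line with no quoted fields, and the helper name `read_single_line` and the temporary sample file are made up for illustration.

```python
import os
import tempfile


def read_single_line(path):
    """Return the comma-separated items from the first line of a file.

    Hypothetical helper: assumes one line of data and no quoted fields.
    """
    with open(path, "r", encoding="utf8") as f:
        # strip() removes the trailing newline before splitting on commas
        return f.readline().strip().split(",")


# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf8"
) as tmp:
    tmp.write("hakcke39475728,fjfjalcl689920,vjgjvkv848291\n")
    sample_path = tmp.name

items = read_single_line(sample_path)
os.remove(sample_path)  # clean up the sample file
print(items)
```

Using `with` closes the file reliably. If the data could ever contain quoted fields or embedded commas, the csv module is the safer choice.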

1 Answer


After this, I need to "clean up" and perform an extra step - a for loop - to unnest the list:

There is no need to copy the items one by one into a new list.

output_final = result[0]

Alternatively, with more context, and assuming that your CSV file will only ever contain one line of data:

import csv

with open("input.csv", "r", encoding="utf8", newline="") as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    result = next(reader)

A csv reader is an iterator - it iterates over the rows in the CSV file. Normally you would use iterators in for loops:

    for row in reader:
        ...

The main difference from a list is that you can't index into an iterator: reader[0] won't work. Instead, the next() function retrieves the next element from the iterator every time you call it.

In this case, next(reader) is only called once and therefore you get the first "row" of your data.
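
To make the difference between indexing and next() concrete, here is a small self-contained sketch. It uses `io.StringIO` as a stand-in for the opened file, and the sample data and variable names are illustrative only.

```python
import csv
import io

# io.StringIO stands in for the opened file; the data is illustrative.
f = io.StringIO("hakcke39475728,fjfjalcl689920,vjgjvkv848291\n", newline="")
reader = csv.reader(f, delimiter=",", quotechar='"')

index_error = None
try:
    reader[0]  # a csv reader is an iterator and cannot be indexed
except TypeError as exc:
    index_error = exc

first_row = next(reader)  # retrieves the first (and here only) row

# Once the reader is exhausted, next(reader) would raise StopIteration;
# passing a default value returns that default instead.
second_row = next(reader, None)

print(first_row)
print(second_row)
```

Passing a default to next() is optional, but it can be handy if the file might turn out to be empty.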

A note about newline="". That does not mean that there are no newlines. It means that the csv module will handle newlines for you and automatically adapt to Windows-, Mac-, or *nix-style newlines. When dealing with the csv module, you should always open files with newline="" because of that.

Tomalak
  • Thanks! Using next(reader) did the trick, it creates exactly the list I need. – ladyfrauke May 13 '20 at 07:48
  • @ladyfrauke Keep `next()` in mind for other cases when you deal with CSV data. For example you can use it to skip the header row in a CSV reader - `next(reader)` advances the reader by one, and then you can call `for row in reader:` to go over the remainder of the data. – Tomalak May 13 '20 at 08:00
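
The header-skipping idea from the last comment might look like this in practice. This is a sketch with made-up sample data (the column names `id` and `code` are not from the question), again using `io.StringIO` in place of a real file.

```python
import csv
import io

# Illustrative data with a header row; the column names are made up.
f = io.StringIO("id,code\n1,hakcke39475728\n2,fjfjalcl689920\n", newline="")
reader = csv.reader(f)

header = next(reader)  # consume the header row, advancing the reader by one

rows = []
for row in reader:  # iteration continues with the first data row
    rows.append(row)

print(header)
print(rows)
```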