
Hello there community!

I've been struggling with this function for a while and I cannot seem to make it work.

I need a function that reads a CSV file and takes as arguments the file and the first and last lines of the range to read. I've looked at many threads, but none of the solutions seem to work in my case.

My current function is based on the answer to this post: How to read specific lines of a large csv file

It looks like this:

def lines(file, first, last):
    lines_set = range(first, last, 1)
    result = []
    with open("./2020.csv", "r") as csvfile:
        for line_number, row in enumerate(csvfile):
            if line_number in lines_set:
                 result.append(extract_data(csvfile.readline())) #this extract_data is a previously created function that fetches specific columns from the csv
    return result

What's happening is that it skips a row every time: instead of reading, for example, lines 1 to 4, it reads these four lines: 1, 3, 5 and 7.

Additional question: The CSV file has a header row. How can I use the next method so that the header is never included?

Thank you very much for all your help!

margarita

1 Answer


I recommend using the csv module's reader: it will save you from getting incomplete rows, since a single row can span several lines (for example, when a quoted field contains a newline).

That said, your basic problem is manually iterating your reader inside the for-loop, which is already automatically iterating your reader:

import csv

def lines(file, first, last):
    lines_set = range(first, last, 1)
    result = []
    with open(file, "r", newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader)  # only if CSV has a header AND it should not be counted

        for line_number, row in enumerate(reader):
            if line_number in lines_set:
                # row is already a list of fields; no extra readline() needed
                result.append(extract_data(row))

    return result
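
To see why the original loop skipped rows, here is a minimal sketch that uses an in-memory `io.StringIO` as a stand-in for the real file (the six sample lines are made up). The `for` loop consumes one line per iteration, and the extra `readline()` call consumes another, so every second line never reaches the loop variable.

```python
import io

# Hypothetical six-line stand-in for the CSV file.
fake_file = io.StringIO("row0\nrow1\nrow2\nrow3\nrow4\nrow5\n")

pairs = []
for line_number, row in enumerate(fake_file):
    # `row` is the line the for-loop just consumed; readline() consumes
    # the *next* line, so the loop itself never sees that one.
    extra = fake_file.readline()
    pairs.append((line_number, row.strip(), extra.strip()))

print(pairs)
```

Only three iterations run over six lines, which matches the "every other row" symptom described in the question.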

Also, keep in mind that enumerate() starts at 0 by default; pass start=1 (or whatever start value you need) to count from 1. And range() excludes its end value, so you probably want range(first, last + 1) if the last line should be included.
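
A small sketch of both adjustments together, using a made-up list in place of the CSV rows (the names `rows`, `wanted` and `selected` are only for illustration):

```python
# Made-up data rows, as if the header had already been skipped.
rows = ["r1", "r2", "r3", "r4", "r5", "r6"]
first, last = 2, 4

# range() excludes its end, so add 1 to include row number `last`.
wanted = range(first, last + 1)

# start=1 makes the counter match 1-based row numbers.
selected = [row for n, row in enumerate(rows, start=1) if n in wanted]

print(selected)
```

With `range(first, last)` and the default `start=0`, the same call would instead pick the third and fourth rows only.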

Or... do away with range() entirely since < and > are sufficient, and maybe clearer:

import csv

start = 3
end = 6

with open('input.csv') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header row
    for i, row in enumerate(reader, start=1):
        if i < start or i > end:
            continue

        print(row)
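
A further variation, offered only as a sketch: `itertools.islice` can take the slice directly and stops reading once the last wanted row has been returned, instead of looping over the rest of the file. The helper name `read_rows` and the sample data are hypothetical, and row numbers are assumed to be 1-based and inclusive, counted after the header.

```python
import csv
import io
from itertools import islice

def read_rows(csv_file, first, last):
    """Return data rows first..last (1-based, inclusive), skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row; None avoids an error on an empty file
    # islice uses a 0-based start and an exclusive stop, hence first - 1 and last.
    return list(islice(reader, first - 1, last))

# Hypothetical in-memory file standing in for the real CSV.
sample = io.StringIO("h1,h2\na,1\nb,2\nc,3\nd,4\n")
picked = read_rows(sample, 2, 3)
print(picked)
```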
Zach Young
  • Hello there! First of all, thank you for your help! Using your first solution might work, but now I'm getting an error with the extract_data function that was previously created. This is the function: `def extract_data(column): x = column.split(sep=';') conj = x[5], x[7], x[12] #the columns I want to bring from the csv return list(conj)` The error I'm getting is on the `x = column.split(sep=';')` line, and it's the following: `'list' object has no attribute 'split'` Do you know how I can fix this? Thank you once again! – margarita Jan 05 '22 at 21:51
  • Hi, and you're welcome! I don't know where `column` came from. I suggest you run my code unmodified and insert some `print()` statements to see what's going on before you start mixing your code back in. At the very least, look at the error and make sure you understand why you're getting `'list' object has no attribute 'split'`. – Zach Young Jan 05 '22 at 22:05
  • At the top of the for-loop, `print(row)` to see exactly what you're passing to `extract_data()`... now that you're using a CSV reader, `row` is a different type/data structure. – Zach Young Jan 05 '22 at 22:08
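
The error discussed in the comments arises because `csv.reader` already splits each line into a list of strings, so there is nothing left to `.split()`. One possible adaptation, assuming the file is semicolon-delimited (as the `split(sep=';')` call suggests) and has at least 13 columns, is to pass `delimiter=';'` to the reader and index the row directly. The sample data below is invented for illustration.

```python
import csv
import io

def extract_data(row):
    # `row` is already a list of column values, so index it directly
    # instead of calling .split() on it.
    return [row[5], row[7], row[12]]

# Invented 13-column, semicolon-delimited sample: one header, one data row.
header = ";".join(f"h{i}" for i in range(13))
line = ";".join(f"v{i}" for i in range(13))
fake_file = io.StringIO(header + "\n" + line + "\n")

reader = csv.reader(fake_file, delimiter=";")
next(reader)  # skip the header row
result = [extract_data(row) for row in reader]

print(result)
```

Without `delimiter=";"`, the reader would split on commas and each row would come back as a single-element list, so the indexing would fail.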