1

I made this Python code for testing purposes. The idea behind it is to read the rows of a CSV file, assign the data to variables, and print those variables using threads. However, I would like each thread to read its own row of the CSV file, but I couldn't find how to do it...

The CSV file looks like this:

    name        lastname
1   John        B.
2   Alex        G.
3   Myriam      R.
4   Paul        V.
5   Julia       L.
6   Margot      M.

Here's the code:

import csv
import threading

class main():

    def print_names():

        with open('names.csv', 'r') as csv_file:
            csv_reader = csv.DictReader(csv_file)
            for row in csv_reader:
                data = name, lastname = row['name'], row['lastname']

        screenlock.acquire()
        print(name, lastname)
        screenlock.release()


if __name__ == '__main__':

    screenlock = threading.BoundedSemaphore(1)

    with open('names.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        rows_count = len(list(csv_reader))

    threads = []
    for _ in range(rows_count):
        t = threading.Thread(target=main.print_names)
        threads.append(t)
        t.start()
    for thread in threads:
        thread.join()

The current output is:

Margot M.
Margot M.
Margot M.
Margot M.
Margot M.
Margot M.

But I would like to get this output:

John B.
Alex G.
Myriam R.
Paul V.
Julia L.
Margot M.

PS: Sorry if my question isn't really understandable, it's my first post here.

  • Have you read this? https://docs.python.org/3/library/threading.html#threading.Thread Also look at this question: https://stackoverflow.com/questions/6904487/how-to-pass-a-variable-by-name-to-a-thread-in-python and pass your line directly to your thread. That would avoid opening the file more than once. – Maël Pedretti Apr 26 '21 at 13:47
  • Any particular reason why you want this threaded? It will only slow down your processing compared to a single-threaded execution... – zwer Apr 26 '21 at 14:02
  • @zwer I would like to work with all the rows' data at the same time, not one by one, so using threads seems to be the best option for me. – quennsbernard Apr 26 '21 at 15:26
  • @quennsbernard - Python threading, with the exception of some I/O operations, works in cooperative multitasking mode due to [the dreaded GIL](https://wiki.python.org/moin/GlobalInterpreterLock). Unless there is a lot of waiting for I/O ops in your code, multithreading will only make it slower. If your intention is to process multiple things in parallel (provided you have enough CPU cores), you should look at [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) instead. – zwer Apr 26 '21 at 15:55
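
For reference, a minimal sketch of the multiprocessing approach zwer mentions, assuming the same names.csv file (the pool size and the print_names body here are only placeholders):

import csv
from multiprocessing import Pool

def print_names(row):
    # each worker process receives one already-parsed row (a dict)
    print(row['name'], row['lastname'])

if __name__ == '__main__':
    # read the file once in the parent process
    with open('names.csv', 'r') as csv_file:
        rows = list(csv.DictReader(csv_file))

    # hand the rows out to a pool of worker processes
    with Pool(processes=4) as pool:
        pool.map(print_names, rows)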

2 Answers

1

The problem seems to be that you are reading the whole file in every thread. Then, at the end of

for row in csv_reader:
    data = name, lastname = row['name'], row['lastname']

only the last (name, lastname) pair is stored. I'm not familiar with the csv module, but I assume it returns a generator. If that's the case, you can either share the generator so that each thread reads from it once, or you can give each thread its own index so it can skip all the other rows.

Option 1: This will not always preserve the file order, but if you are using threads to work with each line I suppose the order is not important... Threads may not be the best solution if order matters.

def print_names(csv_reader):
    # each thread pulls the next unread row from the shared reader
    row = next(csv_reader)
    data = name, lastname = row['name'], row['lastname']

    screenlock.acquire()
    print(name, lastname)
    screenlock.release()

if __name__ == '__main__':

    screenlock = threading.BoundedSemaphore(1)

    with open('names.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        rows_count = len(list(csv_reader))

    with open('names.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)

        threads = []
        for _ in range(rows_count):
            t = threading.Thread(target=print_names, args=(csv_reader,))
            threads.append(t)
            t.start()
        for thread in threads:
            thread.join()

Option 2: This does not ensure order consistency either... and you are reading the whole file once for each thread. This could be a huge bottleneck for large files.

def print_names(idx):
    with open('names.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        # re-read the whole file and keep only the row at position idx
        row = [r for i, r in enumerate(csv_reader) if i == idx][0]
        data = name, lastname = row['name'], row['lastname']

    screenlock.acquire()
    print(name, lastname)
    screenlock.release()

if __name__ == '__main__':

    screenlock = threading.BoundedSemaphore(1)

    with open('names.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        rows_count = len(list(csv_reader))

    threads = []
    for idx in range(rows_count):
        t = threading.Thread(target=print_names, args=(idx,))
        threads.append(t)
        t.start()
    for thread in threads:
        thread.join()
vLabayen
  • thank you so much, as the order isn't really important, Option 1 was exactly what I was looking for! – quennsbernard Apr 26 '21 at 15:15
  • also, I would like the function to count up from 0 to print something like [ Task 1 ] - John B, [ Task 2 ] - Alex G, [ Task 3 ] - Myriam R, etc... how could I implement this? – quennsbernard Apr 26 '21 at 15:19
  • you might want to put a `threading.Lock` around the `next(csv_reader)` call. I don't think it's guaranteed to be atomic. – Aaron Apr 26 '21 at 17:08
  • Better yet, you could iterate over the reader object, and pass `row` to each thread in `args=(row,)` – Aaron Apr 26 '21 at 17:24
  • You can also add the idx from the second option to the thread, in order to have a numeric index for printing: `args=(csv_reader, idx)` and `def print_names(csv_reader, idx):`. Anyway, the best way would be what other people like @Aaron are suggesting: read the lines in the main thread and pass them (with the index if you want) to the threads. That way you also avoid reading the whole file just to count lines. Something like `with open(...): csv_reader = ...; for idx, line in enumerate(csv_reader): t = threading.Thread(target=print_names, args=(line, idx))` – vLabayen Apr 27 '21 at 09:32
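
A runnable sketch of that last suggestion, reading the rows in the main thread and handing each row, with its index, to its own thread (the [ Task n ] label follows the format asked for in the comments above):

import csv
import threading

def print_names(row, idx):
    # the row is already parsed, so no file access or shared reader is needed here
    print(f'[ Task {idx} ] -', row['name'], row['lastname'])

if __name__ == '__main__':
    threads = []
    with open('names.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        # the main thread reads the rows and starts one thread per row
        for idx, row in enumerate(csv_reader, start=1):
            t = threading.Thread(target=print_names, args=(row, idx))
            threads.append(t)
            t.start()
    for thread in threads:
        thread.join()
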
0

Here's exactly what's going on with your code:

  1. Read the entire file line by line to count the number of lines, and close it.

  2. Create and start as many threads as there are lines in the file, each doing the following:

    a. Open the same file (each thread has its own separate copy of the file to read)

    b. Read the entire file line-by-line.

    c. Assign data = name, lastname = row['name'], row['lastname'] for each row

    d. Close the file

    e. Print the value of name and lastname from the last loop iteration (last row of the file)

  3. Wait for all the threads to complete

Each thread runs its for loop all the way to the final row, so by the time it prints, name and lastname naturally hold the same values (the last row of the file) in every thread.

Reading files is generally best left to a single thread, as normal files are not meant for random access. If you need to do a significant amount of processing for each line, you would benefit most from reading the file in the main thread, and passing each line to your threads to be processed.
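
A minimal sketch of that pattern, assuming the same names.csv file (the process_row name and the pool size are illustrative, not part of the original code):

import csv
from concurrent.futures import ThreadPoolExecutor

def process_row(row):
    # placeholder for whatever per-row work each thread should do
    print(row['name'], row['lastname'])

if __name__ == '__main__':
    # the main thread does all of the file reading, once
    with open('names.csv', 'r') as csv_file:
        rows = list(csv.DictReader(csv_file))

    # the pool's worker threads only process already-read rows
    with ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(process_row, rows)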

Aaron