Unique elements in columns in csv file using python

Question

I have a semicolon-separated CSV file which has the following form:

indx1; string1; char1; entry1 
indx2; string1; char2; entry2 
indx3; string2; char2; entry3 
indx4; string1; char1; entry4 
indx5; string3; char2; entry5

I want to get the unique entries of the 2nd and 3rd columns (indices 1 and 2) of this file as lists, without using pandas or numpy. In particular, these are the lists that I want:

[string1, string2, string3] 
[char1, char2]

The order doesn't matter, and I would like the operation to be fast.

Presently, I am reading the file (say 'data.csv') using the command

import csv

with open('data.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';')

I am using Python 2.7. What is the fastest way to do this? I would appreciate any help.

Do you want unique combinations of `(col1, col2)` or all unique `col1` and all unique `col2` values? — nbwoodward, Oct 29 '18 at 14:54
Possible duplicate of [How to create a list in Python with the unique values of a CSV file?](https://stackoverflow.com/questions/24441606/how-to-create-a-list-in-python-with-the-unique-values-of-a-csv-file) — jtweeder, Oct 29 '18 at 15:14

Eugene Yarmash · Accepted Answer · score 3 · 2018-10-29T15:53:45.907

You could use sets to keep track of the already seen values in the needed columns. Since you say that the order doesn't matter, you could just convert the sets to lists after processing all rows:

import csv

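col1, col2 = set(), set()  # unique values seen so far in each column

with open('data.csv') as csv_file:
    # skipinitialspace=True drops the space that follows each ';'
    csv_reader = csv.reader(csv_file, delimiter=';', skipinitialspace=True)
    for row in csv_reader:
        col1.add(row[1])
        col2.add(row[2])

# Sets are unordered, so the order of the printed lists may vary.
print list(col1), list(col2)  # e.g. ['string1', 'string3', 'string2'] ['char2', 'char1']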
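
For readers on Python 3, the same set-based idea can be sketched as a small function. The name `unique_columns` and the in-memory sample standing in for `data.csv` are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io


def unique_columns(lines, indices=(1, 2), delimiter=';'):
    """Collect the unique values found at the given column indices."""
    uniques = [set() for _ in indices]
    reader = csv.reader(lines, delimiter=delimiter, skipinitialspace=True)
    for row in reader:
        if not row:  # skip blank lines
            continue
        for seen, idx in zip(uniques, indices):
            # strip() also removes any trailing whitespace in a field
            seen.add(row[idx].strip())
    return [list(s) for s in uniques]


# In-memory stand-in for data.csv (same layout as in the question).
sample = io.StringIO(
    "indx1; string1; char1; entry1 \n"
    "indx2; string1; char2; entry2 \n"
    "indx3; string2; char2; entry3 \n"
    "indx4; string1; char1; entry4 \n"
    "indx5; string3; char2; entry5\n"
)

col1, col2 = unique_columns(sample)
# The lists come from sets, so sort them if a stable order is needed.
print(sorted(col1), sorted(col2))
```

With a real file, passing the object returned by `open('data.csv', newline='')` in place of `sample` should behave the same way.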

edited Oct 29 '18 at 15:53

answered Oct 29 '18 at 14:52

Thanks Eugene, this was really helpful! Your solution worked. =) – Ji Won Song Oct 29 '18 at 15:47
Can we actually print them in order @Eugene Yarmash ? – asha Jul 16 '20 at 09:47
@AlbionShala What do you mean by "print in order" ? – Eugene Yarmash Jul 16 '20 at 10:41
@EugeneYarmash I mean to print like `string1` `string2` `string3` like the way they are in the CSV – asha Jul 17 '20 at 08:10
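
On the ordering question raised in the comments above: a set does not remember insertion order, so one possible approach is to keep a list next to the set and append a value only the first time it is seen. This is a minimal sketch; the function name `unique_in_order` and the inline sample rows are hypothetical.

```python
import csv


def unique_in_order(lines, index, delimiter=';'):
    """Return the unique values of one column, in first-seen order."""
    seen = set()   # fast membership test
    ordered = []   # remembers the order in which values first appeared
    reader = csv.reader(lines, delimiter=delimiter, skipinitialspace=True)
    for row in reader:
        if not row:  # skip blank lines
            continue
        value = row[index].strip()
        if value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered


# csv.reader accepts any iterable of strings, so a list of lines works here.
sample_rows = [
    "indx1; string1; char1; entry1",
    "indx2; string1; char2; entry2",
    "indx3; string2; char2; entry3",
    "indx4; string1; char1; entry4",
    "indx5; string3; char2; entry5",
]

strings_in_order = unique_in_order(sample_rows, 1)
chars_in_order = unique_in_order(sample_rows, 2)
print(strings_in_order)  # ['string1', 'string2', 'string3']
print(chars_in_order)    # ['char1', 'char2']
```

The extra list costs a little memory, but each row still needs only one set lookup, so the pass over the file stays linear.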

score 2 · Answer 2 · answered Oct 29 '18 at 14:54

This should work. You can use it as a baseline to benchmark other approaches against.

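import csv

# Dictionary keys are unique, so each dict ends up with one key per distinct value.
myDict1 = {}
myDict2 = {}
with open('data.csv') as csv_file:
    # skipinitialspace=True drops the space that follows each ';'
    csv_reader = csv.reader(csv_file, delimiter=';', skipinitialspace=True)
    for row in csv_reader:
        myDict1[row[1]] = 0  # the value is a placeholder; only the keys matter
        myDict2[row[2]] = 0

x = myDict1.keys()  # a list in Python 2; wrap in list() on Python 3
y = myDict2.keys()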

jimifiki

Thanks jimifiki, your solution was very helpful. It worked. =) – Ji Won Song Oct 29 '18 at 15:47
hi @jimifiki, I am getting the output like `dict_keys(['blla1','blla2'])` is there any way of printing only the keys without the `dict_keys` so to print only `['blla1','blla2']` – asha Jul 16 '20 at 09:46
sure @AlbionShala `list(myDict.keys())` constructs a list out of the dict_keys. So I would write `print(list(myDict.keys()))`, this should be fine. Have fun with Python's data structures ;-) – jimifiki Jul 16 '20 at 16:42
Actually in that way it prints all of them, so there are not only unique keys. What I did was just to iterate through `x` in your previous example, so `for y in x` ... `print(y)`. Thanks – asha Jul 17 '20 at 08:02
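
To illustrate the `dict_keys` exchange above, here is a small sketch assuming Python 3. The `rows` list is a hypothetical stand-in for the parsed CSV rows. It shows that the keys are already unique and that `list()` only changes the container type.

```python
# Hypothetical stand-in for rows already parsed by csv.reader.
rows = [
    ["indx1", "string1", "char1", "entry1"],
    ["indx2", "string1", "char2", "entry2"],
    ["indx3", "string2", "char2", "entry3"],
    ["indx4", "string1", "char1", "entry4"],
    ["indx5", "string3", "char2", "entry5"],
]

my_dict = {}
for row in rows:
    # Assigning to an existing key overwrites it, so each key is stored once.
    my_dict[row[1]] = 0

keys_view = my_dict.keys()   # a dict_keys view object in Python 3
keys_list = list(keys_view)  # a plain list holding the same unique keys

print(keys_view)
print(keys_list)
```

Both objects hold the same three keys; only the printed representation differs.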
