Delete duplicate records in a CSV file in Python 2.7

Question

My INPUT file:

1,boss,30
2,go,35
2,nan,45
3,fog,33
4,kd,55
4,gh,56

Output file should be:

1,boss,30
3,fog,33

In other words, my output file should be free from duplicates: any record whose column 1 value is repeated should be deleted entirely, including its first occurrence.

Code I tried:

source_rd = csv.writer(open("Non_duplicate_source.csv", "wb"),delimiter=d)
gok = set()
for rowdups in sort_src:
    if rowdups[0] not in gok:
        source_rd.writerow(rowdups)
        gok.add( rowdups[0])

Output I got:

1,boss,30
2,go,35
3,fog,33
4,kd,55

What am I doing wrong?

For starters, take a look at [How do I format my posts using Markdown or HTML?](http://stackoverflow.com/help/formatting); I'll edit it for you this time so you can see how it works. — Tim Pietzcker, Sep 23 '14 at 14:14
What is `sort_src`? Also, could you clarify why you didn't expect that output; the duplicates have been removed as required. — jonrsharpe, Sep 23 '14 at 14:21

dawg · Accepted Answer · 2014-09-23T14:42:07.913

You can simply loop over the file twice.

On the first pass, count how many times each column 1 value occurs. On the second pass, keep only the rows whose value occurred exactly once.

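import csv

fn = "input.csv"  # placeholder: path to the input file

# First pass: count how often each column 1 value occurs.
gok = {}
with open(fn) as fin:
    reader = csv.reader(fin)
    for e in reader:
        gok[e[0]] = gok.get(e[0], 0) + 1

# Second pass: print only the rows whose column 1 value occurred once.
with open(fn) as fin:
    reader = csv.reader(fin)
    for e in reader:
        if gok[e[0]] == 1:
            print e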

Prints:

['1', 'boss', '30']
['3', 'fog', '33']
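
If the file fits in memory, the same count-then-filter idea can be sketched with a single read and `collections.Counter`. This is only an illustration, not part of the original answer: the function name `rows_with_unique_key`, the `key_index` parameter, and the inline `sample` lines are assumptions made up for the example, and blank rows are simply skipped.

```python
import csv
from collections import Counter


def rows_with_unique_key(rows, key_index=0):
    """Return the rows whose value at key_index occurs exactly once.

    Hypothetical helper: the name and signature are assumptions.
    """
    # Materialise the rows so they can be traversed twice; drop blank rows.
    rows = [row for row in rows if row]
    # Count occurrences of each key (column 1 by default).
    counts = Counter(row[key_index] for row in rows)
    # Keep only rows whose key was seen exactly once.
    return [row for row in rows if counts[row[key_index]] == 1]


# csv.reader accepts any iterable of lines, so a list stands in for the file.
sample = ["1,boss,30", "2,go,35", "2,nan,45", "3,fog,33", "4,kd,55", "4,gh,56"]
unique_rows = rows_with_unique_key(csv.reader(sample))
# unique_rows == [['1', 'boss', '30'], ['3', 'fog', '33']]
```

The rows could then be passed to `csv.writer(...).writerows(unique_rows)` to produce the output file. For files too large to hold in memory, the two-pass version above is the safer choice.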

The reason your method does not work is that once the second instance of the duplicate is seen, the first has already been written.
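
To make that concrete, here is a small sketch that mimics the seen-set approach from the question on in-memory rows. The function name `keep_first_seen` is made up for illustration, and the rows are typed in by hand rather than read from a file.

```python
def keep_first_seen(rows):
    """Mimic the question's approach: keep the first row for each key."""
    seen = set()
    kept = []
    for row in rows:
        if row[0] not in seen:
            # The row is emitted here, before we can know whether
            # a later row will repeat the same key.
            kept.append(row)
            seen.add(row[0])
    return kept


rows = [
    ["1", "boss", "30"],
    ["2", "go", "35"],
    ["2", "nan", "45"],
    ["3", "fog", "33"],
    ["4", "kd", "55"],
    ["4", "gh", "56"],
]
first_seen = keep_first_seen(rows)
# first_seen keeps keys 1, 2, 3 and 4: the first "2" and "4" rows survive.
```

A set can only suppress later copies of a key; it cannot retract a row that has already been written, which is why the counts have to be known before any row is emitted.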
