Iterate over a list that contains duplicate elements

Question

I'm trying to iterate over a list that contains some duplicate elements. I need the number of duplicates, so I don't want to convert the list to a set before I iterate over it.

I'm trying to count how many times the element appears and then write the element (the name) and the count of how many times it appears.

The issue I am running into is that in my output CSV file, there are as many rows as there are times the element appears. I am writing the CSV to an HTML table after it's completed, so I want it to be deduplicated.

My end goal is to have it count how many times the name appears, then write a row to the CSV file that contains the name and the count, then move to the next name in the list.

I tried searching and came across itertools.groupby but I'm not sure if that's going to be useful in this instance and if it is, how to use it correctly.

Thanks for the help.

EDIT: I forgot to mention - Python 2.6

import csv
import sys

with open(sys.argv[1]) as infile:
    rdr = csv.DictReader(infile, dialect='excel')
    qualsin = []
    headers = ['Qualifier Name','Appointments']
    for row in rdr:
        row['Qualifier Name'] = row['Qualifier Name'].upper()
        qualsin.append(row['Qualifier Name'])
    qualsin.sort()
    #total = 0
    with open('tempwork.csv', 'w') as tempwork:
        wrtr = csv.writer(tempwork, dialect='excel')
        wrtr.writerow(headers)
        for quals in qualsin:
            d = [quals, qualsin.count(quals)]
            #a = dict((key, value) for (key, value) in d)
            #total += qualsin.count(quals)
            wrtr.writerow(d)

Have a look at http://docs.python.org/2/library/collections.html#counter-objects — sberry, May 30 '13 at 19:57

sberry · Accepted Answer · 2013-05-30T20:16:05.087

You can de-dupe into a set with a different name, then use the original list to do the counting.

For instance, given qualsin = [0, 2, 3, 2, 3, 1, 2, 3, 5, 3, 3, 2, 4]:

set_quals = set(qualsin) # This is set([0, 1, 2, 3, 4, 5])
for quals in set_quals: # Iterate over the values in the set, not the list
    d = [quals, qualsin.count(quals)] # count the values from the list, not the set
    wrtr.writerow(d)

Or...

import collections

...
set_quals = set(qualsin) # This is set([0, 1, 2, 3, 4, 5])
counts = collections.Counter(qualsin) # This is Counter({3: 5, 2: 4, 0: 1, 1: 1, 4: 1, 5: 1}) which acts like a dictionary
for quals in set_quals:
    d = [quals, counts[quals]] # use the name from the set and the value from the Counter
    wrtr.writerow(d)

EDIT
Because of your update saying you are on Python 2.6, Counter is not available (it was added in 2.7). However, the first solution will still work.

You could make a Counter yourself by just doing:

counts = collections.defaultdict(int) # Available since 2.5
for quals in qualsin:
    counts[quals] += 1

Using a counter (either Counter in 2.7 or the homegrown one above) should also cut the running time, if I am not mistaken. list.count is O(N), and calling it in a loop once per distinct value gives O(N^2) in the worst case, when most values are unique. The single pass that builds the counter is just O(N), so for larger lists this could be a big help.
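As a quick sanity check, here is a minimal sketch showing that both approaches give the same counts for the sample list above. The names by_list_count and by_single_pass are made up for this illustration and are not part of the original code.

```python
import collections

qualsin = [0, 2, 3, 2, 3, 1, 2, 3, 5, 3, 3, 2, 4]

# Approach 1: one list.count call per distinct value.
# Each call scans the whole list again.
by_list_count = dict((q, qualsin.count(q)) for q in set(qualsin))

# Approach 2: a single pass over the list with a defaultdict (Python 2.5+).
by_single_pass = collections.defaultdict(int)
for q in qualsin:
    by_single_pass[q] += 1

# Both approaches agree on every count.
assert by_list_count == dict(by_single_pass)
```

The difference only shows up in running time, and it should matter mainly for long lists with many distinct values.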

EDIT 2

To get the output in sorted alphabetical order, all you do is convert the de-duped list (set) back into a sorted list.

ordered_deduped_quals = sorted(set(qualsin))
for quals in ordered_deduped_quals:
    ...
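Putting the counting and the sorting together, a rough sketch could look like the following. The helper name sorted_count_rows and the sample names are assumptions made for illustration. The function only builds the rows, so opening the file and creating the csv.writer stay as in the question.

```python
import collections

def sorted_count_rows(names):
    """Return one [name, count] row per distinct name, in alphabetical order."""
    counts = collections.defaultdict(int)  # available since Python 2.5
    for name in names:
        counts[name] += 1  # single pass over the list
    # Iterating a dict yields its keys, so sorted(counts) is the sorted, de-duped names.
    return [[name, counts[name]] for name in sorted(counts)]

# Hypothetical sample data standing in for the uppercased 'Qualifier Name' column.
rows = sorted_count_rows(['BOB', 'ALICE', 'BOB', 'CAROL', 'ALICE', 'BOB'])
```

With the writer from the question, the rows could then be written after the header with wrtr.writerows(rows).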

Thank you so much! I'm going to give it a shot now using your first example. — novafluxx, May 30 '13 at 20:05
I REALLY appreciate the help! Wish I could upvote the answer. This is EXACTLY what I needed. The only thing left is to sort it before writing it. I understand a set is not sorted. Do you know if I could put it back into a list and sort it before I write it? — novafluxx, May 30 '13 at 20:10
Alphabetical order (it's a list of names for reporting purposes) — novafluxx, May 30 '13 at 20:13
