
I have a dictionary with duplicate values.

Deca_dict = {
    "1": "2_506",
    "2": "2_506",
    "3": "2_506",
    "4": "2_600",
    "5": "2_600",
    "6": "1_650"
}

I have used collections.Counter to count how many of each there are.

decaAdd_occurrences = {'2_506':3, '2_600':2, '1_650':1}
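For reference, counts like the ones above can be produced directly from the dictionary's values. This is a minimal sketch, assuming Python 3 and the simplified `Deca_dict` shown above; the final `dict(...)` conversion is only there to make the result easy to compare and print.

```python
from collections import Counter

Deca_dict = {
    "1": "2_506",
    "2": "2_506",
    "3": "2_506",
    "4": "2_600",
    "5": "2_600",
    "6": "1_650",
}

# Count how many keys share each value.
decaAdd_occurrences = Counter(Deca_dict.values())

# A Counter behaves like a dict mapping value -> number of occurrences.
counts_as_dict = dict(decaAdd_occurrences)
```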

I then created a new dictionary of values to be updated.

deca_double_dict = {key: value for key, value in Deca_dict.items()
                        if decaAdd_occurrences[value] > 1}
deca_double_dict = {
    "1": "2_506",
    "3": "2_506",
    "2": "2_506",
    "5": "2_600",
    "4": "2_600"
}

(in this case, it's the original dict without the last item)

I'm trying to figure out how to increment the leading number of each duplicated value, doing so for all but one occurrence (that is, the count in decaAdd_occurrences minus 1). One of the duplicates keeps its original value, whereas the rest have the first number of the value string incremented by increasing amounts (based on the number of duplicates counted). The aim is to end up with unique values for the data represented by the original Deca_dict.

Goal output = {'1':'3_506', '2':'4_506', '3':'2_506', '4':'3_600', '5':'2_600'}

I started going about things the following way, but ended up just incrementing all the duplicated items, resulting in what I had originally, except with values plus one. For context: the values of the original Deca_dict were built by concatenating two numbers (deca_address_num and deca_num_route). Also, homesLayer is a QGIS vector layer where deca_address_num and deca_num_route are stored in fields with indices d_address_idx and id_route_idx.

for key in deca_double_dict.keys():
    for home in homesLayer.getFeatures():
        if home.id() == key:
            deca_address_num = home.attributes()[d_address_idx]
            deca_num_route = home.attributes()[id_route_idx]
            deca_address_plus = deca_address_num + increment
            next_deca_address = (str(deca_address_plus) + '_' +
                                 str(deca_num_route))
            if not next_deca_address in Deca_dict.values():
                update_deca_dbl_dict[key] = next_deca_address

The result is useless:

update_deca_dbl_dict = {
    "1": "3_506",
    "3": "3_506",
    "2": "3_506",
    "5": "3_600",
    "4": "3_600"
}

My second try attempts to include a counter, but things are in the wrong place.

for key, value in deca_double_dict.iteritems():
    iterations = decaAdd_occurrences[value] - 1
    for home in homesLayer.getFeatures():
        if home.id() == key:
            #deca_homeID_list.append(home.id())
            increment = 1
            deca_address_num = home.attributes()[d_address_idx]
            deca_num_route = home.attributes()[id_route_idx]
            deca_address_plus = deca_address_num + increment
            next_deca_address = (str(deca_address_plus) + '_' +
                                 str(deca_num_route))
            #print deca_num_route
            while iterations > 0:
                if not next_deca_address in Deca_dict.values():
                    update_deca_dbl_dict[key] = next_deca_address
                    iterations -= 1
                    increment += 1

UPDATE: Even though one of the answers below works for incrementing all duplicate items of my dictionary, I am trying to rework my code, as I need to compare against the original data before incrementing. I still get the same result as in my first try (the useless one).

for key, value in deca_double_dict.iteritems():
    for home in homesLayer.getFeatures():
        if home.id() == key:
            iterations = decaAdd_occurrences[value] - 1
            increment = 1
            while iterations > 0:
                deca_address_num = home.attributes()[d_address_idx]
                deca_num_route = home.attributes()[id_route_idx]
                deca_address_plus = deca_address_num + increment
                current_address = str(deca_address_num) + '_' + str(deca_num_route)
                next_deca_address = (str(deca_address_plus) + '_' +
                                 str(deca_num_route))
                if not next_deca_address in Deca_dict.values():
                    update_deca_dbl_dict[key] = next_deca_address
                    iterations -= 1
                    increment += 1
                else:
                    alpha_deca_dbl_dict[key] = current_address
                    iterations = 0
user25976
  • Can you explain a bit more about what transformation you're trying to achieve? I can't see how the desired output relates to the input. – maxymoo Jun 04 '15 at 00:06
  • @maxymoo i added more info on that. I see why you were confused too, there was a typo in there. – user25976 Jun 04 '15 at 00:17
  • Do you look at the full string for duplicates, or only at the right side of the string, supposing you split at the underscore "_"? – eri0o Jun 04 '15 at 00:19
  • I guess that these `’` single quotes will give you an error: `SyntaxError: invalid character in identifier` – nbro Jun 04 '15 at 00:19
  • @Elric I look at the full string for duplicates, but I only increment the first number of the string. Originally, the string is a concatenation of two attribute values of the QGIS feature. For this reason, I iterate through the features, retrieving the feature whose ID is the dictionary key, along with its original two attribute numbers. – user25976 Jun 04 '15 at 00:24
  • @Xenomorph Do you mean for the concatenation of next_deca_address? If you're referring to that--no, I did not have a syntax error. – user25976 Jun 04 '15 at 00:25
  • @user25976 Strange, usually you should just be able to use `'` or `"`. – nbro Jun 04 '15 at 00:28
  • @Xenomorph I did use ' . It did not give me an error. This is not the issue. – user25976 Jun 04 '15 at 00:30
  • @user25976 I know this is not your main issue, I just wanted to point out the problem... You did use `'`, but in your question you are not using it in many places... – nbro Jun 04 '15 at 00:31
  • I count how many "2_506" there are and find 3, so count=3 and I will modify 3 minus 1, which is 2 items. The first "2_506" turns into "$count_506", which is "3_506"; count increases by 1, so the second one turns into "4_506"; and then I change item type... Is this the logic? – eri0o Jun 04 '15 at 00:35
  • The goal is quite... interesting. Sounds like it will take a few good tries. – WGS Jun 04 '15 at 00:36
  • @Elric Yes, you nailed it. – user25976 Jun 04 '15 at 00:37
  • Interesting dictionary you have there: it has duplicate keys. How did you manage that? And what is `decaAdd_occurrences`? And is `Deca_dict` supposed to be the same as `deca_dict`? – Paul Cornelius Jun 04 '15 at 00:40
  • Agree with @PaulCornelius. That dictionary will actually contract to a length-4 dictionary, truncating the last item out, when you try to process it. – WGS Jun 04 '15 at 00:41
  • @PaulCornelius Thanks for pointing that out. It was an error when I was trying to simplify my dictionary and data for this post. Changed it. – user25976 Jun 04 '15 at 00:45
  • What values does `decaAdd_occurrences` hold? And when you say the last code things are in the wrong order, which is your output? `update_deca_dbl_dict` is your output? – eri0o Jun 04 '15 at 00:58
  • @Elric Excuse me, decaAdd_occurrences is the Counter_dict as was previously shown. And yes, the output is update_deca_dbl_dict – user25976 Jun 04 '15 at 01:06
  • Your changes to the variable `increment` are NOT affecting the value of `deca_address_plus`. I mean, there is no assignment after the while, and you just set `increment` to 1 for each key you search. Sorry, I don't have a computer right now, just thought your code was interesting. – eri0o Jun 04 '15 at 01:13
  • @Elric My logic behind this: if increment = 1 to begin with, the first number += 1. Second number += 2, etc.. If not, all of the updated values will be the same, which is not the idea. – user25976 Jun 04 '15 at 01:15
  • `next_deca_address` is not updated inside your `while iterations > 0`. I don't know getFeatures, but `next_deca_address` should be recomputed from the updated value of `increment` after `increment` is incremented. – eri0o Jun 04 '15 at 01:23
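The point made in the last few comments can be illustrated without QGIS: the candidate string has to be rebuilt inside the loop each time `increment` changes, otherwise the membership test keeps checking the same string. A minimal sketch, assuming the address number and route number are already available as plain integers; the helper name `next_free_address` and the `taken` set are hypothetical and do not appear in the question.

```python
def next_free_address(address_num, route_num, taken):
    """Return the first 'N_route' string with N > address_num that is not in taken."""
    increment = 1
    while True:
        # Rebuild the candidate on every pass so the new increment is used.
        candidate = '{}_{}'.format(address_num + increment, route_num)
        if candidate not in taken:
            return candidate
        increment += 1


# Values already in use (hypothetical sample data).
taken = {'2_506', '3_506', '2_600'}

# '3_506' is already taken, so the search moves on to '4_506'.
result = next_free_address(2, 506, taken)
```

In the question's loops the equivalent fix would be to move the construction of `next_deca_address` inside the `while`, and to record each newly assigned address so that later keys do not receive the same one.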

3 Answers


Is this approximately what you want? I assume you can deal with the function to change 2_506 into 3_506 etc. Instead of your Counter, I use a set to ensure that there are no duplicate values.

In the original post I cut off a line at the bottom, sorry.

values_so_far = set()
d1 = {}  # ---your original dictionary with duplicate values---
d2 = {}  # d1 with all the duplicates changed

def increment_value(old_value):
    # you know how to write this
    # return the modified string
    raise NotImplementedError  # placeholder so the definition is valid Python

for k, v in d1.items():
    # keep incrementing until the value has not been used yet
    while v in values_so_far:
        v = increment_value(v)
    d2[k] = v
    values_so_far.add(v)
Paul Cornelius
  • How will I know how many to increment to without the counter? – user25976 Jun 04 '15 at 01:10
  • You just increment by 1 when you find a duplicate. If that results in another duplicate, you increment by 1 again (that's what the while loop does). Eventually you create a non-duplicate, and you store that and go on. No Counter is ever needed. – Paul Cornelius Jun 04 '15 at 01:56
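A minimal runnable sketch of the loop described in the comment above, assuming values of the form `N_route` where only `N` is bumped. The body of `increment_value` is an illustration written for this sketch, not the answerer's code, and the sample data is the simplified dictionary from the question.

```python
def increment_value(old_value):
    # Assumed format 'N_route': bump N by one, e.g. '2_506' -> '3_506'.
    num, route = old_value.split('_')
    return '{}_{}'.format(int(num) + 1, route)


d1 = {'1': '2_506', '2': '2_506', '3': '2_506',
      '4': '2_600', '5': '2_600', '6': '1_650'}
d2 = {}
values_so_far = set()

for k, v in d1.items():
    # Keep bumping until the value has not been used yet.
    while v in values_so_far:
        v = increment_value(v)
    d2[k] = v
    values_so_far.add(v)

print(d2)
```

Which key keeps the original value depends on iteration order. On Python 3.7+ dictionaries iterate in insertion order, so the first key seen for each value is the one left unchanged.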

Here's a solution: Essentially, it keeps the first of the duplicate values and increments the prepended number on the rest of the duplicates.

from collections import OrderedDict, defaultdict
orig_d = {'1':'2_506', '2':'2_506', '3':'2_506', '4':'2_600', '5':'2_600'}
orig_d = OrderedDict(sorted(orig_d.items(), key=lambda x: x[0]))

counter = defaultdict(int)
for k, v in orig_d.items():
    counter[v] += 1
    if counter[v] > 1:
        pre, post = v.split('_')
        pre = int(pre) + (counter[v] - 1)
        orig_d[k] = "%s_%s" % (pre, post)

print(orig_d)

Result:

OrderedDict([('1', '2_506'), ('2', '3_506'), ('3', '4_506'), ('4', '2_600'), ('5', '3_600')])
junnytony

I think this does what you want. I modified your input dictionary slightly to better illustrate what happens. A primary difference with what you were doing is that decaAdd_occurrences, which is created from the Counter dictionary, keeps track of not only the counts, but also the value of the current address num prefix. This makes it possible to know what the next num value to use is since both it and the count are updated during the process of modifying Deca_dict.

from collections import Counter

Deca_dict = {
    "1": "2_506",
    "2": "2_506",
    "3": "2_506",
    "4": "2_600",
    "5": "1_650",
    "6": "2_600"
}

decaAdd_occurrences = {k: (int(k.split('_')[0]), v) for k,v in
                                Counter(Deca_dict.values()).items()}

for key, value in Deca_dict.items():
    num, cnt = decaAdd_occurrences[value]
    if cnt > 1:
        route = value.split('_')[1]
        next_num = num + 1
        Deca_dict[key] = '{}_{}'.format(next_num, route)
        decaAdd_occurrences[value] = next_num, cnt-1  # update values

Updated dictionary:

Deca_dict -> {
    "1": "3_506",
    "2": "2_506",
    "3": "4_506",
    "4": "3_600",
    "5": "1_650",
    "6": "2_600"
}
martineau
  • I have a question regarding the line `decaAdd_occurrences[value] = next_num, cnt-1`. If I'm not mistaken, value comes from Deca_dict. If originally value = '2_506', then {'2_506':(2,3)}. Does this line of code change this item to {'2_506':(3,1)}? If yes, why is that? If not, what does it do? – user25976 Jun 05 '15 at 00:32
  • Not exactly. In `decaAdd_occurrences`, the initial value for key `'2_506'` would be `(2, 3)`, but then in the loop, after the first item was updated it would change it to `(3, 2)`. After the second item was updated it would become `(4, 1)`, and then it would stay that way from then on since the `cnt` is no longer `> 1`. This allows one of the duplicates to keep the same value, as you said you wanted. The results show that this is indeed what happened. – martineau Jun 05 '15 at 03:03
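The state changes described in this comment can be traced with a small sketch. It assumes Python 3.7+ (insertion-ordered dictionaries), so the keys are visited in a different order than in the output shown in the answer; the `trace` list and the shortened name `occurrences` are added here for illustration only.

```python
from collections import Counter

Deca_dict = {"1": "2_506", "2": "2_506", "3": "2_506"}

# value -> (current number prefix, remaining count)
occurrences = {v: (int(v.split('_')[0]), c)
               for v, c in Counter(Deca_dict.values()).items()}

trace = [occurrences['2_506']]          # state before the loop starts

# Iterate over a snapshot so the dict can be updated safely inside the loop.
for key, value in list(Deca_dict.items()):
    num, cnt = occurrences[value]
    if cnt > 1:
        route = value.split('_')[1]
        Deca_dict[key] = '{}_{}'.format(num + 1, route)
        occurrences[value] = (num + 1, cnt - 1)
        trace.append(occurrences[value])

print(trace)      # the successive (num, cnt) states for '2_506'
print(Deca_dict)
```

The last key visited sees a count of 1 and is left alone, which is how one of the duplicates keeps its original value.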
  • Is this what the OP actually wanted? If you add one more element to the input Deca_dict - `7:3_600` - your output dictionary will, in fact, contain duplicate values: the original `"3_600"` and one that results from incrementing `"2_600"`. – Paul Cornelius Jun 06 '15 at 05:30
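The collision raised in this comment can be reproduced with a short sketch. It wraps the counter-based approach from the answer in a function that returns a new dictionary; the function name `bump_duplicates` is hypothetical, and the input is the answer's dictionary with the extra `"7": "3_600"` entry suggested above. Python 3.7+ dictionary ordering is assumed.

```python
from collections import Counter


def bump_duplicates(d):
    """Counter-based approach from the answer, returning a new dict."""
    state = {v: (int(v.split('_')[0]), c)
             for v, c in Counter(d.values()).items()}
    result = {}
    for key, value in d.items():
        num, cnt = state[value]
        if cnt > 1:
            route = value.split('_')[1]
            result[key] = '{}_{}'.format(num + 1, route)
            state[value] = (num + 1, cnt - 1)
        else:
            result[key] = value
    return result


data = {"1": "2_506", "2": "2_506", "3": "2_506", "4": "2_600",
        "5": "1_650", "6": "2_600", "7": "3_600"}
updated = bump_duplicates(data)

# Values that still occur more than once after the update.
clashes = [v for v, c in Counter(updated.values()).items() if c > 1]
print(clashes)
```

Incrementing `"2_600"` produces `"3_600"`, which already exists under key `"7"`, so the result is not unique unless each candidate is also checked against the values already in use.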
  • @Paul: Good point. It's what the OP said they wanted. It's unclear (to me) from the question if input data like that in the dictionary is a possibility. If it is, then the algorithm proposed by the OP won't solve the problem — all I did was try to implement it efficiently in Python. – martineau Jun 06 '15 at 06:02
  • @martineau Which you did, of course. I couldn't actually fathom what OP was trying to do, but for some reason he appears to dislike duplicates. – Paul Cornelius Jun 06 '15 at 06:22