I have an array of ints that describes the absolute number of occurrences of each unique item in a data set. E.g. a = [5, 3, 1] means there are three unique items in a data set of length 9, say x, y and z, and they occur:

x -> 5 times
y -> 3 times
z -> once

How can I "stretch" array a to describe a smaller or larger data set while maintaining the proportions between the ints? Since exact proportions can't always be maintained, I'm thinking of rounding, e.g. a shrunk to a data set of length 3 would look like:

x -> 2 times
y -> once
z -> none (because it's the least probable to occur in the original array)
Alex

1 Answer

You could use list multiplication. Let me know if this example is enough for you to continue with your work.

from collections import Counter
from math import ceil

init_list = [3, 4, 5, 5, 5, 4, 4, 4]

occur_dict = Counter(init_list)
new_length = 20

old_length = len(init_list)
new_occur_dict = {num: ceil(occur / old_length * new_length)
                  for (num, occur) in occur_dict.items()}
# new occurrences dict; counts are rounded up, so their sum may exceed new_length

sorted_nums = [num for (num, occur) in sorted(occur_dict.items(),
                                              key=lambda x: x[1])]
# keys sorted by occurrences, so the least frequent number comes first
while sum(new_occur_dict.values()) > new_length:
    for number in sorted_nums:
        new_occur_dict[number] -= 1  # remove extra occurrences to match new_length
        if sum(new_occur_dict.values()) == new_length:
            break

new_list = []
for item in occur_dict:
    new_list += [item] * new_occur_dict[item]
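
The same idea can be sketched directly on a counts array such as a = [5, 3, 1] from the question, without building the item list first. The function name stretch_counts is hypothetical, and integer ceiling division is used here instead of math.ceil to sidestep float rounding; treat it as a sketch under those assumptions, not a tuned implementation.

```python
def stretch_counts(counts, new_length):
    """Scale counts so they sum to new_length (hypothetical helper)."""
    total = sum(counts)
    # Ceiling division on integers: -(-a // b) == ceil(a / b) for b > 0.
    scaled = [-(-c * new_length // total) for c in counts]
    excess = sum(scaled) - new_length
    # Indices ordered from least to most frequent in the original counts.
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    while excess > 0:
        for i in order:
            if excess == 0:
                break
            if scaled[i] > 0:  # never push a count below zero
                scaled[i] -= 1
                excess -= 1
    return scaled


print(stretch_counts([5, 3, 1], 3))
```

For the question's example this drops z entirely, because the single excess unit is taken from the least frequent item.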
Karol Adamiak
  • Your counting loop has a quadratic complexity. If you used `collections.Counter` instead of a repeated `list.count`, you'd have a linear complexity. – Stef Dec 08 '21 at 11:42
  • Also, with this code, there is no guarantee that `new_list` will have length `new_list_length`. In fact, with this example, `new_list = [3,3,3, 5,5,5,5,5,5,5,5, 4,4,4,4,4,4,4,4,4,4]`, so `len(new_list) = 21`, not 20. – Stef Dec 08 '21 at 11:48
  • The discrepancy in `new_list`'s length happens if too many of the values are rounded up, or too many of the values are rounded down. One way to fix this would be to use a custom rounding function instead of `round`, inspired by Bresenham's algorithm: every time you've rounded a number, keep track of the accumulated error, and try to balance the rounding up and the rounding down so that you end up with exactly the right number of elements. For example, with `init_list = [3, 4, 5]` and `new_list_length=100`, you want to end up with counts (33, 33, 34), not (33, 33, 33). – Stef Dec 08 '21 at 11:57
  • Thanks @Stef for pointing out the mistakes. I checked this particular list with new_length in range(1,10000) and now it works fine, actual length and desired length matches. – Karol Adamiak Dec 08 '21 at 12:12
  • I think you shouldn't sort the keys by occurrences, but rather by fractional part before rounding up: if a count has a low fractional part, you can happily round it down; if a count has a high fractional part, then you'd prefer to round it up. So for instance, if old_lst = [3,3,5] and new_length = 100, first you round everything up, getting {3: 67, 5: 34}; then you have an excess of 1, but you'd rather remove that excess from the 34 (because it comes from 33.33333, which has a low fractional part) than from the 67 (because it comes from 66.66667, which has a high fractional part). – Stef Dec 08 '21 at 14:08
  • Also note you don't need to build the new list explicitly with this loop of "list multiplication": Counter already handles that with its .elements() method: https://docs.python.org/3/library/collections.html#collections.Counter.elements – Stef Dec 08 '21 at 14:34
  • [Try it online!](https://tio.run/##dVFBTsMwELznFStxqI3S0MIFVeKEEOLCBW4IRa6zKVadONgOFBBvD@skpSZAFNmyPbMzu9O8@SdTn503tutKayqohH8CVTXGepCodAqVKcokKbAEJ4XG3JvcC7tBz4wucmna2rsUhqvctRVfJUBfeKQTXACtETR7EbpFxzjvYX1NAh34cLLn9oBSG@FHLuE@YLsCeTzQSmNhm0pQNUQCymNF9eGz59f4OmVTW0zymB2LTPiFKsuxiUOpQxMwj6xPBLO2KYRHNqjOl71injIS5UH1XTXMinqDLIjwFBxNHYtYZ/SSwhbfLrSo1oWA3arPhO0elo/8YfFIJj6HYVr0ra0jB0nShyqN1ii9MrXbZ3sZ3tEOwWrlyP8kXYfPf8Q6SgQGG2uwKfP7Hp/5jxI8Q40VkrEQf3IE91d39ze312STJuPCTGZCrCH8Yh2WcaVNzjLXaOXZaKSxqiaTPD7854NcnC54DP27418wnnTdFw "Python 3.8 (pre-release) – Try It Online") – Stef Dec 08 '21 at 14:42
  • (I'm not going to post an answer - feel free to use the code I posted at the "Try it online!" link.) – Stef Dec 08 '21 at 14:43
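
The fractional-part idea from the comments above can be sketched as the largest remainder method: round every scaled count down, then hand the missing units to the entries with the largest fractional parts. This is one reading of the suggestion, not the code behind the "Try it online!" link; the name largest_remainder is made up for illustration, and a non-empty counts list with a positive sum is assumed.

```python
from collections import Counter


def largest_remainder(counts, new_length):
    """Scale counts to sum to new_length (hypothetical helper)."""
    total = sum(counts)  # assumed to be > 0
    # divmod keeps everything in integers: quotient is the rounded-down
    # count, remainder is proportional to the fractional part.
    parts = [divmod(c * new_length, total) for c in counts]
    scaled = [q for q, _ in parts]
    leftover = new_length - sum(scaled)
    # Indices ordered by fractional part, largest first.
    by_remainder = sorted(range(len(counts)),
                          key=lambda i: parts[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        scaled[i] += 1
    return scaled


scaled = largest_remainder([5, 3, 1], 3)
# Counter.elements() expands counts back into items, skipping zero counts.
items = list(Counter(dict(zip("xyz", scaled))).elements())
print(scaled, items)
```

With the question's a = [5, 3, 1] and a target length of 3 this gives [2, 1, 0], and with counts [2, 1] and length 100 it gives [67, 33], matching the example in the comment.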