I'm trying to select some elements from a Python list. The list represents a distribution of the sizes of some other elements, so it contains many repeated values.

After I find the average value of this list, I want to pick the elements whose value lies between a lower and an upper bound around that average. I can do that easily, but it selects too many elements (mainly because the distribution I have to work with is pretty much homogeneous). So I would like to be able to choose the bounds within which values are selected, but also limit the selection to something like 5 elements below the average and 5 elements above.

I'll add my code (it is super simple).

    avg_lists = sum_lists/len(lists)

    num_list = len(list)
    if (int(num_comm/10)%2 == 0):
        window_size = int(num_list/10)
    else:
        window_size = int(num_list/10)-1

    out_file = open('chosenLists', 'w+')
    chosen_lists = []
    for list in lists:
        if avg_lists - window_size <= len(list) <= avg_lists + window_size:
            chosen_lists.append(list)
    out_file.write("%s\n" % list)
smci
  • Why don't you use [`set`](https://docs.python.org/2/library/sets.html)? – Mazdak Jun 22 '15 at 20:22
  • you missed the 'h' in 'homogeneous' – Jacob Zimmerman Jun 22 '15 at 20:26
  • Which at most 5 elements above and below the average respectively do you want? Any 5 above and 5 below within the window or the 5 above closest to the average and the 5 below closest to the average? – das-g Jun 22 '15 at 20:34
  • Is it intended that you unconditionally write (only) the last list in `lists` to file `chosenLists`? Also, is the writing to a file in any way relevant to your question? If not, remove it from the example. (Instead, you might want to tell us where `num_comm` comes from. See [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve).) – das-g Jun 22 '15 at 20:41
  • [collections.Counter](https://docs.python.org/2/library/collections.html#collections.Counter) is another good way to go for a sorted bag (set allowing multiplicity). Works well with median (like dlask does) or mean. – smci Jun 22 '15 at 23:04

1 Answer

If you are allowed to use the median instead of the average, then you can use this simple solution:

    def select(l, n):
        assert n <= len(l)
        s = sorted(l)           # sort the list
        i = (len(s) - n) // 2   # start index of the middle n elements
        return s[i:i+n]         # return sublist of n elements from the middle

    print(select([1,2,3,4,5,1,2,3,4,5], 5))   # shows [2, 2, 3, 3, 4]

The function `select` returns the `n` elements from the middle of the sorted list, i.e. those closest in rank to the median.
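If the average has to be kept, a similar idea can be sketched with the mean: filter to the window first, then cap the number of picks on each side. This is only an illustration under assumptions, not code from the question. The name `select_near_mean` and the parameters `window`, `k` and `key` are hypothetical, and it assumes that "5 below and 5 above" means the 5 closest to the mean on each side.

```python
def select_near_mean(items, window, k=5, key=len):
    """Pick up to k items below and up to k items at or above the mean size.

    Only items whose size lies within +/- window of the mean are considered,
    and on each side the items closest to the mean are preferred.
    All names here are hypothetical; key=len mirrors the question,
    where each element is a list and its size is len(list).
    """
    if not items:
        return []
    sizes = [key(item) for item in items]
    mean = sum(sizes) / float(len(sizes))
    # Keep only items whose size falls inside the window around the mean.
    near = [(s, item) for s, item in zip(sizes, items) if abs(s - mean) <= window]
    # Order by distance to the mean; the sort is stable, so ties keep input order.
    near.sort(key=lambda pair: abs(pair[0] - mean))
    below = [item for s, item in near if s < mean][:k]
    above = [item for s, item in near if s >= mean][:k]
    return below + above


lists = [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
print(select_near_mean(lists, window=1))   # shows [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
```

Items whose size equals the mean are counted on the "above" side here; that is an arbitrary choice and easy to change.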

dlask