I have tried to summarize the problem statement like this:
Given n, k and an array (a list) arr, where n = len(arr) and k is an integer with 1 <= k <= n (inclusive).
For an array (or list) myList, the Unfairness Sum is defined as the sum of the absolute differences between all possible pairs (combinations with 2 elements each) in myList.
To explain: if myList = [1, 2, 5, 5, 6], its unfairness sum (US) is computed as shown below; the smallest unfairness sum we are after is what I call the Minimum Unfairness Sum, or MUS. Please note that elements are considered unique by their index in the list, not by their values:
US = |1-2| + |1-5| + |1-5| + |1-6| + |2-5| + |2-5| + |2-6| + |5-5| + |5-6| + |5-6| = 26
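A minimal sketch of the definition in Python, for checking by hand (the names unfairness_sum and unfairness_sum_sorted are hypothetical, not from the problem statement). The first follows the definition literally over all pairs of positions; the second uses the observation that, once the list is sorted, the element at 0-based position j is at least as large as the j elements before it and at most as large as the k - 1 - j after it, so it contributes with weight 2j - (k - 1).

```python
from itertools import combinations


def unfairness_sum(values):
    # Literal definition: sum of |a - b| over every pair of positions, O(k^2).
    # combinations() works on positions, so equal values still count as distinct.
    return sum(abs(a - b) for a, b in combinations(values, 2))


def unfairness_sum_sorted(values):
    # Same quantity in O(k log k): after sorting, element j (0-based) is added
    # once per element before it and subtracted once per element after it.
    xs = sorted(values)
    k = len(xs)
    return sum(x * (2 * j - (k - 1)) for j, x in enumerate(xs))


my_list = [1, 2, 5, 5, 6]
print(unfairness_sum(my_list))         # 26
print(unfairness_sum_sorted(my_list))  # 26
```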
If you actually need to look at the problem statement, it's HERE.
My Objective
Given n, k, arr (as described above), find the Minimum Unfairness Sum out of all the unfairness sums of the possible sub arrays (any choice of k elements from arr, not necessarily adjacent), with the constraint that each len(sub array) = k [which is a good thing to make our lives easier, I believe :) ]
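To pin the objective down, here is a brute-force reference in Python, assuming "sub array" means any k elements of arr, as in the first approach below (mus_brute_force is a hypothetical name). It is hopeless for the real constraints, but handy for checking a faster solution on small inputs.

```python
from itertools import combinations


def unfairness_sum(values):
    # Sum of |a - b| over every pair of positions.
    return sum(abs(a - b) for a, b in combinations(values, 2))


def mus_brute_force(k, arr):
    # Exhaustive search over every k-element combination of arr.
    # There are C(n, k) of them, so this only works for small n.
    return min(unfairness_sum(sub) for sub in combinations(arr, k))


print(mus_brute_force(3, [10, 100, 300, 200, 1000, 20, 30]))  # 40
```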
What I have tried
Well, there is a lot to add here, so I'll try to be as brief as I can.
My first approach was this one, where I used itertools.combinations to get all the possible combinations and statistics.variance to check their spread of data (yeah, I know I'm a mess).
Before you see the code below: do you think variance and unfairness sum are perfectly related (I know they are strongly related), i.e. does the sub array with the minimum variance have to be the sub array with the MUS?
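For reference, the exact link can be written down (notation mine: a list x_1 to x_k with mean x̄ and population variance σ²). The population variance is exactly proportional to the sum of squared pairwise differences, whereas the unfairness sum (US) adds up absolute differences, so the two measures are closely related but not the same quantity.

```latex
\sum_{1 \le i < j \le k} (x_i - x_j)^2
  = k \sum_{i=1}^{k} x_i^2 - \Big(\sum_{i=1}^{k} x_i\Big)^2
  = k \sum_{i=1}^{k} (x_i - \bar{x})^2
  = k^2 \sigma^2,
\qquad
\mathrm{US} = \sum_{1 \le i < j \le k} \lvert x_i - x_j \rvert
```

The first step comes from expanding (x_i - x_j)^2 over all ordered pairs, which gives 2k Σx_i² - 2(Σx_i)², and halving for unordered pairs. Squaring weights large gaps more heavily than the absolute value does, which is why the two rankings are not guaranteed to agree.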
You only have to check the LetMeDoIt(n, k, arr) function. If you need an MCVE, check the second code snippet below.
from itertools import combinations as cmb
from statistics import variance as varn

def LetMeDoIt(n, k, arr):
    v = []  # smallest variance seen so far (single element)
    s = []  # sub array that has that variance (single element)
    subs = [list(x) for x in list(cmb(arr, k))]  # getting all sub arrays from arr in a list
    i = 0
    for sub in subs:
        if i != 0:
            var = varn(sub)  # the variance thingy
            if float(var) < float(min(v)):
                v.remove(v[0])
                v.append(var)
                s.remove(s[0])
                s.append(sub)
        elif i == 0:
            var = varn(sub)
            v.append(var)
            s.append(sub)
            i = 1
    final = []
    f = list(cmb(s[0], 2))  # getting list of all pairs (from the sub array with the least variance)
    for r in f:
        final.append(abs(r[0] - r[1]))  # calculating the unfairness sum in my messy way
    return sum(final)
The above code works fine for n < 30 but raises a MemoryError beyond that.
In the Python chat room, Kevin suggested I try a generator, which is memory efficient (it really is), but since a generator still produces every combination on the fly as we iterate over it, it was estimated to take over 140 hours (:/) for n=50, k=8.
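A sketch of the generator idea, keeping the first approach's selection rule (lowest statistics.variance) but never storing the combinations (let_me_do_it_lazy is a hypothetical name). min() consumes combinations() lazily, so memory stays small, yet it still visits all C(n, k) tuples; C(50, 8) is 536,878,650, which is where the running time goes.

```python
from itertools import combinations
from statistics import variance


def let_me_do_it_lazy(k, arr):
    # combinations() yields one tuple at a time; min() only remembers the best.
    # statistics.variance() needs at least two points, so this assumes k >= 2.
    best = min(combinations(arr, k), key=variance)
    # Unfairness sum of the chosen combination.
    return sum(abs(a - b) for a, b in combinations(best, 2))


print(let_me_do_it_lazy(3, [10, 100, 300, 200, 1000, 20, 30]))  # 40
```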
I posted the same thing as a question on SO HERE (you might want to have a look to understand me properly; it has discussions and an answer by fusion, which led me to my second approach, a better one (I should say fusion's approach xD)).
Second Approach
from itertools import combinations as cmb

def myvar(arr):  # a function to calculate variance
    l = len(arr)
    m = sum(arr) / l
    return sum((i - m) ** 2 for i in arr) / l

def LetMeDoIt(n, k, arr):
    sorted_list = sorted(arr)  # I think sorting the array makes it easy to get the sub array with MUS quickly
    variance = None
    min_variance_sub = None
    for i in range(n - k + 1):
        sub = sorted_list[i:i + k]
        var = myvar(sub)
        if variance is None or var < variance:
            variance = var
            min_variance_sub = sub
    final = []
    f = list(cmb(min_variance_sub, 2))  # again getting all possible pairs in my messy way
    for r in f:
        final.append(abs(r[0] - r[1]))
    return sum(final)

def MainApp():
    n = int(input())
    k = int(input())
    arr = list(int(input()) for _ in range(n))
    result = LetMeDoIt(n, k, arr)
    print(result)

if __name__ == '__main__':
    MainApp()
This code works perfectly for n up to 1000 (maybe more), but is terminated by a time-out (5 seconds is the limit on the online judge :/ ) for n beyond 10000 (the biggest test case has n = 100000).
=====
How would you approach this problem so that it handles all the test cases within the given time limit (5 sec)? (The problem was listed under algorithm & dynamic programming.)
For your reference, you can have a look at:
- successful submissions (py3, py2, C++, Java) on this problem by other candidates, so that you can explain that approach for me and future visitors
- an editorial by the problem setter explaining how to approach the question
- a solution by the problem setter himself (py2, C++)
- input data (test cases) and expected output
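One possible direction, sketched in Python under the same assumption the second approach already makes (that an optimal choice is k consecutive elements of the sorted list); the function name min_unfairness_sum and the prefix-sum formulation are my own illustration, not taken from the editorial. Instead of recomputing each window from scratch, two prefix arrays give every window's unfairness sum in O(1): for the window starting at i, element a[m] has weight 2(m - i) - (k - 1), so US(i) = 2 * Σ m*a[m] - (2i + k - 1) * Σ a[m] over the window.

```python
from itertools import accumulate


def min_unfairness_sum(k, arr):
    # Assumption: some optimal choice of k elements is a run of k consecutive
    # elements of the sorted list (the same assumption as the second approach).
    a = sorted(arr)
    n = len(a)
    # prefix[j] = a[0] + ... + a[j-1]
    prefix = [0] + list(accumulate(a))
    # weighted[j] = 0*a[0] + 1*a[1] + ... + (j-1)*a[j-1]
    weighted = [0] + list(accumulate(m * x for m, x in enumerate(a)))
    best = None
    for i in range(n - k + 1):
        s = prefix[i + k] - prefix[i]      # sum of a[i .. i+k-1]
        w = weighted[i + k] - weighted[i]  # sum of m * a[m] over the same window
        # Summing a[m] * (2*(m - i) - (k - 1)) over the window expands to:
        us = 2 * w - (2 * i + k - 1) * s
        if best is None or us < best:
            best = us
    return best


print(min_unfairness_sum(3, [10, 100, 300, 200, 1000, 20, 30]))  # 40
```

To plug it into the existing MainApp(), the call would simply become min_unfairness_sum(k, arr). The loop does constant work per window, so the total cost is dominated by the O(n log n) sort.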
Edit1:
For future visitors of this question, my conclusions so far are that variance and unfairness sum are not perfectly related (they are strongly related), which implies that among lots of lists of integers, the list with the minimum variance doesn't always have to be the list with the minimum unfairness sum. If you want to know why, I actually asked that as a separate question on Math Stack Exchange HERE, where one of the mathematicians proved it for me xD (it's worth taking a look, 'cause it was unexpected).
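A small concrete case, checked in Python: for the sorted list [0, 0, 9, 20, 25, 30] with k = 3 (numbers picked by hand for illustration), the window with the lowest variance is not the window with the lowest unfairness sum. The second approach's variance rule therefore returns 20 here, while the true minimum is 18. statistics.pvariance is used because it matches myvar() (division by the length).

```python
from itertools import combinations
from statistics import pvariance


def unfairness_sum(values):
    # Sum of |a - b| over every pair of positions.
    return sum(abs(a - b) for a, b in combinations(values, 2))


arr = [0, 0, 9, 20, 25, 30]  # already sorted
k = 3
windows = [arr[i:i + k] for i in range(len(arr) - k + 1)]

by_variance = min(windows, key=pvariance)       # rule used by the second approach
by_unfairness = min(windows, key=unfairness_sum)

print(by_variance, unfairness_sum(by_variance))      # [20, 25, 30] 20
print(by_unfairness, unfairness_sum(by_unfairness))  # [0, 0, 9] 18
```

With k = 3 the unfairness sum of a sorted window is just twice its range, so [0, 0, 9] wins on unfairness (range 9), while [20, 25, 30] wins on variance because its middle element sits exactly at the mean.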
As far as the question overall is concerned, you can read the answers by archer & Attersson below (I'm still trying to figure out a naive way to carry this out, but it shouldn't be far off by now).
Thank you for any help or suggestions :)