Before I start: I hope this question is not a duplicate. I've found a couple of similar ones, but none of them seems to describe exactly the same problem. But if it is a duplicate, I will be glad to see a solution (even if it is different from my algorithm.)
I have been trying to answer this question. After a couple of attempts I managed to implement an algorithm that seems to be correct (in C). I've prepared a couple of tests and they all pass.
Initially I thought that the task would be easier, so I expected to be certain of my solution and to publish it as soon as I saw it work. But I'd rather not publish an answer that presents a solution that only seems to be correct. So, I wrote a "proof of correctness", or at least something that looks like one. (I don't remember if I have ever written a proof of correctness for a program, so I'm rather certain its quality can be improved.)
So, I have two questions:
- Is the algorithm that I wrote correct?
- Is the "proof" that I wrote correct?
Also, I'd love to know if you have any tips on how to improve the algorithm and the "proof" beyond correctness, and maybe even the implementation (though I know C, I can always make a mistake). If the algorithm formulations, the proof, or the C code seem too complicated to read or check, please give me some tips, and I'll try to simplify them.
And please, don't hesitate to point out that I misunderstood the problem completely if that is the case. After all, it is most important to present the right solution for the author of the original question.
I'm going to wait some time for an answer here before I publish an answer to the original question. But eventually, if there aren't any, I think I will publish it anyway.
The problem
To quote the author of the original question:
Suppose I have an array, arr = [2, 3, 5, 9] and k = 2. I am supposed to find subsequences of length k such that no two elements in each subsequence are adjacent. Then find the maximums of those sequences. Finally, find the minimum of the maximums. For example, for arr, the valid subsequences are [2,5], [3,9], [2,9] with maximums 5, 9, and 9 respectively. The expected output would be the minimum of the maximums, which is 5.
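For reference, the problem statement can be turned directly into an exhaustive search. The following is only a sketch for cross-checking results on small inputs; the function names and the use of INT_MAX as a "no such subsequence" marker are my assumptions, not part of the original question.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: picks `remaining` more elements starting at index
   `start`, never taking two neighbours. `current_max` is the maximum of the
   elements picked so far. Returns the smallest achievable maximum, or
   INT_MAX if no valid subsequence can be completed. */
static int brute_force_rec(int n, const int arr[], int start,
                           int remaining, int current_max) {
    if (remaining == 0) return current_max;
    int best = INT_MAX;
    for (int i = start; i < n; ++i) {
        int new_max = arr[i] > current_max ? arr[i] : current_max;
        /* The next pick must be at index i + 2 or later. */
        int candidate = brute_force_rec(n, arr, i + 2, remaining - 1, new_max);
        if (candidate < best) best = candidate;
    }
    return best;
}

/* Minimum of the maximums over all non-adjacent subsequences of length k. */
int brute_force_min_of_maxes(int n, const int arr[], int k) {
    assert(n >= 0 && k >= 1);
    return brute_force_rec(n, arr, 0, k, INT_MIN);
}
```

For arr = [2, 3, 5, 9] and k = 2 this enumerates [2,5], [2,9] and [3,9], exactly the subsequences listed in the quote. Its running time grows exponentially, so it is only suitable as a test oracle.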
My algorithm
I've made two assumptions not stated in the original question:
- the elements of the input sequence are unique,
- for the input subsequence length k, 2 <= k <= (length(S) + 1) / 2.
They may look a bit arbitrary, but I think that they simplify the problem a bit. As for uniqueness, I think I could remove this assumption (so that the algorithm would cover more use cases). But first I need to know whether the current solution is correct.
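The upper bound on k follows from counting positions: k pairwise non-adjacent elements need at least k - 1 gaps between them, so at least 2 * k - 1 positions in total. A minimal sketch of checking both assumptions is shown here; the helper names are hypothetical and the uniqueness check is deliberately the naive quadratic one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if no value occurs twice in arr (O(n^2)). */
bool elements_are_unique(int n, const int arr[]) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (arr[i] == arr[j]) return false;
    return true;
}

/* Hypothetical helper: true if 2 <= k <= (n + 1) / 2, using integer
   division. For integer k this is the same condition as n >= 2 * k - 1. */
bool k_is_in_range(int n, int k) {
    return k >= 2 && k <= (n + 1) / 2;
}
```

For example, with 4 elements only k = 2 is in range, while 5 elements also allow k = 3 (positions 1, 3 and 5).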
Pseudocode, version 1
find_k_length_sequence_maxes_min (S, k)
    if k < 2 or length(S) < 2 * k - 1
        return NO_SUCH_MINIMUM
    sorted = copy(S)
    sort_ascending(sorted)
    for t from 1 to length(S)
        current_length = 1    // sorted[t] itself is already taken
        index = find_index(S, sorted[t])
        last_index = index
        for u descending from index to 1
            if u < last_index - 1 and S[u] <= sorted[t]
                current_length += 1
                last_index = u
            if current_length >= k
                return sorted[t]
        last_index = index
        for u ascending from index to length(S)
            if u > last_index + 1 and S[u] <= sorted[t]
                current_length += 1
                last_index = u
            if current_length >= k
                return sorted[t]
Pseudocode, version 2
(This is the same algorithm as in version 1, only written using more natural language.)
(1) Let S be a sequence of integers such that all of its elements are unique.
(2) Let a "non-contiguous subsequence of S" mean such a subsequence of S that any two elements of it are non-adjacent in S.
(3) Let k be an integer such that 2 <= k <= (length(S) + 1) / 2.
(4) Find the minimum of maximums of all the non-contiguous subsequences of S of length k.
(4.1) Find the minimal element of S such that it is the maximum of a non-contiguous subsequence of S of size k.
(4.1.1) Let sorted be a permutation of S such that its elements are sorted in ascending order.
(4.1.2) For every element e of sorted, check whether it is a maximum of a non-contiguous subsequence of S of length k. If it is, return it.
(4.1.2.1) Let x and y be integers such that 1 <= x <= index(e) and index(e) <= y <= length(S).
(4.1.2.2) Let all(x, y) be the set of all the non-contiguous subsequences of S between S[x] (including) and S[y] (including) such that e is the maximum of each of them.
(4.1.2.3) Check whether the length of the longest sequence of all(1, index(e)) is greater than or equal to k. If it is, return e.
(4.1.2.4) Check whether the length of the longest subsequence of all(1, index(e)) plus the length of the longest subsequence of all(index(e), length(S)), minus 1 because e is counted in both, is greater than or equal to k. If it is, return e.
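The check in steps (4.1.2.1) to (4.1.2.4) can be isolated into one function that works on a single candidate e. This is a sketch under my own naming (longest_with_max_at is hypothetical), and it uses 0-based indices, unlike the 1-based pseudocode.

```c
#include <assert.h>

/* Hypothetical helper: length of the longest non-contiguous subsequence of
   arr[0..n-1] that contains arr[index] and has arr[index] as its maximum. */
int longest_with_max_at(int n, const int arr[], int index) {
    assert(index >= 0 && index < n);
    int e = arr[index];
    int size = 1;       /* e itself */
    int last = index;   /* position of the most recently taken element */

    /* Walk towards the start, taking every element that is <= e and
       not adjacent to the last taken one. */
    for (int u = index - 2; u >= 0; --u) {
        if (u < last - 1 && arr[u] <= e) { ++size; last = u; }
    }

    /* Restart from e and walk towards the end in the same way. */
    last = index;
    for (int u = index + 2; u < n; ++u) {
        if (u > last + 1 && arr[u] <= e) { ++size; last = u; }
    }
    return size;
}
```

An element e then passes step (4.1.2) exactly when this length is at least k. Because e is counted once at the start, there is no double counting between the two directions.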
Proof of correctness
(1) Glossary:
- by "observation" I mean a statement not derived from any observation or conclusion, not demanding a proof,
- by "conclusion" I mean a statement derived from at least one observation or conclusion, not demanding a proof,
- by "theorem" I mean a statement not derived from any observation or conclusion, demanding a proof.
(2) Let S be a sequence of integers such that all of its elements are unique.
(3) Let a "non-contiguous subsequence of S" mean such a subsequence of S that any two elements of it are non-adjacent in S.
(4) Let k be an integer such that 2 <= k <= (length(S) + 1) / 2.
(5) Let minmax(k) be an element of S such that it is the minimum of maximums of all the non-contiguous subsequences of S of length k.
(6) (Theorem) minmax(k) is a minimal element of S such that it is a maximum of a non-contiguous subsequence of S of length k.
(7) In other words, there is no element in S less than minmax(k) that is a maximum of a non-contiguous subsequence of S of length k.
(8) (Proof of (6)) (Observation) Since minmax(k) is the minimum of maximums of all the non-contiguous subsequences of S of length k, there is no non-contiguous subsequence of S of length k such that its maximum is less than minmax(k).
(9) (Proof of (6)) (Conclusion) By (8), any element of S less than minmax(k) cannot be a maximum of any non-contiguous subsequence of S of length k.
(10) (Proof of (6)) QED
(11) Let x and y be integers such that 1 <= x <= index(minmax(k)) and index(minmax(k)) <= y <= length(S).
(12) Let all(x, y) be the set of all the non-contiguous subsequences of S between S[x] (including) and S[y] (including) such that minmax(k) is the maximum of each of them.
(13) (Observation) minmax(k) is the maximum of the longest sequence of all(1, length(S)).
(14) This observation may seem too trivial to note. But apparently it was easier for me to write the algorithm, and prove it, with the longest subsequence in mind, instead of a subsequence of length k. Therefore I think this observation is worth noting.
(15) (Theorem) One can produce the longest sequence of all(1, index(minmax(k))) by:
- starting from minmax(k),
- moving to S[1],
- taking always the next element that is both less than or equal to minmax(k), and non-adjacent to the last taken one.
(16) (Proof of (15)) Let a "possible element" of S mean an element that is both less than or equal to minmax(k), and non-adjacent to the last taken one.
(16a) (Proof of (15)) Let C be the subsequence produced in (15).
(17) (Proof of (15)) (Observation)
- Before the first taken element, there are exactly 0 possible elements,
- between any two taken elements (excluding them), there are exactly 0 or 1 possible elements,
- after the last taken element, there are exactly 0 or 1 possible elements.
(18) (Proof of (15)) Suppose, for contradiction, that D is a sequence of all(1, index(minmax(k))) such that length(D) > length(C).
(19) (Proof of (15)) Then at least one of the following conditions is fulfilled:
- before the first taken element, there are fewer than 0 possible elements in D,
- between two taken elements (excluding them) such that there is 1 possible element between them in C, there are 0 possible elements in D,
- after the last taken element, there is less than 1 possible element in D.
(20) (Proof of (15)) (Observation)
- There cannot be fewer than 0 possible elements before the first taken element,
- if there is less than 1 possible element between two taken elements (excluding them) in D, where in C there is 1, it means that we have taken either an element greater than minmax(k), or an element adjacent to the last taken one, which contradicts (12),
- if there is less than 1 possible element after the last taken element in D, where in C there is 1, it means that we have taken either an element greater than minmax(k), or an element adjacent to the last taken one, which contradicts (12).
(21) (Proof of (15)) QED
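Independently of the proof, the claim in (15) can be checked empirically on small inputs by comparing the greedy walk with an exhaustive search over all subsets of positions. This is only a sanity check, not a substitute for the proof; both function names are hypothetical, indices are 0-based, and the exhaustive version is limited to short prefixes because it enumerates bitmasks.

```c
#include <assert.h>

/* Greedy walk of (15): start at arr[index], move towards arr[0], and take
   every element that is <= arr[index] and not adjacent to the last taken. */
int greedy_left_length(const int arr[], int index) {
    int size = 1, last = index;
    for (int u = index - 2; u >= 0; --u)
        if (u < last - 1 && arr[u] <= arr[index]) { ++size; last = u; }
    return size;
}

/* Exhaustive search: try every subset of positions 0..index that contains
   `index`, has no two adjacent positions, and has no value above arr[index]. */
int exhaustive_left_length(const int arr[], int index) {
    assert(index >= 0 && index < 20);
    int best = 0;
    for (unsigned mask = 0; mask < (1u << (index + 1)); ++mask) {
        if (!(mask & (1u << index))) continue;  /* must contain arr[index] */
        if (mask & (mask >> 1)) continue;       /* two adjacent positions */
        int size = 0, ok = 1;
        for (int i = 0; i <= index; ++i) {
            if (mask & (1u << i)) {
                if (arr[i] > arr[index]) { ok = 0; break; }
                ++size;
            }
        }
        if (ok && size > best) best = size;
    }
    return best;
}
```

If the two functions ever disagreed for some array and index, that input would be a counterexample to (15). Agreement on a handful of arrays is of course only weak evidence.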
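(22) (Observation) (15) applies also to all(index(minmax(k)), length(S)).
(23) (Observation) The length of the longest sequence of all(1, length(S)) equals the length of the longest sequence of all(1, index(minmax(k))) plus the length of the longest sequence of all(index(minmax(k)), length(S)), minus 1, because minmax(k) belongs to both.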
Implementation
All the tests pass if none of the assert calls aborts the program.
#include <limits.h> // For INT_MAX
#include <assert.h> // For assert
#include <string.h> // For memcpy
#include <stdlib.h> // For qsort

// Comparison callback for qsort: ascending order of ints.
int compar (const void * first, const void * second) {
    if (* (int *)first < * (int *)second) return -1;
    else if (* (int *)first == * (int *)second) return 0;
    else return 1;
}

// Writes the minimum of maximums to * result_min.
// Leaves * result_min untouched if k is out of range.
void find_k_size_sequence_maxes_min (int array_length, int array[], int k, int * result_min) {
    if (k < 2 || array_length < 2 * k - 1) return;
    int sorted[array_length];
    memcpy(sorted, array, sizeof (int) * array_length);
    qsort(sorted, array_length, sizeof (int), compar);
    // Try the candidates in ascending order; the first one that works is the answer.
    for (int t = 0; t < array_length; ++t) {
        // Find the position of the candidate in the original array.
        int index = -1;
        while (array[++index] != sorted[t]);
        int size = 1; // The candidate itself
        int last_index = index;
        // Walk towards the start of the array.
        for (int u = index; u >= 0; --u) {
            if (u < last_index - 1 && array[u] <= sorted[t]) {
                ++size;
                last_index = u;
            }
            if (size >= k) {
                * result_min = sorted[t];
                return;
            }
        }
        // Walk towards the end of the array.
        last_index = index;
        for (int u = index; u < array_length; ++u) {
            if (u > last_index + 1 && array[u] <= sorted[t]) {
                ++size;
                last_index = u;
            }
            if (size >= k) {
                * result_min = sorted[t];
                return;
            }
        }
    }
}

int main (void) {
    // Test case 1
    int array1[] = { 6, 3, 5, 8, 1, 0, 9, 7, 4, 2, };
    int array1_length = (int)(sizeof array1 / sizeof array1[0]);
    int k = 2;
    int min = INT_MAX;
    find_k_size_sequence_maxes_min(array1_length, array1, k, & min);
    assert(min == 2);

    // Test case 2
    int array2[] = { 1, 7, 2, 3, 9, 11, 8, 14, };
    int array2_length = (int)(sizeof array2 / sizeof array2[0]);
    k = 2;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 2);

    // Test case 3
    k = 3;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 8);

    // Test case 4
    k = 4;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 9);

    // Test case 5
    int array3[] = { 3, 5, 4, 0, 8, 2, };
    int array3_length = (int)(sizeof array3 / sizeof array3[0]);
    k = 3;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array3_length, array3, k, & min);
    assert(min == 3);

    // Test case 6
    int array4[] = { 18, 21, 20, 6 };
    int array4_length = (int)(sizeof array4 / sizeof array4[0]);
    k = 2;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array4_length, array4, k, & min);
    assert(min == 18);

    // Test case 7 (large input: 0, 1, ..., 999999)
    int array5_length = 1000000;
    int array5[array5_length];
    for (int m = array5_length - 1; m >= 0; --m) array5[m] = m;
    k = 100;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array5_length, array5, k, & min);
    assert(min == 198);
}
Edit: Thanks to @user3386109, the number of iterations on sorted may be reduced in some cases. There need to be at least k - 1 elements less than sorted[t] to form a subsequence of size k or greater together with sorted[t]. Therefore, in the for loop, it should be int t = k - 1 instead of int t = 0.
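A sketch of the function with this change applied might look as follows. Note that it deviates from the code above in ways that are my own assumptions, not part of the suggestion: it returns the result instead of writing through a pointer, uses INT_MAX to signal "no such minimum", allocates the sorted copy on the heap instead of using a variable-length array, and is named min_of_maxes_skipping only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns the minimum of maximums, or INT_MAX if k is out of range or the
   allocation fails. Assumes unique elements, as in the question. */
int min_of_maxes_skipping(int n, const int arr[], int k) {
    assert(arr != NULL);
    if (k < 2 || n < 2 * k - 1) return INT_MAX;
    int *sorted = malloc(sizeof *sorted * (size_t)n);
    if (sorted == NULL) return INT_MAX;
    memcpy(sorted, arr, sizeof *sorted * (size_t)n);
    qsort(sorted, (size_t)n, sizeof *sorted, compare_ints);

    int result = INT_MAX;
    /* sorted[0..k-2] each have fewer than k - 1 smaller elements, so none
       of them can be the maximum of a subsequence of length k. */
    for (int t = k - 1; t < n; ++t) {
        int index = 0;
        while (arr[index] != sorted[t]) ++index;
        int size = 1, last = index;
        for (int u = index - 2; u >= 0; --u)
            if (u < last - 1 && arr[u] <= sorted[t]) { ++size; last = u; }
        last = index;
        for (int u = index + 2; u < n; ++u)
            if (u > last + 1 && arr[u] <= sorted[t]) { ++size; last = u; }
        if (size >= k) { result = sorted[t]; break; }
    }
    free(sorted);
    return result;
}
```

The skipped candidates are only the k - 1 smallest values, so the saving is small when k is much smaller than the array length.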
Edit: Now that a week has passed, I have published my solution as an answer to the original question: Minimum of maximums for k-size nonconsecutive subsequence of array. If you happen to have any further tips on how to improve it, you can share them either here or in the original question (as comments to my answer).