How can I divide an array into K sub-arrays such that the sum of the number of duplicate elements in all the sub-array is minimum?

Question

As an example, let the array be A={1,1,2,1,2} and K=3. A can be divided into {1}, {1,2} and {1,2}. The number of duplicate elements in those sub-arrays is 0, 0 and 0, so their sum is 0.

This is not a homework problem. The actual problem is different; I think I can solve it if I know how to solve this as a sub-problem, so this question is an intermediate step towards the actual problem. — Atul Kumar Ashish, Aug 09 '20 at 18:03

F. Müller · Answer 1 · 2020-08-10T14:13:28.900

This is quite an interesting challenge. I have used Java to illustrate my approach.

Divide the problem into bits
I have split the whole problem into smaller bits:

1. We need to set up storage for the subarrays based on the split size
2. The subarrays should contain the same number of elements, unless there is a remainder (e.g. 10 elements split into k = 3 subarrays results in arrays with lengths 3, 3, 4)
3. Distribute the elements over the subarrays in such a way that the number of duplicates is minimal

1 + 2 - Split the array into equal parts
I already made the example with an array of length 10 and k = 3. The subarrays will be of length 3, 3 and 4 due to the remainder given by the division.

In the snippet I fill the subarrays with 0 as a placeholder, and each subarray gets either 0 or 1 extra element. If there is a remainder, the extra elements are handed out one per subarray, starting with the first subarray.

In my example I have used an array with length of 13 and k = 3 so it will look like this:

[[0, 0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
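
The sizing rule can also be sketched on its own, without building the lists. This is only an illustration: the class `SublistSizes` and the helper `sizes` are hypothetical names that do not appear in the snippet further down, and it assumes the extra elements go to the first sublists, as in the output shown above.

```java
import java.util.Arrays;

class SublistSizes {
    // Hypothetical helper: returns the length of each of the k sublists for n elements.
    static int[] sizes(int n, int k) {
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and n");
        }
        int[] sizes = new int[k];
        int base = n / k;        // every sublist gets at least this many slots
        int remainder = n % k;   // this many sublists get one slot more
        for (int i = 0; i < k; i++) {
            // The first `remainder` sublists receive the extra slot.
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes(13, 3))); // [5, 4, 4]
        System.out.println(Arrays.toString(sizes(10, 3))); // [4, 3, 3]
    }
}
```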

3 - Strategy to reduce duplicates
I was thinking that we could start by analyzing the given array. We can find out how many times each individual number occurs by counting. Once we know these counts, we can sort the map by value and end up with a map that holds the number of occurrences for each number, starting with the highest count.

In my example:

1=4 // contains 1 exactly 4 times
2=4 // contains 2 exactly 4 times
3=3 // ...
4=1
5=1
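
A compact way to get such a sorted count map might look like the sketch below. The class and method names are hypothetical, and breaking ties by the smaller key is an assumption made here only to keep the output deterministic.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class OccurrenceCount {
    // Counts each value and returns the counts ordered from most to least frequent.
    static Map<Integer, Integer> sortedOccurrences(List<Integer> list) {
        // TreeMap keeps keys ascending, so equal counts stay in key order after the stable sort.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int value : list) {
            counts.merge(value, 1, Integer::sum);
        }
        // LinkedHashMap preserves the sorted order of the entries.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<Integer, Integer>comparingByValue(Comparator.reverseOrder()))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                      (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 1, 3, 4, 3, 5, 1, 2, 1, 2, 2);
        System.out.println(sortedOccurrences(list)); // {1=4, 2=4, 3=3, 4=1, 5=1}
    }
}
```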

What do we get from this? Well, we know for sure that we don't want all of these 1s in the same subarray, so the idea is to spread the occurrences of each number evenly over the subarrays. If we have 4x 1 and 4x 2 and k = 3 (as in the example above), we can put one 1 and one 2 into every subarray. This leaves us with 1 duplicate each (one additional 1 and one additional 2).

In my example this would look like:

[[1, 2, 3, 4, 5], [1, 2, 3, 0], [1, 2, 3, 0]]
// 1 used 3 times => 1 duplicate
// 2 used 3 times => 1 duplicate
// 3 used 3 times => ok
// 4 used 1 time  => ok
// 5 used 1 time  => ok
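
The counting argument can be turned into a quick lower bound: a number that occurs c times can appear as a non-duplicate in at most k subarrays, so at least max(0, c - k) of its occurrences must be duplicates. The sketch below only computes that bound; `DuplicateLowerBound` and `lowerBound` are hypothetical names, and whether the bound can always be reached with the fixed subarray sizes is not claimed here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateLowerBound {
    // Sums max(0, count - k) over all distinct values in the list.
    static int lowerBound(List<Integer> list, int k) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : list) {
            counts.merge(value, 1, Integer::sum);
        }
        int bound = 0;
        for (int count : counts.values()) {
            // Only the occurrences beyond one per subarray are forced to be duplicates.
            bound += Math.max(0, count - k);
        }
        return bound;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 1, 3, 4, 3, 5, 1, 2, 1, 2, 2);
        System.out.println(lowerBound(list, 3)); // 2 (one extra 1 and one extra 2)
    }
}
```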

To do this, we loop through the occurrence map, add the keys and keep track of the remaining numbers that we can use (in the snippet this is the usage-map).

We can do this with every key until we only have duplicates left. At this point the subarrays contain only unique numbers. For the duplicates we can now repeat the whole process and spread them evenly over the subarrays that are not yet completely filled.

In the end it looks like this:

// the duplicate 1 got placed in the second subarray
// the duplicate 2 got placed in the third subarray
[[1, 2, 3, 4, 5], [1, 2, 3, 1], [1, 2, 3, 2]]
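
To check a result like this, the duplicates can be counted per sublist. The sketch below assumes that the number of duplicates in a sublist is its size minus its number of distinct values; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class DuplicateCount {
    // Duplicates in one sublist: every occurrence beyond the first of each value.
    static int duplicates(List<Integer> sublist) {
        return sublist.size() - new HashSet<>(sublist).size();
    }

    // Sum of the duplicates over all sublists.
    static int totalDuplicates(List<List<Integer>> sublists) {
        int total = 0;
        for (List<Integer> sublist : sublists) {
            total += duplicates(sublist);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> result = Arrays.asList(
            Arrays.asList(1, 2, 3, 4, 5),
            Arrays.asList(1, 2, 3, 1),
            Arrays.asList(1, 2, 3, 2));
        System.out.println(totalDuplicates(result)); // 2
    }
}
```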

Java Code
I am not sure how far you can take this and how well it will perform. In the few tests that I did, it seems to work just fine. You may find a more efficient solution, but I think this is one way to solve the problem.

Anyway, here is my attempt:

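import java.util.*;
import java.util.stream.Collectors;

// The methods below are meant to live inside a single class.
public static void main(String[] args) {
    final List<Integer> list = Arrays.asList(1, 2, 3, 1, 3, 4, 3, 5, 1, 2, 1, 2, 2);
    final Map<Integer, Integer> occurrenceMap = findOccurrences(list);
    // Copy the entries into a LinkedHashMap so that the descending-by-count order is kept;
    // sorting a stream of the entries does not reorder the underlying HashMap.
    final Map<Integer, Integer> occurrenceMapSorted = occurrenceMap.entrySet().stream()
        .sorted(Map.Entry.<Integer, Integer>comparingByValue(Comparator.reverseOrder()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                  (a, b) -> a, LinkedHashMap::new));
    occurrenceMapSorted.entrySet().forEach(System.out::println);

    // Placeholder sublists filled with 0.
    final List<List<Integer>> sublists = setupSublists(list.size(), 3);
    System.out.println(sublists);

    // Tracks how many occurrences of each number have been placed so far.
    final Map<Integer, Integer> usageMap = new HashMap<>(occurrenceMapSorted.size());

    for (int i = 0; i < sublists.size(); i++) {
        final List<Integer> sublist = sublists.get(i);
        populateSublist(occurrenceMapSorted, usageMap, sublist);
    }

    System.out.println(sublists);
}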

public static void populateSublist(Map<Integer, Integer> occurrenceMapSorted, Map<Integer, Integer> usageMap, List<Integer> sublist) {
    // 0 marks an empty slot, so the input list must not contain 0.
    int i = 0;
    // Number of leading map entries to skip, so that each slot tries a new key first.
    int skip = 0;
    while (i < sublist.size() && sublist.get(i) == 0) {
        for (Map.Entry<Integer, Integer> entry : occurrenceMapSorted.entrySet()) {
            if (skip > 0) {
                skip--;
                continue;
            }
            final int entryKey = entry.getKey();
            final int usageCount = usageMap.getOrDefault(entryKey, 0);
            // Only place the key if some of its occurrences are still unused.
            if (usageCount < entry.getValue()) {
                usageMap.put(entryKey, usageCount + 1);

                sublist.set(i, entryKey);
                System.out.println("i: " + i);
                System.out.println("sublist: " + sublist);

                System.out.println("usage: ");
                usageMap.entrySet().stream()
                    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                    .forEach(System.out::println);
                System.out.println();

                i++;
                skip = i;
                break;
            }
        }
        // If no key could be placed, the loop runs again with a smaller skip,
        // so earlier keys are eventually reused as duplicates.
    }
}

public static List<List<Integer>> setupSublists(int listLength, int numberOfSublists) {
    if (numberOfSublists <= 1 || numberOfSublists > listLength) {
        throw new IllegalArgumentException("Number of sublists is greater than the number of elements in the list or the sublist count is less or equal to 1.");
    }
    final List<List<Integer>> result = new ArrayList<>(numberOfSublists);
    final int minElementCount = listLength / numberOfSublists;
    final int remainder = listLength % numberOfSublists;
    for (int i = 0; i < numberOfSublists; i++) {
        // The first `remainder` sublists get one extra slot.
        final int size = minElementCount + (i < remainder ? 1 : 0);
        final List<Integer> sublist = new ArrayList<>(size);
        for (int j = 0; j < size; j++) {
            sublist.add(0); // 0 marks an empty slot
        }
        result.add(sublist);
    }
    return result;
}

public static Map<Integer, Integer> findOccurrences(List<Integer> list) {
    final Map<Integer, Integer> result = new HashMap<>();
    for (final int listElement : list) {
        // Start at 1 for a new value, otherwise add 1 to the existing count.
        result.merge(listElement, 1, Integer::sum);
    }
    return result;
}

גלעד ברקן · Answer 2 · 2020-08-12T01:02:35.090

Let dp[i][k] represent the best split into k subarrays up to the ith index. If A[i] does not appear in the last subarray we just chose, the optimal solution won't change if we append it. Otherwise, our choice is to start a new subarray, or to shorten the previous subarray chosen until it passes the leftmost occurrence of A[i] that's in it and see if that's better.

Why not extend it further back? First, we have already increased the optimal solution by 1 by adding A[i]. Second, if there had been a previous possibility (up to dp[i-1][k]) that was smaller by 1, and would thus compensate for the addition, we would have started out from that one.

To calculate the new possibility, where the left border of the current kth subarray is just to the right of the leftmost occurrence of A[i], we can find out in O(log n) the number of distinct values in the proposed kth subarray, and the proposed (k-1)th subarray (a wavelet tree is one option) and subtract those counts from the total number of elements in each.
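
For checking results on small inputs, a straightforward baseline may help. The sketch below is not the approach described above: it uses no wavelet tree and does not shorten the last subarray incrementally, it simply tries every left border for the last subarray, which takes O(K·n²) time. It assumes that subarrays are contiguous and non-empty, and that the duplicates in a subarray are its length minus its number of distinct values. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MinDuplicateSplit {
    // Returns the minimal total number of duplicates over all splits of a into k contiguous parts.
    static int minDuplicates(int[] a, int k) {
        int n = a.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and a.length");
        }
        final int INF = Integer.MAX_VALUE / 2;
        // dp[parts][i]: best cost when the first i elements form `parts` subarrays.
        int[][] dp = new int[k + 1][n + 1];
        for (int[] row : dp) {
            Arrays.fill(row, INF);
        }
        dp[0][0] = 0;
        for (int parts = 1; parts <= k; parts++) {
            for (int i = parts; i <= n; i++) {
                Set<Integer> seen = new HashSet<>();
                int cost = 0; // duplicates inside the last subarray a[j..i-1]
                // Move the left border j of the last subarray from right to left.
                for (int j = i - 1; j >= parts - 1; j--) {
                    if (!seen.add(a[j])) {
                        cost++; // a[j] already occurs further right in this subarray
                    }
                    if (dp[parts - 1][j] < INF) {
                        dp[parts][i] = Math.min(dp[parts][i], dp[parts - 1][j] + cost);
                    }
                }
            }
        }
        return dp[k][n];
    }

    public static void main(String[] args) {
        // The example from the question: {1}, {1,2}, {1,2}
        System.out.println(minDuplicates(new int[]{1, 1, 2, 1, 2}, 3)); // 0
    }
}
```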
