I have a rather large List of Sets of Integers (in Java). I want to remove all sets in the list that are subsets of any other set in the list. If two sets are the same, only one should remain in the final List.

The obvious thing to do is to iterate through my list and check each element against all other elements calling some sort of subset-checking method. But that is very inefficient. Is there something better that I can do?

I'm currently using HashSets stored in an ArrayList, but I can easily change that if it's relevant.

The outer ArrayList could be any type of Collection, so I'm thinking I could use a Set there to at least avoid duplicate Sets of Integers.

Zyzzyphus
  • Take a look at lambda expressions (a new concept from Java 8). As you said, you don't need to iterate – Davide Jul 03 '18 at 00:13
  • @Davide I think OP is concerned with runtime performance. – shmosel Jul 03 '18 at 00:15
  • One possible optimization is to sort the sets by size and avoid checking whether a larger set is a subset of a smaller one. – shmosel Jul 03 '18 at 00:16
  • If the sets are `HashSet`s, then the determination if one set is a subset of the other becomes O(s), instead of something worse, where "s" is the size of the smaller set. – rgettman Jul 03 '18 at 00:18
  • If we convert each Set into a sorted int[], do you think it would be cheaper to check subsets using two Sets or two sorted int[]s (using something like a merge)? You pay the price of conversion N times, whereas you do the subset check N^2 times. You would also avoid the cost of handling Integer objects. – gagan singh Jul 03 '18 at 01:27
  • https://stackoverflow.com/questions/14511655/arraylist-contains-another-arraylist – SedJ601 Jul 03 '18 at 03:05
  • @Zyzzyphus Can you tell how many elements might be in the `Set` ? – prasad_ Jul 03 '18 at 06:37

2 Answers


This is one way of doing it.

import java.util.*;

public void method() {
    List<Set<Integer>> list = new ArrayList<>();
    Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
    Set<Integer> b = new HashSet<>(Arrays.asList(1, 2, 3));
    list.add(a);
    list.add(b);
    System.out.println("Original :" + list);

    // Sort by ascending size, so a set can only be a subset of sets that come after it.
    list.sort(Comparator.comparingInt(Set::size));
    System.out.println("Sorted :" + list);

    List<Set<Integer>> result = new ArrayList<>();
    for (int i = 0; i < list.size(); i++) {
        Set<Integer> current = list.get(i);
        boolean isSubset = false;
        // Equal sets also match here, so of several duplicates only the last one survives.
        for (int j = i + 1; j < list.size(); j++) {
            if (list.get(j).containsAll(current)) {
                isSubset = true;
                break; // one superset is enough
            }
        }
        if (!isSubset) {
            result.add(current);
        }
    }
    System.out.println("Reduced :" + result);
}
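
A variant of the same idea, offered as a rough sketch rather than a drop-in replacement: if the sets are sorted largest-first, each set only needs to be compared against the sets already kept, because anything it could be a subset of is either kept or itself covered by a kept set. The class name MaximalSets and the method name removeSubsets are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MaximalSets {

    // Returns the sets that are not contained in any other set of the input.
    // Of several equal sets, exactly one is kept. The input list is not modified.
    static <T> List<Set<T>> removeSubsets(List<? extends Set<T>> input) {
        List<Set<T>> sorted = new ArrayList<>(input);
        // Largest first: a set can only be contained in a set of equal or larger size.
        sorted.sort(Comparator.comparingInt((Set<T> s) -> s.size()).reversed());

        List<Set<T>> kept = new ArrayList<>();
        for (Set<T> candidate : sorted) {
            boolean covered = false;
            for (Set<T> k : kept) {
                if (k.containsAll(candidate)) {
                    covered = true;
                    break; // one superset is enough
                }
            }
            if (!covered) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<Integer>> list = new ArrayList<>();
        list.add(new HashSet<>(Arrays.asList(1, 2, 3)));
        list.add(new HashSet<>(Arrays.asList(1, 2, 3, 4, 5)));
        list.add(new HashSet<>(Arrays.asList(1, 2, 3, 4, 5))); // duplicate
        list.add(new HashSet<>(Arrays.asList(7, 8)));
        System.out.println("Reduced :" + removeSubsets(list));
    }
}
```

The worst case is still quadratic in the number of sets, but when many sets are subsets of a few large ones, the list of kept sets stays short and most candidates are rejected after a handful of containsAll calls.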
raviiii1

This is something I tried, based on these requirements:

  • if two sets are the same, keep only one of them
  • remove any set that is a subset of another set
  • the format of the input data is flexible

I changed the input format from List<Set<Integer>> to List<int[]>, because the integer sets are not modified during processing, an int[] may perform better, and I found it simpler to use. Each input array is wrapped in an IntegerSet class, which the rest of the code works with.

Here is the code:

import java.util.*;
import java.util.stream.*;

public class IntegerSetProcess {

    public static void main(String[] args) {

        // Input data

        List<IntegerSet> inputList =
              Arrays.asList(new IntegerSet(new int [] {11}), 
                              new IntegerSet(new int [] {12, 555}), 
                              new IntegerSet(new int [] {2, 333, 555, 9, 144, 89}), 
                              new IntegerSet(new int [] {12}),
                              new IntegerSet(new int [] {12, 3, 555, 90, 42, 789, 15000}), 
                              new IntegerSet(new int [] {2, 555, 9, 89, 333, 144}), 
                              new IntegerSet(new int [] {555, 12}), 
                              new IntegerSet(new int [] {222, 12, 41320, 0, 769942}),
                              new IntegerSet(new int [] {910, 77}));

        // Distinct IntegerSets 
        List<IntegerSet> distinctList =
               inputList.stream()
                         .distinct()
                         .sorted()
                         .collect(Collectors.toList());

        // Filter subsets to get result
        List<IntegerSet> resultList = doSubsetFiltering(distinctList);

        // Result data in original form (optional)
        resultList.stream()
                  .forEach(e -> System.out.println(Arrays.toString(e.getOriginal())));
    }

    /*
     * Removes from the input list every IntegerSet whose elements are a subset
     * of another IntegerSet in the list, and returns that same list. The list
     * must be sorted by ascending size and must not contain duplicates.
     */
    private static List<IntegerSet> doSubsetFiltering(List<IntegerSet> listIs) {
        List<IntegerSet> removedIs = new ArrayList<>();
        // size-1: the last (largest) element has nothing after it to check against
        for (int i = 0; i < listIs.size() - 1; i++) {
            IntegerSet thisIs = listIs.get(i);
            // i+1: the checking starts from the next IntegerSet
            for (int j = i + 1; j < listIs.size(); j++) {
                IntegerSet nextIs = listIs.get(j);
                if (isSubset(thisIs.getData(), nextIs.getData())) {
                    // thisIs is a subset of nextIs, so mark it for removal
                    removedIs.add(thisIs);
                    break;
                }
            }
        }
        listIs.removeAll(removedIs);
        return listIs;
    }

    // Returns true if every element of thisIs is also in nextIs.
    // nextIs must be sorted in ascending order, as Arrays.binarySearch requires.
    public static boolean isSubset(int[] thisIs, int[] nextIs) {
        for (int i : thisIs) {
            if (Arrays.binarySearch(nextIs, i) < 0) {
                return false;
            }
        }
        return true;
    }
}

import java.util.Arrays;

public class IntegerSet implements Comparable<IntegerSet> {

    private final int[] data;     // sorted copy, used for comparisons
    private final int[] original; // copy in the caller's original order

    public IntegerSet(int[] input) {
        // Copy twice so that sorting does not reorder the caller's array
        original = input.clone();
        data = input.clone();
        Arrays.sort(data);
    }

    public int[] getData() {
        return data;
    }

    public int[] getOriginal() {
        return original;
    }

    @Override
    public String toString() {
        return Arrays.toString(data);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof IntegerSet)) {
            return false;
        }
        return Arrays.equals(data, ((IntegerSet) obj).data);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }

    // Orders by size only, so this ordering is not consistent with equals
    @Override
    public int compareTo(IntegerSet is) {
        return Integer.compare(data.length, is.data.length);
    }
}
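
One of the comments on the question suggests checking subsets of sorted int[] with something like a merge. A minimal sketch of that idea follows, as an alternative to the Arrays.binarySearch loop in isSubset. It assumes both arrays are sorted ascending and contain no duplicate values; the class name SortedSubset is made up for this illustration.

```java
class SortedSubset {

    // Returns true if every value in small also occurs in big.
    // Assumption: both arrays are sorted ascending and free of duplicates.
    static boolean isSubset(int[] small, int[] big) {
        if (small.length > big.length) {
            return false; // a duplicate-free subset cannot be longer than its superset
        }
        int j = 0; // read position in big; it only ever moves forward
        for (int value : small) {
            // Skip the elements of big that are smaller than the value we need.
            while (j < big.length && big[j] < value) {
                j++;
            }
            if (j == big.length || big[j] != value) {
                return false; // value is missing from big
            }
            j++; // matched, move past it
        }
        return true;
    }

    public static void main(String[] args) {
        int[] big = {3, 12, 42, 90, 555, 789, 15000};
        System.out.println(isSubset(new int[] {12, 555}, big));
        System.out.println(isSubset(new int[] {11}, big));
    }
}
```

Each call walks both arrays at most once, so it costs O(m + n) for arrays of length m and n, compared with O(m log n) for the binary search version. Which one is faster in practice depends on how different the two lengths are, so it is worth measuring on real data.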
prasad_