5

Suppose we have int[] A = new int[1000] and int[] subA = new int[300] such that subA ⊆ A (subA is a subset of A). What is the fastest way to compute the array A \ subA in Java? Both arrays, A and subA, are sorted.

EDIT: Sorry, I forgot to mention that the arrays contain distinct elements; they simply hold indices into other structures, such as matrix rows.

I'm thinking of this solution:

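// supp is short for supplement
// (assumes A holds consecutive integers starting at A[0], which is why j++ walks through its elements)
int[] supp = new int[A.length - subA.length];
int j = A[0], c = 0;
for (int i = 0; i < subA.length; i++) {
    // elegantly can be: while (j < subA[i]) supp[c++] = j++;
    while (j < subA[i]) {
        supp[c] = j;
        c++; j++;
    }
    j = subA[i] + 1;
}
// copy the values of A that come after the last element of subA
while (c < supp.length) {
    supp[c++] = j++;
}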

Currently testing this approach. I will be back when the answer is ready.

Sophie Sperner

4 Answers

4

Try something like this:

// A index
int ai = 0;
// subA index
int sai = 0;
// result array
int[] result = new int[A.length - subA.length];
// index in result array
int resi = 0;

while (ai < A.length && sai < subA.length) {
    // same elements - ignore
    if (A[ai] == subA[sai]) {
        ai++;
        sai++;
    // found an element in A that does not exist in subA
    } else {
        // Store element
        result[resi] = A[ai];
        resi++;
        ai++;
    }
}

// Store elements that are left in A
for (;ai < A.length; ai++, resi++) {
    result[resi] = A[ai];
}
Ivan Mushketyk
  • A "while"-loop is not "in a fastest way"! Use Arrays.binarySearch() to find the first possible match - this works since the arrays are sorted. – tigger Nov 02 '12 at 11:09
  • @tigger errr what/why? binarySearch() for what? – Shark Nov 02 '12 at 11:11
  • @tigger My method solves the task in O(n) steps, meaning that only one pass through array A is needed. That seems rather fast. Also, I do not search the array at all, so I hardly need this functionality. – Ivan Mushketyk Nov 02 '12 at 11:16
  • This code has the right idea, but will blow up if any of the arrays contains duplicate values. – Jochen Nov 02 '12 at 11:39
  • @Jochen: pretty sure it'll just skip them since they're sorted. then again i might be wrong... – Shark Nov 02 '12 at 11:46
  • If A contains an element a multiple times and subA contains a once, then a should not be in the result. But in your algorithm, it is. – DaveFar Nov 02 '12 at 16:43
  • @DaveBall In the case you describe it will not work. I just hope that the elements in A and in subA are unique, as they should be in a set. If not, then it is not hard to change the algorithm so that it can handle this case. – Ivan Mushketyk Nov 03 '12 at 14:22
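
For the duplicate case raised in these comments, one possible adjustment is sketched below. It is only an illustration, not the answerer's code: the class name SortedDifference and the method name difference are made up here, and it assumes that both arrays are sorted ascending and that every value of A that also occurs in subA should be dropped, however often it repeats in A.

```java
import java.util.Arrays;

public class SortedDifference {

    /** Returns the elements of sorted array a that do not occur in sorted array subA. */
    public static int[] difference(int[] a, int[] subA) {
        // With duplicates the result size is not known up front,
        // so allocate the upper bound and trim at the end.
        int[] result = new int[a.length];
        int resi = 0;
        int sai = 0;
        for (int ai = 0; ai < a.length; ai++) {
            // Advance in subA past everything smaller than the current element of a.
            while (sai < subA.length && subA[sai] < a[ai]) {
                sai++;
            }
            // Keep a[ai] only if subA has no equal element. sai is not advanced
            // on a match, so repeated values in a are all dropped.
            if (sai == subA.length || subA[sai] != a[ai]) {
                result[resi++] = a[ai];
            }
        }
        return Arrays.copyOf(result, resi);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 3, 5, 5, 8};
        int[] subA = {2, 5};
        System.out.println(Arrays.toString(difference(a, subA))); // prints [1, 3, 8]
    }
}
```

This is still a single pass over both arrays; the only extra cost compared with the version in the answer is the final Arrays.copyOf.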
1

If you say the elements are sorted and all different, then you only need to find the index of the first element of subA in A, and then just use System.arraycopy() to copy the data in the most efficient way:

    int index = Arrays.binarySearch(A, subA[0]);

    int[] diff = new int[A.length - subA.length];

    System.arraycopy(A, 0, diff, 0, index);
    System.arraycopy(A, index+subA.length, diff, index, A.length-index-subA.length);

PS. I didn't check all the index placement and calculations, but you get the idea.

Denis Tulskiy
  • Cool idea to make the algorithm a constant factor faster if subA is much smaller than A. But ultimately, your algorithm is O(n log n), whereas the straightforward algorithm is O(n). And using a view on A (see my answer), you get an O(1) algorithm if you know that all elements in A and subA are unique and subA is a subarray of A. – DaveFar Nov 02 '12 at 16:49
  • It will work only if subA is "subarray" (like substring) of A, like if A = [1, 2, 3, 4, 5, 6] and subA = [3, 4, 5]. Then it will work. But what if A = [1, 2, 3, 4, 5, 6] and subA = [2, 6]? subA will still be a subset of A. – Ivan Mushketyk Nov 02 '12 at 17:12
  • @DaveBall Why is this algorithm O(n log n)? It is O(n), because it does one binary search and copies two parts of an array. In the worst case, copying the parts of A takes O(n) and the binary search takes O(log n). And O(n) + O(log n) = O(n). – Ivan Mushketyk Nov 02 '12 at 17:16
  • Oh, IC. You are not binary searching for each subarray element, but use the precondition that the subarray needs to be composed of elements consecutive in the original array. But the OP specified "subA is a subset of A". – DaveFar Nov 02 '12 at 17:45
  • @IvanMushketyk: yeah, didn't think about that case. Still, this could be a fallback if subA is a contiguous subset. arraycopy is pretty much a memmove and is usually faster than manual copying. But that's probably overoptimizing. – Denis Tulskiy Nov 02 '12 at 18:03
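
To cover the non-contiguous case from these comments (e.g. A = [1, 2, 3, 4, 5, 6] and subA = [2, 6]) while still copying in bulk, the same idea can be applied per gap. The sketch below is an assumption-laden illustration rather than part of the answer: the names GapCopy and difference are invented, and it relies on both arrays being sorted with distinct elements and on subA being a subset of A.

```java
import java.util.Arrays;

public class GapCopy {

    /** Copies the runs of a that lie between consecutive elements of subA. */
    public static int[] difference(int[] a, int[] subA) {
        int[] diff = new int[a.length - subA.length];
        int from = 0; // start of the current gap in a
        int out = 0;  // next free slot in diff
        for (int x : subA) {
            // Search only the part of a that has not been consumed yet.
            int pos = Arrays.binarySearch(a, from, a.length, x);
            if (pos < 0) {
                throw new IllegalArgumentException(x + " is not an element of a");
            }
            // Bulk-copy everything between the previous match and this one.
            System.arraycopy(a, from, diff, out, pos - from);
            out += pos - from;
            from = pos + 1;
        }
        // Copy whatever follows the last element of subA.
        System.arraycopy(a, from, diff, out, a.length - from);
        return diff;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6};
        int[] subA = {2, 6};
        System.out.println(Arrays.toString(difference(a, subA))); // prints [1, 3, 4, 5]
    }
}
```

The searches cost O(m log n) for m = subA.length and the copies O(n) in total. Whether this actually beats the plain two-index loop for arrays of 1000 and 300 elements would have to be measured.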
0

Since you said both arrays are sorted, this sounds like a "i want you to traverse both arrays and cut out the parts from A between members of subA" kinda homework to me.

so let's try and draft it out

  • array A is sorted with 1000 members
  • subA is sorted with 300 members
  • arrayA has all subA's elements

meaning that we can do something like...

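// needs: import java.util.ArrayList;
public ArrayList<Integer> findDifferences(int[] arrayA, int[] subA)
{
    ArrayList<Integer> retVal = new ArrayList<Integer>();
    int index = 0; // current position in subA
    for(int i = 0; i < arrayA.length; i++)
    {
        // keep the element if subA is exhausted or its next member is larger
        if(index >= subA.length || arrayA[i] < subA[index])
            retVal.add(arrayA[i]);
        else if(arrayA[i] == subA[index])
            index++;
    }
    return retVal;
}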

I was gonna say that somehow you can calculate ranges to copy but i guess it ended up like this.

then there's this as well

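 // List is an interface and addAll() does not accept an int[], so box the values by hand
 List<Integer> a = new ArrayList<Integer>();
 for (int x : arrayA) a.add(x);
 List<Integer> b = new ArrayList<Integer>();
 for (int x : subA) b.add(x);
 a.removeAll(b);
 return a;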
Shark
  • What if all the elements that A shares with subA are located at the beginning of array A? Without checking that the index variable does not exceed the length of subA, you will get an ArrayIndexOutOfBoundsException. – Ivan Mushketyk Nov 02 '12 at 11:21
  • @IvanMushketyk when i saw your answer I knew mine was obsolete. You just don't argue with russians on sets and graphs :) It's good, posted first, and she should roll with that. Mine just looks more java-ish :D – Shark Nov 02 '12 at 11:24
  • If arrayA contains an element multiple times, your code is erroneous. – DaveFar Nov 02 '12 at 16:38
  • @Shark Frankly speaking I am not Russian. I am Ukrainian :) – Ivan Mushketyk Nov 02 '12 at 17:08
  • @IvanMushketyk close enough :) i'm serbian. – Shark Nov 03 '12 at 14:58
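
A note on the removeAll variant in this answer: on an ArrayList, removeAll calls contains on its argument for every element, so passing another list makes the work grow roughly with A.length × subA.length. If sortedness is not going to be exploited anyway, a HashSet for the excluded values avoids that. The sketch below is only an illustration; the names SetDifference and difference are made up, and it ignores the sorted-input precondition entirely.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetDifference {

    /** Returns the elements of arrayA that are not in subA, in their original order. */
    public static List<Integer> difference(int[] arrayA, int[] subA) {
        // Hash the values to exclude so that each lookup takes expected constant time.
        Set<Integer> exclude = new HashSet<Integer>();
        for (int x : subA) {
            exclude.add(x);
        }
        List<Integer> result = new ArrayList<Integer>();
        for (int x : arrayA) {
            if (!exclude.contains(x)) {
                result.add(x);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(difference(new int[]{1, 2, 3, 4}, new int[]{2, 4})); // prints [1, 3]
    }
}
```

This works for unsorted input and for duplicates as well, at the price of boxing every int.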
0

The fastest and most efficient way is to make A \ SubA a view on A, i.e. a list that does not hold its own references to the elements but is backed by A and SubA. This is similar to difference from Guava Sets.

Of course changes to A and SubA after creating that view must be taken into account, which can be an advantage or disadvantage, depending on your situation.

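Example implementation for arbitrary Lists (in your case, use new ImmutableSubarrayList<Integer>(Arrays.asList(A), Arrays.asList(SubA)) with A and SubA declared as Integer[], because Arrays.asList on an int[] yields a single-element List<int[]>):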

import java.util.AbstractSequentialList;
import java.util.List;
import java.util.ListIterator;
import java.util.NoSuchElementException;


public class ImmutableSubarrayList<E extends Comparable<E>> extends AbstractSequentialList<E>{

    final List<E> a, subA;
    final int size;

    public ImmutableSubarrayList(List<E> aParam, List<E> subAParam){
        super();
        a = aParam;
        subA = subAParam;
        assert a.containsAll(subA) : "second list may only contain elements from first list";

        // Iterate over a, because a.size()-subA.size() may not be correct if a contains equal elements. 
        int sizeTemp = 0;
        for (E element : a){    
            if (!subA.contains(element)){
                sizeTemp++;
            }
        }
        size = sizeTemp;
    }

    public int size() {
        return size;
    }

    public ListIterator<E> listIterator(final int firstIndex) {
        // create a ListIterator that iterates over a and subA in parallel,
        // returning only the elements in a that are not in subA
        assert (firstIndex >=0 && firstIndex <= ImmutableSubarrayList.this.size()) : "parameter was "
                           +firstIndex+" but should be between 0 and "+ImmutableSubarrayList.this.size();
        return new ListIterator<E>() {

            private final ListIterator<E> aIter = a.listIterator();
            private final ListIterator<E> subAIter = subA.listIterator();
            private int nextIndex = 0;

            {
                for (int lv = 0; lv < firstIndex; lv++ ){
                    next();
                }
            }

            @Override
            public boolean hasNext() {
                return nextIndex < size;
            }

            @Override
            public void add(E arg0) {
                throw new UnsupportedOperationException("The list being iterated over is immutable");
            }

            @Override
            public boolean hasPrevious() {
                return nextIndex > 0;
            }

            @Override
            public int nextIndex() {
                return nextIndex;
            }

            @Override
            public E next() {
                if (!hasNext()){
                    throw new NoSuchElementException();
                }
                nextIndex++;
                return findNextElement();
            }

            @Override
            public E previous() {
                if (!hasPrevious()){
                    throw new NoSuchElementException();
                }
                nextIndex--;
                return findPreviousElement();
            }

            @Override
            public int previousIndex() {
                return nextIndex-1;
            }

            @Override
            public void set(E arg0) {
                throw new UnsupportedOperationException("The list being iterated over is immutable");
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException("The list being iterated over is immutable");
            }

            private E findNextElement() {
                E potentialNextElement = aIter.next();
                while (subAIter.hasNext()){
                    E nextElementToBeAvoided = subAIter.next();
                    assert (potentialNextElement.compareTo(nextElementToBeAvoided) <= 0) : 
                        "nextElementToBeAvoided should not be smaller than potentialNextElement";
                    if (potentialNextElement.compareTo(nextElementToBeAvoided) < 0){
                        // not reached yet: un-read it and return the current element of a
                        subAIter.previous();
                        break;
                    }
                    // equal: skip this element (and any repeats of it) in a
                    while (potentialNextElement.compareTo(nextElementToBeAvoided) == 0){
                        potentialNextElement = aIter.next();
                    }
                }
                return potentialNextElement;
            }

            //in lack of lambdas: mirror image of findNextElement()
            private E findPreviousElement() {
                E potentialPreviousElement = aIter.previous();
                while (subAIter.hasPrevious()){
                    E previousElementToBeAvoided = subAIter.previous();
                    assert (potentialPreviousElement.compareTo(previousElementToBeAvoided) >= 0) : 
                        "previousElementToBeAvoided should not be greater than potentialPreviousElement";
                    if (potentialPreviousElement.compareTo(previousElementToBeAvoided) > 0){
                        // not reached yet: un-read it and return the current element of a
                        subAIter.next();
                        break;
                    }
                    // equal: skip this element (and any repeats of it) in a
                    while (potentialPreviousElement.compareTo(previousElementToBeAvoided) == 0){
                        potentialPreviousElement = aIter.previous();
                    }
                }
                return potentialPreviousElement;
            }
        };
    }
}
DaveFar
  • Why not use [`difference` from Guava Sets](http://guava-libraries.googlecode.com/svn/tags/release04/javadoc/com/google/common/collect/Sets.html)? – halex Nov 02 '12 at 11:19
  • @halex Using a third-party library to diff two sets? Really? why? Also dave, without the iterator() method this is no answer... – Shark Nov 02 '12 at 11:27
  • I do not use Guava, pure Java. – Sophie Sperner Nov 02 '12 at 11:47
  • @halex and Sophie: ok, I've implemented this special case with subA being a sublist of A (for general lists, not just arrays). – DaveFar Nov 02 '12 at 16:34
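
For the int[] arrays in the question, the view idea can also be sketched without boxing the arrays into Lists first. The following is a minimal, forward-only illustration under stated assumptions, not a replacement for the class above: DifferenceView is a hypothetical name, both arrays are assumed sorted ascending, and the view reflects later changes to the arrays because nothing is copied.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class DifferenceView implements Iterable<Integer> {

    private final int[] a;
    private final int[] subA;

    public DifferenceView(int[] a, int[] subA) {
        this.a = a;       // no copy: the view is backed by the caller's arrays
        this.subA = subA;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int ai = 0;  // position in a
            private int sai = 0; // position in subA

            // Moves ai forward to the next element of a that is not in subA.
            private void skipExcluded() {
                while (ai < a.length) {
                    while (sai < subA.length && subA[sai] < a[ai]) {
                        sai++;
                    }
                    if (sai == subA.length || subA[sai] != a[ai]) {
                        return;
                    }
                    ai++;
                }
            }

            @Override
            public boolean hasNext() {
                skipExcluded();
                return ai < a.length;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return a[ai++];
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException("the view is read-only");
            }
        };
    }
}
```

Creating the view costs O(1); a full iteration, e.g. for (int x : new DifferenceView(A, subA)) { ... }, visits each element of both arrays at most once.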