How to find the difference between two arrays in C?

Question

I've been trying to write C programs for finding the union, intersection and difference of two arrays, and while the first two worked out fine, I'm having some trouble finding the difference between two arrays. By difference I mean each element that is in array1 but not in array2.

I want the third array to contain every element in array1 that is not in array2, and not vice versa. So if arr1 is [1, 2, 3] and arr2 is [3, 4, 5], then arr3 is [1, 2]. I am also unsure how to find the difference if the two arrays are of different sizes.

My output is a bunch of zeros and negative numbers:

The difference is: 1

The difference is: 2

The difference is: -14200

The difference is: 0

The difference is: -14340

The difference is: 0

This is the code I've been working with:

#include <stdio.h>

int main()
{
  int arr1[100];
  int arr2[100];
  int size1, size2, i, j, s=0;

  //enter array size
  printf("\nPlease enter array1 size: \n");
  scanf("%d", &size1);
  printf("\nPlease enter array2 size: \n");
  printf("\n--------------------------- \n");
  scanf("%d", &size2);

  //setting up a third array to contain the difference
  int tot_size = size1+size2;
  int arr3[tot_size];


  //enter array elements
  for(i=0;i<size1;++i)
  {
    printf("\nPlease enter array1 element %d:\n", i);
    scanf("%d", &arr1[i]);
  }
  printf("\n--------------------------- \n");
  for(i=0;i<size2;++i)
  {
    printf("\nPlease enter array2 element %d:\n", i);
    scanf("%d", &arr2[i]);
  }

  printf("\n--------------------------- \n");


  //compare the two arrays, if two elements are not equal
  //store them in a third array
  for(i = 0; i < size1; i++)
  {
    for(j = 0; j < size2; j++)
    {
      if(arr1[i] != arr2[j])
      {
        arr3[s] = arr1[i];
        ++i;
        ++j;
        ++s;
      }
    }
  }

  for(i=0;i<s;++i)
    printf("\nThe difference is: %d\n", arr3[i]);

}

Any help would be much appreciated, as I am new to C and still have lots to learn.

Define "difference". What happens if one array is larger than the other? Why do you compare an entire row against one element in your loop? — Nick, Mar 07 '18 at 14:57
Your last for loop shouldn't go to "tot_size" but to "s". Plus, if I understand correctly, the if condition inside the loops is suspicious, since it adds a new "difference" for every arr1[i] that differs from arr2[j]. If arr1 has 10 elements and arr2 has 20, you get 200 possibilities! — Tom's, Mar 07 '18 at 14:57
Hey, thanks for responding. With difference I mean: each element that is in array 1, that is not in array 2. So I want said elements in a third array. — Doe J, Mar 07 '18 at 15:02
I've edited the question text to include a definition of difference and to state the problem of two differently sized arrays. — Doe J, Mar 07 '18 at 15:05
`int arr3[tot_size];` is very bad. Forget about this style of definition. Use `int * arr3 = malloc( tot_size * sizeof(int) );` — i486, Mar 07 '18 at 15:07
@i486 You know that VLAs have been usable since C99? There is no need to malloc anymore, and this syntax is correct for C99 and later (though I agree with you and usually avoid VLAs, but that's personal preference) — Tom's, Mar 07 '18 at 15:09
I'll google the advantages of VLA and malloc in this particular case so I learn. Thank you for your input. — Doe J, Mar 07 '18 at 15:16
@Tom's I know. But I don't think it is a good way to implement a large array. — i486, Mar 07 '18 at 22:03

Tom's · Accepted Answer · 2018-03-07T17:05:27.360

If the difference between two arrays means the numbers in the first that are not in the second AND the numbers in the second that are not in the first, you can simply do the following:

Create a result array and copy the first and second arrays into it.

arr1 = [3, 5, 7, 0]

arr2 = [1, 10, 5]

arr3 = [arr1, arr2] ==> [3, 5, 7, 0, 1, 10, 5]
Then, sort the array (using qsort or any other sorting function):

arr3 = [0, 1, 3, 5, 5, 7, 10]
Finally, delete every number that appears more than once, removing all of its copies (the sorting step makes this easy to do in a single pass):

arr3 = [0, 1, 3, 7, 10]
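The three steps above could be sketched roughly as follows. The names `symmetric_difference` and `compare_ints` are made up for illustration, and the sketch assumes that neither input array contains duplicates of its own (otherwise a value repeated inside a single array would be dropped as well).

```c
#include <stdlib.h>

/* qsort callback: orders ints ascending without risking overflow. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: writes every value that occurs in exactly one of
 * a and b into out (which needs room for na + nb ints) and returns how
 * many values were written. Assumes a and b contain no duplicates.
 * Also returns 0 if the temporary buffer cannot be allocated.
 */
size_t symmetric_difference(const int *a, size_t na,
                            const int *b, size_t nb, int *out)
{
    size_t total = na + nb;
    if (total == 0)
        return 0;

    int *merged = malloc(total * sizeof *merged);
    if (merged == NULL)
        return 0;

    /* Step 1: copy both arrays into one buffer. */
    for (size_t i = 0; i < na; i++)
        merged[i] = a[i];
    for (size_t i = 0; i < nb; i++)
        merged[na + i] = b[i];

    /* Step 2: sort, so that equal values become neighbours. */
    qsort(merged, total, sizeof *merged, compare_ints);

    /* Step 3: single pass, keeping only values whose run has length 1. */
    size_t count = 0;
    size_t i = 0;
    while (i < total) {
        size_t run_end = i + 1;
        while (run_end < total && merged[run_end] == merged[i])
            run_end++;
        if (run_end - i == 1)
            out[count++] = merged[i];
        i = run_end;
    }

    free(merged);
    return count;
}
```

Called with arr1 = [3, 5, 7, 0] and arr2 = [1, 10, 5] and an output buffer of 7 ints, this should fill the buffer with the five values shown in the last step.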

After your comment: so the difference between arr1 and arr2 is the numbers in arr1 that are not in arr2? Then your first code makes more sense.

You should write a helper function to make this easier for yourself.

Write an "IsNumberInArray" function:

bool IsNumberInArray(int number, int *array, size_t arraySize)

I leave the implementation to you (if the array is sorted, you could implement a binary search; otherwise a good old for loop will do).
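For the sorted case, a binary search variant might look like the sketch below. The name `IsNumberInSortedArray` is hypothetical (it is not part of the code in this answer), and the function only gives correct results if the array is sorted in ascending order.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical variant of IsNumberInArray for arrays sorted in ascending
 * order. Halves the half-open search range [low, high) on every step.
 */
bool IsNumberInSortedArray(int number, const int *array, size_t arraySize)
{
    size_t low = 0;
    size_t high = arraySize;

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (array[mid] == number) {
            return true;
        }
        if (array[mid] < number) {
            low = mid + 1;   /* number can only be to the right of mid */
        } else {
            high = mid;      /* number can only be to the left of mid */
        }
    }

    return false;
}
```

This takes a number of comparisons that grows with the logarithm of the array size rather than with the size itself. For arrays as small as the ones in the question, the plain loop is likely just as good.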

Then, for each number in arr1, if IsNumberInArray(arr1[i], arr2, size2) is false, add arr1[i] to arr3.

Basically, that is nearly what you already do. Your problem lies in the inverted condition (the real question is "is this number in the second array?") and in how to break out of the second loop easily. The function takes care of both.

Note that since arr3 only keeps the numbers of arr1 that are not in arr2, arr3 can hold at most size1 elements. That's why I first assumed you wanted the numbers unique to arr1 AND arr2, since tot_size was size1 + size2.

Usually I don't give code for "easy" problems, because if you can't solve it yourself you need the practice, and handing you the answer will not be useful to you. But since sg7 already did, there is no point in holding it back (and you can't use the chat room for now), so here is an implementation of the algorithm:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

bool IsNumberInArray(int number, int *array, size_t arraySize)
{
    for (size_t i = 0; i < arraySize; ++i) {
       if (array[i] == number) {
           return (true);
       }
    }

    return (false);
}

void DumpArrayContent(int *array, size_t arraySize, char *arrayName)
{
    printf("%s has %zu elements:\n", arrayName ? arrayName : "array", arraySize);
    for (size_t i = 0; i < arraySize; ++i) {
        printf("%d ",array[i]);
    }
    printf("\n");
}

int main(void)
{
    int arr1[] = {1,2,3,4,7,8,9};
    int arr2[] = {3,4,5};

    size_t s1 = sizeof(arr1)/sizeof(*arr1);
    size_t s2 = sizeof(arr2)/sizeof(*arr2);

    int    arr3[s1];
    int    s3 = 0;

    for (size_t i = 0; i < s1; ++i) {
        if (!IsNumberInArray(arr1[i], arr2, s2)) {
           arr3[s3] = arr1[i];
           s3++;
        }
    }

    DumpArrayContent(arr1, s1, "arr1");
    DumpArrayContent(arr2, s2, "arr2");
    DumpArrayContent(arr3, s3, "arr3");

    return 0;
}

I don't think either of the two implementations is more efficient than the other, since after compiler optimization the resulting executables would be nearly identical. Without compiler optimization, sg7's code will be slightly more efficient since it is straightforward (mine has a function call). It's up to you which one you prefer.

Wow! What a clever solution! Thank you so much, I'll try to implement it. — Doe J, Mar 07 '18 at 15:08
Thanks! Good luck with the implementation. Keep in mind that this is mostly a naive solution that can be improved, but I think it's a good first step. — Tom's, Mar 07 '18 at 15:12
Hm, I just remembered, I want the third array to contain every element in array1 that is not in array2, and not vice versa. So if array1 is [1, 2, 3], and arr2 is [3, 4, 5], then arr3 is [1, 2]. Do you have any suggestions on how to do this? — Doe J, Mar 07 '18 at 15:35
Sorry for the late answer Tom, and thank you again for the help. I'm trying to implement the function you suggested, but how would I compare an integer to an entire array? In my head, I would have to iterate over array2 as well, as I cannot compare an integer to a pointer (array). — Doe J, Mar 07 '18 at 16:11
The purpose of the "IsNumberInArray" function is to tell you whether a number is in an array (return true) or not (return false). If arr1 = [1, 2, 3] and arr2 = [3, 4, 5], then "IsNumberInArray(arr1[0], arr2, size2)" should return false, since arr1[0] (which is 1) is not in arr2 (which is [3, 4, 5]). Conversely, "IsNumberInArray(arr1[2], arr2, size2)" should return true, since arr1[2] (which is 3) is indeed in arr2 (which is [3, 4, 5]). Do you understand? It's a really simple function (though you can improve it with a binary search if you know that the array is sorted). — Tom's, Mar 07 '18 at 16:15
Yes, I understand the logic. Perhaps you could move this discussion to a chat? I cannot do it as my rep isn't high enough. — Doe J, Mar 07 '18 at 16:26
Can you join https://chat.stackoverflow.com/rooms/166412/private-discussion-temporary ? It's my first time doing that, so I hope I haven't done anything wrong ... — Tom's, Mar 07 '18 at 16:33
You can't write in a room unless you have 20 reputation. sg7's code is exactly what you want to do. The only difference from what I suggested is that sg7 searches directly with a second for loop. Thus, he has to remember whether the number is in the array or not (the purpose of the "found" variable). With the function, you can return directly and act on the result. It's simply a matter of preference. — Tom's, Mar 07 '18 at 16:41
Haha Tom, I think you did everything fine, it's just a shame that I am unable to speak in a chat room even if invited. But I agree, sg7's answer made it clearer, though I'll try to implement your suggestion as it seems more efficient. Thank you for all your help Tom, I wish I could upvote you. :P — Doe J, Mar 07 '18 at 16:46
I've edited, again. I'm not sure this is the way that stackoverflow must be used, but ... — Tom's, Mar 07 '18 at 17:01

sg7 · Answer 2 · 2018-03-07T16:31:10.867

I want the third array to contain every element in array1 that is not in array2, and not vice versa. So if array1 is [1, 2, 3], and arr2 is [3, 4, 5], then arr3 is [1, 2].

Provided that array1 has already been processed so that it contains no duplicates, it looks like you need this:

#include<stdio.h>
#include<string.h>
#include<stdlib.h>

int main(void)
{
    size_t i,j,k;
    int s3;

    int arr1[] = {1,2,3,4,7,8,9};
    int arr2[] = {3,4,5};

    size_t s1 = sizeof(arr1)/sizeof(int);
    size_t s2 = sizeof(arr2)/sizeof(int);

    int arr3[s1];
    int e;
    int found = 0;
    k = 0;

    for(i=0; i<s1; i++)
    {
        e = arr1[i]; 
        found = 0;

        for(j=0; j<s2; j++){

           if(e == arr2[j])
           {
               found = 1;
               break;
           }
        }

        if(found == 0){
           arr3[k] = e;
           k++;
        }
    }


    /* s1, s2 and k are size_t, so they are printed with %zu rather than %d */
    printf("arr1 has %zu elements:\n",s1);
    for(i=0;i<s1; i++)
    {
        printf("%d ",arr1[i]);
    }

    printf("\narr2 has %zu elements:\n",s2);
    for(i=0;i<s2; i++)
    {
        printf("%d ",arr2[i]);
    }

    printf("\narr3 has %zu elements:\n",k);
    for(i=0;i<k; i++)
    {
        printf("%d ",arr3[i]);
    }

    return 0;
}

Output:

arr1 has 7 elements:
1 2 3 4 7 8 9
arr2 has 3 elements:
3 4 5
arr3 has 5 elements:
1 2 7 8 9
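The code above relies on arr1 containing no duplicates. If that cannot be guaranteed, one possible variation also checks what has already been stored in the output before appending. This is only a sketch and not part of the code above; the names `contains` and `difference_unique` are made up for illustration.

```c
#include <stddef.h>

/* Returns 1 if value occurs among the first size elements of array. */
static int contains(const int *array, size_t size, int value)
{
    for (size_t i = 0; i < size; i++) {
        if (array[i] == value)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: stores each value of a that is not in b into out,
 * at most once, and returns the number of values stored.
 * out needs room for na ints.
 */
size_t difference_unique(const int *a, size_t na,
                         const int *b, size_t nb, int *out)
{
    size_t count = 0;

    for (size_t i = 0; i < na; i++) {
        if (contains(b, nb, a[i]))
            continue;               /* present in b: not part of the difference */
        if (contains(out, count, a[i]))
            continue;               /* already stored: skip the duplicate */
        out[count++] = a[i];
    }

    return count;
}
```

With a = [1, 2, 2, 3, 1] and b = [3, 4, 5], this stores 1 and 2 once each, in the order they first appear in a.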

Perfect, thank you so much. So when the "for loop" breaks, it returns to the start of the "j" loop, and increments j? So that the ith element of arr1 is compared to the next element of arr2? — Doe J, Mar 07 '18 at 16:34
@DoeJ When the loop breaks, it means that the element from `arr1` exists in `arr2` and there is no need to add it to `arr3`. Then the next element is taken from `arr1` and compared with all elements of `arr2`. If the element is found, the loop breaks; if not, the element is added to `arr3`. — sg7, Mar 07 '18 at 16:39
@DoeJ. To clarify, when the "for loop" breaks, the program goes back to take the next element from the `i` loop. That next element `e` from `arr1` will be compared with all elements in `arr2` (unless a match is found and the loop breaks). The `j` loop is always restarted from the first element. The `i` loop always advances. — sg7, Mar 07 '18 at 16:52

score 1 · Answer 3 · edited Oct 02 '21 at 18:06

Look at your loop, notice that the inner for-loop j is initialized to 0 at every iteration of i.

  for(i = 0; i < size1; i++)
  {
    for(j = 0; j < size2; j++)
    {
      if(arr1[i] != arr2[j])
      {
        arr3[s] = arr1[i];
        ++i;
        ++j;  // so what does this do?
        ++s;
      }
    }
  }

Let's try to see what happens with two arrays that have different values:

arr1 : {1,2}
arr2 : {3,4}


           i   j   s
iteration  0   0   0  => arr3[0] = 1; 
           1   1   1
               2                        j==2 since j++, leaving inner loop j==size2
iteration  2   0   1                    i==2 since i++, leaving outer loop i==size1

The best approach is to write down your steps on paper and walk through your algorithm. Start with a simple example, then write a prototype routine for it; if that works, move on to larger arrays, arrays of different lengths, arrays that are identical, and so on.
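One way to set up such a prototype is a small check routine that runs a candidate function on hand-picked cases. The sketch below is only an illustration: `array_difference`, `check_case` and `MAX_CASE_SIZE` are made-up names, and the difference function is a plain nested-loop version that you would swap for whatever routine you want to test.

```c
#include <stddef.h>
#include <stdio.h>

enum { MAX_CASE_SIZE = 16 };   /* assumed upper bound for the test inputs */

/* Candidate under test: values of a that do not occur in b, in order. */
size_t array_difference(const int *a, size_t na,
                        const int *b, size_t nb, int *out)
{
    size_t count = 0;
    for (size_t i = 0; i < na; i++) {
        int found = 0;
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                found = 1;
                break;
            }
        }
        if (!found)
            out[count++] = a[i];
    }
    return count;
}

/* Runs one case, prints "ok" or "FAILED", and returns 1 on success. */
int check_case(const char *name,
               const int *a, size_t na, const int *b, size_t nb,
               const int *expected, size_t nexpected)
{
    int out[MAX_CASE_SIZE];
    int ok = 0;

    if (na <= MAX_CASE_SIZE) {
        size_t n = array_difference(a, na, b, nb, out);
        ok = (n == nexpected);
        for (size_t i = 0; ok && i < n; i++)
            ok = (out[i] == expected[i]);
    }

    printf("%-20s %s\n", name, ok ? "ok" : "FAILED");
    return ok;
}
```

For the example traced above, the call would be `check_case("disjoint", arr1, 2, arr2, 2, expected, 2)` with `expected` holding {1, 2}. Cases for identical arrays (expected length 0) and for arrays of different lengths can be added in the same way.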
