Check if two numbers have same digits

Question

I need to write a program that checks whether two numbers have the same digits. For example:

a = 4423, b = 2433;

Even though their digits don't appear the same number of times, they have the same digits.

#include <stdio.h>
#include <stdlib.h>
int same_digit(int a, int b) {
  int n = abs(a), i;
  int count[10];
  for (i = 0; i < 10; ++i) {
    count[i] = 0;
  }
  while (n > 0) {
    count[n % 10]++;
    n = n / 10;
  }
  n = abs(b);
  while (n > 0) {
    count[n % 10]--;
    n = n / 10;
  }
  for (i = 0; i < 10; ++i)
    if (count[i] != 0)
      return 0;
  return 1;
}
int main() {
  int a, b;
  scanf("%d", &a);
  scanf("%d", &b);
  if (same_digit(a, b)) printf("Yes");
  else printf("No");
  return 0;
}

The problem with my code is that it checks whether each digit occurs the same number of times in both numbers. How could I modify it to return 1 if all digits from a are present in b and all digits from b are present in a, no matter how many times each digit occurs?

Convert each number to a sorted string of unique digits that occur. Compare the two strings. One function called twice for generating the sorted string. Both 4233 and 2433 should generate 234, which will compare equal. The "sorting" is not a full-scale quicksort — you have an array of the count of each digit 0..9, and you simply copy the non-zero items in sequence to the output string. — Jonathan Leffler, Feb 10 '22 at 20:50
Create an array of `10` elements. For each digit `i` found in first number, mark the `i`'th element in the array. Then for each digit found in second number, check the corresponding array element is marked — Eugene Sh., Feb 10 '22 at 20:51
Note my [comment](https://stackoverflow.com/questions/71072017/check-if-two-numbers-have-same-digits/71072283#comment125637955_71072383) below. The algorithm outlined by [Eugene Sh.](https://stackoverflow.com/users/4253229/eugene-sh) in their [comment](https://stackoverflow.com/questions/71072017/check-if-two-numbers-have-same-digits/71072283#comment125637315_71072017) checks that every digit in the second number appears in the first, but does not verify that every digit present in the first appears in the second. If the two numbers are 1234 and 123, the algorithm says "yes" instead of "no". — Jonathan Leffler, Feb 10 '22 at 22:27
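
The sorted-string idea from the first comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `unique_digits` and `same_digit_set` are made up for this sketch, and negative inputs are handled by taking the magnitude of each remainder, which the comment itself does not specify.

```c
#include <string.h>

/* Hypothetical helper: write the distinct digits of n, in ascending
 * order, into out. out needs room for 10 digits plus the terminator. */
void unique_digits(int n, char out[11])
{
    int seen[10] = { 0 };
    int len = 0;

    /* do/while so that n == 0 still records the digit 0.
     * n % 10 can be negative for negative n, so use its magnitude. */
    do {
        int d = n % 10;
        seen[d < 0 ? -d : d] = 1;
    } while ((n /= 10) != 0);

    /* Copying the marked digits in index order yields a sorted string,
     * so no general-purpose sort is needed. */
    for (int d = 0; d < 10; ++d)
        if (seen[d])
            out[len++] = (char)('0' + d);
    out[len] = '\0';
}

/* Hypothetical wrapper: 1 if a and b use exactly the same set of digits. */
int same_digit_set(int a, int b)
{
    char sa[11], sb[11];

    unique_digits(a, sa);
    unique_digits(b, sb);
    return strcmp(sa, sb) == 0;
}
```

Because both numbers are reduced to a string and the strings are compared for equality, the check is symmetric: 1234 and 123 give "1234" and "123", which differ.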

score 3 · Answer 1 · edited Feb 15 '22 at 11:54

Using arrays is not necessary, but it makes for clean code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Remove duplicate characters from sorted array
void remove_dups(char *str) 
{
    char *cur = str;
    while(*cur) 
    {
        *str = *cur;
        while(*++cur == *str);
        ++str;
    }
    *str = 0;
}

int cmp(const void *a, const void *b)
{
    return *(char*)a - *(char*)b;
}

int same_digit(int a, int b)
{
    char A[22], B[22]; // Room for 64 bit. Increase if necessary.
    
    sprintf(A, "%d", a);
    sprintf(B, "%d", b);
    
    qsort(A, strlen(A), sizeof *A, cmp);
    qsort(B, strlen(B), sizeof *B, cmp);
    
    remove_dups(A);
    remove_dups(B);

    return !strcmp(A, B);
}


Many C coders will frown on this, and not entirely without reason. This is how you would typically solve the problem in Python or JavaScript. If you're writing serious C code, there's probably a reason why you're not choosing a higher-level language, and that reason quite likely has to do with the performance C can offer.

However, nothing prevents you from starting with this and optimizing it later, once a profiler has shown that this code is indeed a bottleneck. And who knows? It may turn out that this is faster. Don't optimize prematurely.

Many problems involving digits become much easier if you first convert the numbers to arrays of digits, and many problems involving arrays become much easier if you sort the arrays. Don't be afraid to keep this approach as a standard tool in your toolbox.

Furthermore, it's quite common that a naive solution for unsorted arrays is O(n²) while a naive solution for a sorted array is just O(n). Since sorting can be done in O(n * log n) or better, you often get a pretty good solution. Sure, n in this case is typically less than 10, so a naive O(n²) solution is probably faster anyway, but it's worth remembering. I believe you would have to write pretty fancy code to solve this in O(n * log n) without arrays, if it's possible at all. (I'm not saying it's impossible, only that I don't know.)

How does this deal with one number having three of some digit and the other having just one of the same digit (e.g. `111234` and `4321`)? They contain the same digits, but the strings produced aren't going to be equal. — Jonathan Leffler, Feb 10 '22 at 22:01
@JonathanLeffler You're absolutely correct. I misread the question. I'll fix it later today. — klutt, Feb 11 '22 at 03:44

Aki Suihkonen · Answer 2 · 2022-02-11T10:41:35.133

A histogram of 10 digits (or 11 if we want to count the '-' sign) where each frequency is capped at 1 should be implemented as a vector of bits (a bitmask) stored in an integer. This limits the solution to number bases up to 32, but the overall code should be vastly clearer and smaller, with no need to compare arrays in a loop.

#include <stdbool.h>

int hash_int(int b) {
    int hash = 0;
    // take abs as unsigned -- now 0 <= a <= 0x80000000u
    // unsigned mod 10 is btw more performant than signed mod
    unsigned int a = b >= 0 ? b : 0u - b;
    // construct the hash, removing duplicate digits as we go
    // performance wise `do {} while` is better than `while`
    // due to less jumping around - here the side effect is
    // that a==0 hashes to 1.
    // For comparison a==0 -> hash = 0 would work equally well
    do {
        hash |= 1 << (a % 10);
        a /= 10;
    } while (a);
    return hash;
}

bool is_same(int a, int b) { return hash_int(a) == hash_int(b); }

BTW, this method extends also to full histogram counting of more than 1 element per digit for checking palindromes: consider `uint64_t h=0ull; do { h += 1ull << (4 * (a % 10)); } while (a /= 10);` Here one can count 15 same digits without overflow -- the hash of `12334` would be 0x12110. — Aki Suihkonen, Feb 11 '22 at 09:55
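
The 4-bits-per-digit histogram mentioned in the comment above could look something like this. It is a minimal sketch, not the commenter's code: the function names are invented here, and the parameter is `unsigned int` purely to sidestep sign handling. Note that it compares exact digit counts, which is what the question's original code does, not the set comparison the asker wants.

```c
#include <stdint.h>

/* Hypothetical helper: pack a count for each decimal digit into a 4-bit
 * field of a 64-bit value. The count of digit d lives in bits 4*d .. 4*d+3,
 * so each field can hold up to 15 occurrences before overflowing into
 * its neighbour. */
uint64_t digit_histogram(unsigned int a)
{
    uint64_t h = 0;

    do {
        h += UINT64_C(1) << (4 * (a % 10));
    } while ((a /= 10) != 0);
    return h;
}

/* Hypothetical wrapper: 1 if a and b contain the same digits,
 * each the same number of times. */
int same_digit_counts(unsigned int a, unsigned int b)
{
    return digit_histogram(a) == digit_histogram(b);
}
```

For example, `digit_histogram(12334)` yields `0x12110`, matching the value quoted in the comment, and `same_digit_counts(4423, 2433)` is 0 because the counts of 3 and 4 differ.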

Craig Estey · Accepted Answer · 2022-02-11T03:47:47.140

Your code is a bit more complicated than it needs to be.

Just compute a "has a digit" vector for each number (e.g. each element is 1 if the corresponding digit is in the number). This is a histogram/frequency table except that instead of the count of the digits in a number, it's just 0 or 1.

Then, compare the vectors for equality.

Here's some refactored code:

#include <stdio.h>
#include <stdlib.h>

void
count(int x,int *hasdig)
{
    int dig;

    // handle negative numbers
    if (x < 0)
        x = -x;

    // special case for zero
    // NOTE: may not be necessary
#if 1
    if (x == 0)
        hasdig[0] = 1;
#endif

    for (;  x != 0;  x /= 10) {
        dig = x % 10;
        hasdig[dig] = 1;
    }
}

int
same_digit(int a, int b)
{
    int dig_a[10] = { 0 };
    int dig_b[10] = { 0 };
    int dig;
    int same = 1;

    count(a,dig_a);
    count(b,dig_b);

    for (dig = 0;  dig < 10;  ++dig) {
        same = (dig_a[dig] == dig_b[dig]);
        if (! same)
            break;
    }

    return same;
}

int
main()
{
    int a, b;

    scanf("%d", &a);
    scanf("%d", &b);

    if (same_digit(a, b))
        printf("Yes\n");
    else
        printf("No\n");

    return 0;
}

UPDATE:

Also beware UB if either or both numbers is INT_MIN

Yes, -INT_MIN is still negative :-(

I've come up with an alternate way to deal with it. However, IMO, [almost] any approach seems to be extra work, just to make one special case work. So, I've kept the original [faster/simpler] code above

And, I've added some extra test/debug code.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int opt_t;

void
count(int x,int *hasdig)
{
    int dig;

    // handle negative numbers
    if (x < 0)
        x = -x;

    // special case for zero
    // NOTE: may not be necessary
#if 1
    if (x == 0)
        hasdig[0] = 1;
#endif

    for (;  x != 0;  x /= 10) {
        dig = x % 10;

        // my -INT_MIN fix ...
        if (dig < 0)
            dig = -dig;

        hasdig[dig] = 1;
    }
}

int
same_digit(int a, int b)
{
    int dig_a[10] = { 0 };
    int dig_b[10] = { 0 };
    int dig;
    int same = 1;

    count(a,dig_a);
    count(b,dig_b);

    for (dig = 0;  dig < 10;  ++dig) {
        same = (dig_a[dig] == dig_b[dig]);
        if (! same)
            break;
    }

    return same;
}

void
dotest(int a,int b)
{

    printf("a=%d b=%d -- %s\n",
        a,b,same_digit(a,b) ? "Yes" : "No");
}

int
main(int argc,char **argv)
{
    char *cp;

    --argc;
    ++argv;

    for (;  argc > 0;  --argc, ++argv) {
        cp = *argv;
        if (*cp != '-')
            break;

        if ((cp[1] == '-') && (cp[2] == 0)) {
            --argc;
            ++argv;
            break;
        }

        cp += 2;
        switch(cp[-1]) {
        case 't':
            opt_t = ! opt_t;
            break;
        }
    }

    do {
        int a, b;

        if (opt_t) {
            dotest(4423,2433);
            dotest(INT_MIN,1234678);
            dotest(INT_MIN,INT_MIN);
            dotest(INT_MAX,INT_MAX);
            break;
        }

        if (argc > 0) {
            for (;  argc > 0;  --argc, ++argv) {
                cp = *argv;
                if (sscanf(cp,"%d,%d",&a,&b) != 2)
                    break;
                dotest(a,b);
            }
            break;
        }

        scanf("%d", &a);
        scanf("%d", &b);
        dotest(a,b);
    } while (0);

    return 0;
}

This is one of the times when a `do { … } while (…);` loop works well: `do { dig = x % 10; hasdig[dig] = 1; x /= 10; } while (x > 0);`; you then don't need the `#if`/`#endif` code. You can also use: `do { dig = x % 10; hasdig[dig] = 1; } while ((x /= 10) > 0);` of course, or even: `do { hasdig[x % 10] = 1; } while ((x /= 10) > 0);`. Code Golf to the fore! — Jonathan Leffler, Feb 10 '22 at 23:35
Also beware UB if either or both numbers is `INT_MIN`. The simplest way around that problem is to use: `if (x > 0) x = -x;` to ensure the value is negative, and then use `hasdig[-(x % 10)] = 1;` in the loop. This is OK for 2's complement systems where if the LHS of a `%` operator is negative and the RHS is positive, the result is negative. It works because `-INT_MAX` is representable as an `int`, but `-INT_MIN` is not (again, on 2's complement systems). — Jonathan Leffler, Feb 10 '22 at 23:42
@JonathanLeffler Personally, I only use `do { } while (0)` inside macros and as a trick to tame long `if/else` ladders. Thanks for the `INT_MIN` heads up. I _had_ tested it in a newer version that I didn't post--it passed. After your cmt, I retested and got UB/segfault when it used both args of `INT_MIN`. I left in the `dig = x % 10;`, originally just for teaching purposes. Now my [different] `INT_MIN` fix needs it. It's probably slower than yours because the negation inside the loop is conditional, but I wanted to try it anyway. Although, I think `-O2` can eliminate the branching — Craig Estey, Feb 11 '22 at 04:05

Andreas Wenzel · Answer 4 · 2022-02-10T22:17:46.083

You need to store which digits have occurred in both numbers. For example, for both numbers, you could define an array of 10 bool elements, where the first element specifies whether the digit 0 occurred, the second element whether the digit 1 occurred, etc. Since you need two such arrays, you could make an array of 2 of these arrays, giving you a 2D array. That way, you can process both arrays in the same loop.

After you have filled both arrays, you can compare them, in order to determine whether both numbers use the same digits.

Also, it may be better to change the return type of same_digit to bool.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

bool same_digit( int a, int b )
{
    int numbers[2] = { a, b };
    bool occurred[2][10] = { {false} };

    //determine which digits are used by the individual numbers
    //and fill the two arrays accordingly
    for ( int i = 0; i < 2; i++ )
    {
        int n = abs( numbers[i] );

        while ( n > 0 )
        {
            occurred[i][n%10] = true;
            n = n / 10;
        }
    }

    //determine whether both numbers use the same digits
    for ( int i = 0; i < 10; i++ )
        if ( occurred[0][i] != occurred[1][i] )
            return false;

    return true;
}

//the following code has not been modified
int main() {
  int a, b;
  scanf("%d", &a);
  scanf("%d", &b);
  if (same_digit(a, b)) printf("Yes");
  else printf("No");
  return 0;
}

There is no need to have two arrays. One can check the array obtained from the first number when traversing the second number. — Eugene Sh., Feb 10 '22 at 21:21
You need to check, @EugeneSh, that each digit in the first number appears in the second and that each digit in the second appears in the first. If you're not careful, you could end up with 1234 as the first number and 123 as the second, and each digit in the second appears in the first, but 4 does not appear in the second. — Jonathan Leffler, Feb 10 '22 at 21:26
@JonathanLeffler Ah, yeah, it is a nuance I did not consider. — Eugene Sh., Feb 10 '22 at 21:28
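
For what it's worth, a single array can still work if it records which number each digit came from, so that the check runs in both directions. The following is a sketch under that assumption, with a hypothetical function name, and is not code from any of the answers.

```c
/* Hypothetical single-array variant: bit 0 of seen[d] means "digit d
 * occurs in a", bit 1 means "digit d occurs in b". */
int same_digit_one_array(int a, int b)
{
    unsigned char seen[10] = { 0 };

    /* n % 10 can be negative for negative n, so use its magnitude */
    do {
        int d = a % 10;
        seen[d < 0 ? -d : d] |= 1;
    } while ((a /= 10) != 0);

    do {
        int d = b % 10;
        seen[d < 0 ? -d : d] |= 2;
    } while ((b /= 10) != 0);

    /* A value of 1 or 2 means the digit was seen in only one number. */
    for (int d = 0; d < 10; ++d)
        if (seen[d] == 1 || seen[d] == 2)
            return 0;
    return 1;
}
```

With 1234 and 123, the entry for digit 4 ends up as 1 (seen only in the first number), so the function returns 0 regardless of argument order.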

score 0 · Answer 5 · edited Feb 11 '22 at 09:49


Here is a simpler version of Aki Suihkonen's solution that works for all int values including INT_MIN:

#include <stdbool.h>
#include <stdlib.h>

int digit_set(int a) {
    int set = 0;
    // construct the set, one bit per digit value.
    // do / while allows for digit_set(0) = 1
    // integer division truncates towards 0 so the
    // remainder is the negative value of the low digit
    // hence abs(a % 10) is the digit value.
    // this works for all values including INT_MIN
    do {
        set |= 1 << abs(a % 10);
    } while ((a /= 10) != 0);

    return set;
}

bool same_digits(int a, int b) { return digit_set(a) == digit_set(b); }

answered Feb 11 '22 at 09:11 by chqrlie · edited Feb 11 '22 at 09:49 by Aki Suihkonen

Took the liberty of spelling my name correctly. It's a bit subjective if taking `abs` O(log N) times vs O(1) times makes this simpler... – Aki Suihkonen Feb 11 '22 at 09:51
@AkiSuihkonen: sorry about the typo, the source code is simpler and handles `INT_MIN` correctly. Regarding performance, `abs()` is usually expanded inline into branchless code and the modulo and division by 10 compile to a single multiplication and some shifts on x86_64 as can be seen on [Godbolt Compiler Explorer](https://godbolt.org/z/qc57TGq1E) – chqrlie Feb 11 '22 at 10:06
