What is the most efficient way to count distinct digits in an integer

Question

The number can range from 1 to 10¹⁵. I am using this code but it is running out of time.

int distinct(long long int a)
{
    int ele[10]={0},i,c=0;

    if(a==0) return 1;
    if(a<0) a=a*-1;

    while(a)
    {
        int t=a%10;
        ele[t]=1;
        a=a/10;
    }

    for (i=0;i<10;i++)
        if (ele[i])
            c++;

    return c;
}

You can slightly improve it using `c = c+ele[i]` instead of `if (ele[i]) c++`. — barak manos, Sep 14 '14 at 10:05
`if(a<0) a=a*-1;` not needed if "Number can Range form 1 to 10^15". — chux - Reinstate Monica, Sep 14 '14 at 10:44
`for (i=0;i<10;i++)` --> `i=10-1; do { ... } while (i-- > 0)` _might_ be a bit faster with test against 0 versus `i<10`. Also only 10 tests versus the present 11. — chux - Reinstate Monica, Sep 14 '14 at 10:49
*After* fixing the undefined behavior: if this code works *but* you want it faster, better, and/or stronger, it is off-topic for Stack Overflow and should be asked on [Code Review](http://codereview.stackexchange.com/). — Jongware, Sep 14 '14 at 10:55
"it is running out of time": that means that you are calling the function a huge number of times; maybe a better optimization lies outside the function, for instance if you need to do the count for regular sequences rather than random numbers. More context could be helpful. — , Sep 15 '14 at 22:41

chux - Reinstate Monica · Answer 1 · 2014-09-14T12:01:13.163

The code below incorporates several ideas from the comments and resolves the undefined behavior (UB).

IMO, OP has probably left out something that is a significant cause of the slowness.

// 1 to 10^15 only
int distinct_fast(long long int a) {
  int ele[10]={0},i,c=0;

  do {
    ele[a%10]=1;
    a /= 10;
  } while(a);

  i=10-1;
  do {
    c += ele[i]; // @barak manos
  } while (i-- > 0);
  return c;
}

// entire unsigned long long range method 1
int distinct_complete1(unsigned long long int a) {
  ... // same code as above

// entire long long range method 2
int distinct_complete2(long long int a) {
  int ele[10]={0},i,c=0;

  // Work with (-) values: there are at least as many (-) values as (+) values,
  // so negating a positive value cannot overflow
  if (a > 0) a = -a;

  do {
    ele[-(a % 10)] = 1;
    a /= 10;
  } while(a);

  // same as above
  ...
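
To make the negate-to-negative idea concrete, here is a minimal self-contained sketch covering the whole `long long` range. The name `distinct_full_range` is hypothetical, and the sketch assumes C99 or later, where `/` truncates toward zero and `%` takes the sign of the dividend. It folds the counting into the digit loop instead of using a second pass.

```c
/* Sketch only: counts distinct decimal digits of any long long value.
   Positive inputs are negated, because -LLONG_MAX always fits in a
   long long, whereas -LLONG_MIN does not on two's-complement machines. */
int distinct_full_range(long long a) {
    unsigned char seen[10] = {0};   /* seen[d] != 0 once digit d has occurred */
    int count = 0;

    if (a > 0) a = -a;              /* from here on, a <= 0 */

    do {
        /* C99: a % 10 lies in -9..0 for a <= 0, so the negation is 0..9 */
        int digit = -(int)(a % 10);
        if (seen[digit]++ == 0) count++;
        a /= 10;                    /* truncates toward zero */
    } while (a);

    return count;
}
```

With this sketch, `distinct_full_range(-121)` gives 2 and `distinct_full_range(0)` gives 1, since the `do`/`while` loop always processes at least one digit.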

Ideas for OP to explore:

unsigned char ele[10]={0};   // smaller flags

.

do {
  if (ele[a%10]++ == 0) c++;
  a /= 10;
} while(a);
// This eliminates need for following loop to add `ele[]`

.

// Invoke some strategy so that when a is small enough,
// `long` ops are used rather than `long long`
if (a > 1000000000) {
  for (i=6; i-- > 0; ) {
    if (ele[a%10]++ == 0) c++;
    a /= 10;
  } 
}
unsigned long b = a;  
do {
  if (ele[b%10]++ == 0) c++;
  b /= 10;
} while(b);

.

int distinct_complete3(unsigned long long int a) {
  unsigned char ele[10]={0};
  int c = 0;
  do {
    if (ele[a%10]++ == 0) c++;
    a /= 10;
  } while(a);
  return c;
}

score 0 · Answer 2 · 2014-09-16T07:26:20.950

Several possible optimizations:

- You can trade the modulo for a multiply, which is usually much faster: `q = a / 10; m = a - 10 * q;`
- You can avoid the final counting loop by packing all flags into a single integer, say `mask`. Initialize it with `mask = 0`; every time you find a digit `m`, flag it with `mask |= (1 << m)`. In the end, the count is given by `bits[mask]`, where `bits` is an array containing the precomputed bit counts for all integers from 0 to 1023 = 2^10 - 1.
```
int distinct(long long int a) 
{  
    int mask= 0;
    while (a)
    {
        long long q= a / 10; int m= a - 10 * q; // q can exceed the range of int
        mask|= 1 << m;
        a= q;
    }

    static short bits[1024]= { 0, 1, 1, 2, 1, 2, 2, 3, ...}; // Number of bits set
    return bits[mask];
}
```
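
The `bits` table is elided in the listing above. One way to avoid typing 1024 entries by hand is to fill it once at startup. The following is a sketch under assumptions, not the answer's own code: `init_bits` and `distinct_mask` are hypothetical names, the table is file-scope rather than function-local, and the input is assumed to be non-negative.

```c
/* bits[m] = number of set bits in m, for every 10-bit mask m */
static unsigned char bits[1024];

/* Fill the table once; must be called before distinct_mask(). */
void init_bits(void) {
    bits[0] = 0;
    for (int m = 1; m < 1024; m++)
        /* m has as many set bits as m >> 1, plus its own lowest bit */
        bits[m] = (unsigned char)(bits[m >> 1] + (m & 1));
}

/* Count distinct decimal digits of a non-negative value via a 10-bit mask.
   Note that an input of 0 yields 0 here, as in the listing above. */
int distinct_mask(long long a) {
    int mask = 0;
    while (a) {
        long long q = a / 10;       /* quotient can be far larger than int */
        int m = (int)(a - 10 * q);  /* remainder 0..9 without a second division */
        mask |= 1 << m;
        a = q;
    }
    return bits[mask];
}
```

After a single call to `init_bits()`, `distinct_mask(535)` returns 2, because only bits 3 and 5 are set in the mask.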

Even better, you can work with digits in groups, say of three. Instead of converting to base 10, convert to base 1000. And for every base 1000 "digit", compute the corresponding mask that flags the constituent decimal digits (for instance, 535 yields the mask 1<<5 | 1<<3 | 1<<5 = 40).

This should be about three times faster. However, the leading zeroes need some care, for instance by providing a separate array of masks for the leading triple (..1 vs 001).

    int distinct(long long int a) 
    {
        int mask= 0;
        while (1) // `true` would need <stdbool.h> in C
        {
            long long q= a / 1000; int m= a - 1000 * q; // q can exceed the range of int
            if (q == 0)
            {
                static short leading[1000]= { 1, 2, 4, 8, 16, 32, 64, ...}; // Mask for the leading triples
                mask|= leading[m];
                break;
            }
            else
            {
                static short triple[1000]= { 1, 3, 5, 9, 17, 33, 65, ...}; // Mask for the ordinary triples
                mask|= triple[m];
                a= q;
            }
        }

        static short bits[1024]= { 0, 1, 1, 2, 1, 2, 2, 3, ...}; // Number of bits set 
        return bits[mask];
    }

Declare the arrays `static` so that they are initialized only once rather than on every call.
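
The `leading` and `triple` tables are also elided above. As a hedged sketch, they could be generated at startup instead of written out. The names `init_tables`, `triple_mask`, `leading_mask`, `bit_count` and `distinct_base1000` are hypothetical, the tables are file-scope rather than function-local, and the input is assumed to be non-negative.

```c
static unsigned short triple_mask[1000];  /* digits of m padded to 3 places (001) */
static unsigned short leading_mask[1000]; /* digits of m without padding (..1)   */
static unsigned char  bit_count[1024];    /* number of set bits per 10-bit mask  */

/* Fill all three tables once; must be called before distinct_base1000(). */
void init_tables(void) {
    for (int m = 0; m < 1000; m++) {
        /* ordinary triple: all three decimal places count, zeroes included */
        triple_mask[m] = (unsigned short)
            ((1 << (m % 10)) | (1 << (m / 10 % 10)) | (1 << (m / 100)));

        /* leading triple: only the digits actually present (0 itself is "0") */
        int mask = 0, v = m;
        do {
            mask |= 1 << (v % 10);
            v /= 10;
        } while (v);
        leading_mask[m] = (unsigned short)mask;
    }
    for (int m = 1; m < 1024; m++)
        bit_count[m] = (unsigned char)(bit_count[m >> 1] + (m & 1));
}

/* Count distinct decimal digits, consuming three digits per division. */
int distinct_base1000(long long a) {
    int mask = 0;
    for (;;) {
        long long q = a / 1000;
        int m = (int)(a - 1000 * q);    /* base-1000 "digit", 0..999 */
        if (q == 0) {                   /* most significant group */
            mask |= leading_mask[m];
            break;
        }
        mask |= triple_mask[m];
        a = q;
    }
    return bit_count[mask];
}
```

After one call to `init_tables()`, `triple_mask[535]` is 40, matching the example above, and `distinct_base1000(1001)` returns 2: the low group `001` contributes digits 0 and 1, and the leading group `1` contributes only digit 1.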
