-3

I'm trying to write string\file entropy calculator. Here is code I wrote but it doesn't work:

double entropy(char* buf)
{
    int*   rgi = (int*)_alloca(256);
    int*   pi  = rgi + 256;
    double H   = 0.0;
    double cb  = sizeof(buf);

    for (int i = sizeof(buf); --i >= 0;)
    {
        rgi[buf[i]]++;
    }

    while (--pi >= rgi)
    {
        if (*pi > 0)
        {
            H += *pi * log2(*pi / cb);
        }
    }

    return -H / cb;
}

What am I doing wrong?

meaning-matters
  • 21,929
  • 10
  • 82
  • 142
  • 3
    *What I do wrong?* - This: [How To Ask](https://stackoverflow.com/help/how-to-ask) – Eugene Sh. Jan 08 '18 at 18:26
  • 2
    `sizeof(buf)` gives you the size of pointer, not the size of buffer. Also buffer contains signed values so `rgi[buf[i]]` will be wrong even if `i` contained correct value. – user7860670 Jan 08 '18 at 18:26
  • 3
    Welcome to stackoverflow.com. Please take some time to read [the help pages](http://stackoverflow.com/help), especially the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask). Also please [take the tour](http://stackoverflow.com/tour) and [read about how to ask good questions](http://stackoverflow.com/help/how-to-ask). Lastly please learn how to create a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve). – Some programmer dude Jan 08 '18 at 18:27
  • 2
    Also [learn how to debug your programs](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) – Some programmer dude Jan 08 '18 at 18:27
  • @VTT How do you know `buf` contain signed values? – Support Ukraine Jan 08 '18 at 18:32
  • 2
    AFAIK `_alloca` does not zero the allocated memory, yet you are incrementing elements of it. – Weather Vane Jan 08 '18 at 18:33
  • @4386427 because `buf` is a pointer to `char` and `char` is typically signed. – user7860670 Jan 08 '18 at 18:35
  • 1
    Should `int* rgi = (int*)_alloca(256);` be `int* rgi = (int*)_alloca(256 * sizeof *rgi);` ? It looks wrong to me – Support Ukraine Jan 08 '18 at 18:36
  • @VTT - yeah... typically but not by standard. I think the `sizeof(buf)` is the problem as you also mentioned – Support Ukraine Jan 08 '18 at 18:36
  • What *does* it do? There are an infinite number of ways a piece of code can not work. – Daniel H Jan 08 '18 at 18:50

1 Answers1

3

I think you have 4 problems

1) The allocated memory is never initialized

2) Too little memory is allocated as you only allocate 1 byte for each integer

3) Use of char for buf may be a problem as char may be signed

4) sizeof(buf) gives you the size of a char pointer but not the size of the buffer

Besides that I think you make the code too complicated by iterating backwards.

Try this:

double entropy(unsigned char* buf, size_t bufsize)
{
    int*   rgi = (int*)_alloca(256 * sizeof *rgi);
    memset(rgi, 0, 256 * sizeof *rgi);
    double H   = 0.0;
    double cb  = bufsize;

    for (size_t i = 0; i < bufsize; ++i)
    {
        rgi[buf[i]]++;
    }

    for (int i = 0; i < 256; ++i)
    {
        if (rgi[i] > 0)
        {
            H += rgi[i] * log2(rgi[i] / cb);
        }
    }

    return -H / cb;
}
Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • Thanks a lot for explanation. Your code works well with files but returns incorrect values with strings. For example, `char str[] = "1223334444"; printf("%.3f\n", entropy(str, sizeof(str));` returns value over 2.xxx but correct value for this is 1.846. – Jim Dawson Jan 09 '18 at 08:25
  • 1
    @JimDawson - That's because you call the function incorrectly. The `sizeof(str)` is wrong. Use `strlen(str)` instead. The `sizeof` will include the string termination whereas `strlen` doesn't. – Support Ukraine Jan 09 '18 at 08:32