7

Using a microcontroller (PIC18F4580), I need to collect data and send it to an SD card for later analysis. The data it collects will have values between 0 and 1023, or 0x0 and 0x3FF.

So what I need to do is convert 1023 into a base 10 string of literal ASCII values (0x31, 0x30, 0x32, 0x33, ...).

My problem is that the only way I can think of to split the digits apart requires a lot of division.

char temp[4];
temp[0] = 1023 % 10;
temp[1] = (1023 % 100) / 10;
temp[2] = (1023 % 1000) / 100;
temp[3] = (1023 % 10000) / 1000;

Using this method, finding the ASCII values of an n digit decimal number requires 2n-1 divisions. Is there a method that would be faster?

The end goal of this is to wind up with a .csv file on the SD card that can quickly be plugged into any laptop to see a graph of the data in Excel.

Clifford
  • 88,407
  • 13
  • 85
  • 165
John Moffitt
  • 5,730
  • 7
  • 30
  • 39
  • You may want to add a null terminator, just in case. (Else, you may be relying on undefined behaviour.) – strager Sep 12 '10 at 07:44
  • @strager: Only if he were to be performing string operations on the data. Writing 4 bytes to the SD card will not require a terminator. – Clifford Sep 12 '10 at 07:47
  • @Clifford, Hence me writing "may". @John Moffitt, Do you absolutely *need* base 10? Can you settle with base 16 or base 256? – strager Sep 12 '10 at 07:49
  • @all, in this example, I would simply add 0x30 to each cell in the array and send it off the the SD card. – John Moffitt Sep 12 '10 at 07:56
  • Can you use `itoa`? http://en.wikipedia.org/wiki/Itoa (with some implementations, should you be missing it). Or `printf`/`sprintf`... – Merlyn Morgan-Graham Sep 12 '10 at 08:10
  • @Merlyn, The implementation of itoa given at that link is roughly what I did above, except it will work for any size int at the cost of one extra division. – John Moffitt Sep 12 '10 at 08:15
  • @Merlyn, If sprintf can convert an int into a null terminated "string" then that would be perfect as long as the magic behind it isn't even worse than a load of divisions. – John Moffitt Sep 12 '10 at 08:24
  • @Merlyn: Microchip's MPLAB C18 library supports itoa(): http://ww1.microchip.com/downloads/en/devicedoc/mplab_c18_libraries_51297f.pdf, but you can bet your life that it still performs % and / operations. That said it should be considered. sprintf() on the other hand is likely to have a large stack overhead as well. itoa() will be more efficient since it is more specialised. – Clifford Sep 12 '10 at 09:43
  • 1
    Do you want `temp` "reversed"? With the units digit at temp[0], the tens digit at temp[1], ...? 42 ==> temp[0] = '2'; temp[1] = '4'; temp[2] = '0'; temp[3] = '0'; – pmg Sep 12 '10 at 13:28

9 Answers9

15

The obvious solution is not to convert the data to ASCII at all but store it in binary format. That way all you need to worry about is the endianness of the data. If the system performing the later analysis is far more powerful than your embedded target, then it would make sense to let that deal with the conversion and and byte order.

On the other hand, it is possible that the execution time of the / and % is insignificant compared to the time taken to transfer the data to the SD card; so make sure that you are optimising the right thing.

Clifford
  • 88,407
  • 13
  • 85
  • 165
  • This. Stick a byte order mark at the start of the file and be happy. – Nicholas Knight Sep 12 '10 at 08:07
  • I considered this, but it is vastly preferred to be able to simply pull the SD card out and stick it into a laptop to see it in Excel. You're probably right about the insignificance of the divisions in relation to the transfer speed to the SD card, however The microcontroller will be pulling data from as many as 11 sensors at a time, and it will also need to convert a time value as well giving me roughly 200 devisions just to poll each sensor once. After that, I need to poll all the sensors as fast as possible. – John Moffitt Sep 12 '10 at 08:10
  • 1
    @John: 30 lines of C or C++ compiled to a Windows executable that sits on the SD card and converts to an Excel-friendly format. :) – Nicholas Knight Sep 12 '10 at 08:15
  • @Nick, I was so intent on making this portable and easy that I forgot that all of this would be taking place on an SD card. This is one solution – John Moffitt Sep 12 '10 at 08:27
  • 1
    A related option is putting them in as hexadecimal, where the conversion takes division by 16 -- which compiles as just a bit shift, not a "real" division. Then use Excel's HEX2DEC() conversion function on them. – Brooks Moses Sep 13 '10 at 02:19
  • Given the other listed constraints, this is the correct solution. – Joshua Sep 13 '10 at 02:34
  • @Brooks: That probably deserves posting as an answer rather than a comment. I would not necessarily choose that approach, but it has sufficient merit to be regarded ans an option. – Clifford Sep 13 '10 at 09:11
4

There's certainly a much faster way: have an array of 1024 pre-computed strings. Then you can just do bounds checking, followed by an index into the array.

It's unclear from your question whether your code is running on the microcontroller though. If that's the case, you may not have enough memory for this approach.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Yes, the code will be running on the microcontroller, however it has 32KB of flash memory. Such an array would use 4KB, correct? – John Moffitt Sep 12 '10 at 07:50
  • @John: With terminating zeros embedded it would be 5kb. If you wrapped it in a routine that appended the terminating null, or were using it in such a way that you didn't need the terminating null, then you could get away with 4kb. – Merlyn Morgan-Graham Sep 12 '10 at 08:13
  • @John: I think Clifford's is the best answer, but couldn't you just store the lookup table on the SD card? (I'm assuming here that you _do_ have enough _RAM_ to load it.) – Nicholas Knight Sep 12 '10 at 08:19
  • @Nick, I only have 1536B of RAM to work with :P Maybe I could run it through winrar. – John Moffitt Sep 12 '10 at 08:42
  • 3
    The array doesn't need terminating zeros for the four-digit numbers; you know nothing is more than four characters long. Also, you can save a kilobyte of memory if you compute the presence of the leading 1 and then use a lookup table for the remaining three digits. – Brooks Moses Sep 13 '10 at 02:22
  • @Brooks: There's some code to `compute the presence of the leading 1` in part of the answer I just added. – Merlyn Morgan-Graham Sep 13 '10 at 06:01
3

I agree with what Clifford said, that you shouldn't worry about optimizing it if you don't have to, and that you can push the log cleanup to your analysis platform, rather than worrying about formatting in an embedded application.

That being said, here's an article that might be useful to you. It uses a loop, shifts, additions and branches, with linear/constant complexity: http://www.johnloomis.org/ece314/notes/devices/binary_to_BCD/bin_to_bcd.html

Also, I thought it would be fun to make some code that doesn't perform any divides, multiplies, or branches, but still gives the correct answer [0 - 1024). No promises that this is any faster than other options. This sort of code is just an option to explore.

I'd love to see if anyone can provide some tricks to make the code smaller, require less memory, or require fewer operations, while keeping the rest of the counts equal, or shrinking them :)

Stats:

  • 224 bytes in constants (no idea on the code size)
  • 5 bit-shift-rights
  • 3 subtracts
  • 5 bitwise-ands
  • 4 bitwise-ors
  • 1 greater-than comparison

Perf:

Using the perf comparisons and itoa routines in Jonathan Leffler's answer, here are the stats I got:

  • Division 2.15
  • Subtraction 4.87
  • My solution 1.56
  • Brute force lookup 0.36

I increased the iteration count to 200000 to ensure I didn't have any problems with timing resolution, and had to add volatile to the function signatures so that the compiler didn't optimize out the loop. I used VS2010 express w/ vanilla "release" settings, on a 3ghz dual core 64 bit Windows 7 machine (tho it compiled to 32 bit).

The code:

#include "stdlib.h"
#include "stdio.h"
#include "assert.h"

void itoa_ten_bits(int n, char s[])
{
  static const short thousands_digit_subtract_map[2] =
  {
    0, 1000,
  };

  static const char hundreds_digit_map[128] =
  {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
    8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
    9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
    0, 0, 0,
  };

  static const short hundreds_digit_subtract_map[10] =
  {
    0, 100, 200, 300, 400, 500, 600, 700, 800, 900,
  };

  static const char tens_digit_map[12] =
  {
    0, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 9,
  };

  static const char ones_digit_map[44] =
  {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    0, 1, 2, 3
  };

  /* Compiler should optimize out appX constants, % operations, and + operations */
  /* If not, use this:
    static const char ones_digit_append_map[16] =
    {
      0, 6, 2, 8, 4, 10, 6, 12, 8, 14, 10, 16, 12, 18, 14, 20,
    };
  */
  static const char a1 = 0x10 % 10, a2 = 0x20 % 10, a3 = 0x40 % 10, a4 = 0x80 % 10;
  static const char ones_digit_append_map[16] =
  {
    0, a1, a2, a1 + a2,
    a3, a1 + a3, a2 + a3, a1 + a2 + a3,
    a4, a1 + a4, a2 + a4, a1 + a2 + a4,
    a3 + a4, a1 + a3 + a4, a2 + a3 + a4, a1 + a2 + a3 + a4,
  };

  char thousands_digit, hundreds_digit, tens_digit, ones_digit;

  assert(n >= 0 && n < 1024 && "n must be between [0, 1024)");
  /* n &= 0x3ff; can use this instead of the assert */

  thousands_digit = (n >> 3 & 0x7f) > 0x7c;
  n -= thousands_digit_subtract_map[thousands_digit];

  ones_digit = ones_digit_map[
    (n & 0xf)
      + ones_digit_append_map[n >> 4 & 0xf]
      + ones_digit_append_map[n >> 8 & 0x3]
    ];
  n -= ones_digit;

  hundreds_digit = hundreds_digit_map[n >> 3 & 0x7f];
  n -= hundreds_digit_subtract_map[hundreds_digit];

  tens_digit = tens_digit_map[n >> 3];

  s[0] = '0' | thousands_digit;
  s[1] = '0' | hundreds_digit;
  s[2] = '0' | tens_digit;
  s[3] = '0' | ones_digit;
  s[4] = '\0';
}

int main(int argc, char* argv)
{
  int i;
  for(i = 0; i < 1024; ++i)
  {
    char blah[5];
    itoa_ten_bits(i, blah);
    if(atoi(blah) != i)
      printf("failed %d %s\n", i, blah);
  }
}
Merlyn Morgan-Graham
  • 58,163
  • 16
  • 128
  • 183
  • `thousands_digit_low_mask`, `thousands_digit_high_mask`, and `thousands_digit_subtract_map` can be reduced to a `>` comparison and two `==` comparisons, respectively. `hundreds_digit_subtract_map` can optionally be reduced to a multiply. Also, it's probably better to use `+` instead of `|` when dealing with ASCII characters. This solution is pretty crazy, to say the least! – strager Sep 13 '10 at 08:03
  • @strager: Crazy is the point :) As for those tables, I was (maybe for no reason) trying to avoid branching, but it didn't occur to me at the time that simply getting the result of those comparisons might not cause a branch. Thanks for the tip! I also (maybe wrongly) assumed an add would be more expensive than a bitwise or. – Merlyn Morgan-Graham Sep 13 '10 at 17:27
2

With some care in finding the right number(s) to use, you can multiply by the reciprocal of the base rather than dividing by the base. Terje's code is for an x86, but porting the general idea to a PIC shouldn't be tremendously difficult.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • If I could decipher the logic behind this it might be promising. I will send that to one of my friends to see if he can wrap his head around it. – John Moffitt Sep 12 '10 at 08:57
2

There's a way of doing it using subtractions, but I am not convinced it's faster than using subtractions and modulus on a "normal" CPU (may be different in an embedded environment).

Something like this:

char makedigit (int *number, int base)
{
  static char map[] = "0123456789";
  int ix;

  for (ix=0; *number >= base; ix++) { *number -= base; }

  return map[ix];
}


char *makestring (int number)
{
  static char tmp[5];

  tmp[0] = makedigit(&number, 1000);
  tmp[1] = makedigit(&number, 100);
  tmp[2] = makedigit(&number, 10);
  tmp[3] = makedigit(&number, 1);
  tmp[4] = '\0';

  return tmp;
}

Then, a call to makestring() should result in a (static, so copy it before overwriting) string with the converted number (zero-prefixed, at 4 characters width, as the original assumption is a value in the 0-1023 range).

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
Vatine
  • 20,782
  • 4
  • 54
  • 70
  • It is not so an issue with "embedded environments" in gerneral, but rather that the nasty PIC18 instruction set has no DIV instruction, so these operatons take a large number of cycles. I am not sure that this solution would be faster in general, but it would vary depending on the magnitude of the digits; which may be an issue in some applications. – Clifford Sep 12 '10 at 20:32
  • This is (roughly) O(log n), with the 0-1023 span, you're looking at no more than 27 subtractions and no more than 28 comparisons (if I've done the numbers right). – Vatine Sep 12 '10 at 21:00
  • We ended up implementing this solution. – John Moffitt Oct 26 '10 at 15:46
  • Why the expensive `map[ix]` instead of `'0'+ix`? – R.. GitHub STOP HELPING ICE Apr 06 '15 at 06:47
  • @R.. (a) Not wanting to be bound by ASCII (the method posted works in EDCDIC as well as ASCII, PETSCII and any other character set) (b) I find it slightly readable (c) If this is a bottleneck, optimize. – Vatine Apr 09 '15 at 11:34
  • @Vatine: That's not assuming ASCII. The C language requires the digits `'0'` through `'9'` to have consecutive values, and of course all existing legacy encodings satisfy that requirement. – R.. GitHub STOP HELPING ICE Apr 09 '15 at 18:12
2

If the values are correctly in range (0..1023), then your last conversion is unnecessarily wasteful on the divisions; the last line could be replaced with:

temp[3] = 1023 / 1000;

or even:

temp[3] = 1023 >= 1000;

Since division is repeated subtraction, but you have a very special case (not a general case) division to deal with, I'd be tempted to compare the timings for the following code with the division version. I note that you put the digits into the string in 'reverse order' - the least significant digit goes in temp[0] and the most in temp[4]. Also, there is no chance of null-terminating the string given the storage. This code uses a table of 8 bytes of static data - considerably less than many of the other solutions.

void convert_to_ascii(int value, char *temp)
{
    static const short subtractors[] = { 1000, 100, 10, 1 };
    int i;
    for (i = 0; i < 4; i++)
    {
        int n = 0;
        while (value >= subtractors[i])
        {
            n++;
            value -= subtractors[i];
        }
        temp[3-i] = n + '0';
    }
}

Performance testing - Intel x86_64 Core 2 Duo 3.06 GHz (MacOS X 10.6.4)

This platform is probably not representative of your microcontroller, but the test shows that on this platform, the subtraction is considerably slower than the division.

void convert_by_division(int value, char *temp)
{
    temp[0] = (value %    10)        + '0';
    temp[1] = (value %   100) /   10 + '0';
    temp[2] = (value %  1000) /  100 + '0';
    temp[3] = (value % 10000) / 1000 + '0';
}

void convert_by_subtraction(int value, char *temp)
{
    static const short subtractors[] = { 1000, 100, 10, 1 };
    int i;
    for (i = 0; i < 4; i++)
    {
        int n = 0;
        while (value >= subtractors[i])
        {
            n++;
            value -= subtractors[i];
        }
        temp[3-i] = n + '0';
    }
}

#include <stdio.h>
#include <timer.h>
#include <string.h>

static void time_convertor(const char *tag, void (*function)(void))
{
    int r;
    Clock ck;
    char buffer[32];

    clk_init(&ck);
    clk_start(&ck);
    for (r = 0; r < 10000; r++)
        (*function)();
    clk_stop(&ck);
    printf("%s: %12s\n", tag, clk_elapsed_us(&ck, buffer, sizeof(buffer)));
}

static void using_subtraction(void)
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        char temp1[4];
        convert_by_subtraction(i, temp1);
    }
}

static void using_division(void)
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        char temp1[4];
        convert_by_division(i, temp1);
    }
}

int main()
{
    int i;

    for (i = 0; i < 1024; i++)
    {
        char temp1[4];
        char temp2[4];
        convert_by_subtraction(i, temp1);
        convert_by_division(i, temp2);
        if (memcmp(temp1, temp2, 4) != 0)
            printf("!!DIFFERENCE!! ");
        printf("%4d: %.4s %.4s\n", i, temp1, temp2);
    }

    time_convertor("Using division   ", using_division);
    time_convertor("Using subtraction", using_subtraction);

    time_convertor("Using division   ", using_division);
    time_convertor("Using subtraction", using_subtraction);

    time_convertor("Using division   ", using_division);
    time_convertor("Using subtraction", using_subtraction);

    time_convertor("Using division   ", using_division);
    time_convertor("Using subtraction", using_subtraction);

    return 0;
}

Compiling with GCC 4.5.1, and working in 32-bit, the average timings were (optimization '-O'):

  • 0.13 seconds using division
  • 0.65 seconds using subtraction

Compiling and working in 64-bit, the average timings were:

  • 0.13 seconds using division
  • 0.48 seconds using subtraction

Clearly, on this machine, using subtraction is not a winning proposition. You would have to measure on your machine to make a decision. And removing the modulo 10000 operation will only skew results in favour of the division (it knocks about 0.02 seconds off the time with division when replaced with the comparison; that's a 15% saving and worth having).

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • Your routines seem to output and print values in reverse display order. Also, I added perf data for your routines, vs my solution in my answer. Thanks for the benchmark code :) – Merlyn Morgan-Graham Sep 14 '10 at 05:23
  • @Merlyn: yes, I output the bytes in the same inverted order that your sample code in the question does - putting the least significant digit in temp[0] and the most significant digit in temp[3]. It's somewhat weird, but the order matches what is in your sample code - all I did to change it is add the '0' to convert to a printable digit. Does the microcontroller have a built-in divide? If so, division may well be quicker than repeated subtraction. – Jonathan Leffler Sep 14 '10 at 05:30
  • @Merlyn: I've downloaded the data sheet for the PIC18F4580 and it seems that it has an 8x8 bit multiply, but no mention of division. AFAICS, it is basically an 8-bit chip - I don't see anything in the way of direct 16-bit addition or subtraction, but there is a 10-bit DAC. So, even handling numbers in the range 0..1023 seems to involve multiple operations. Or am I missing something? Anyway, a general purpose software division won't be much faster than the suggested subtraction - and might well be slower. If you use more data space, you can probably speed things up - space for time! – Jonathan Leffler Sep 14 '10 at 06:02
  • @Jonathan: I only provided an answer, not the original question, though I see now that your byte order matches the question. – Merlyn Morgan-Graham Sep 14 '10 at 06:08
  • @Merlyn: apologies - didn't look carefully enough. – Jonathan Leffler Sep 14 '10 at 06:14
  • @Jonathan Leffler, John Moffitt: No worries, none needed :) That datasheet info is probably pretty useful to the original inquirer, though. – Merlyn Morgan-Graham Sep 14 '10 at 06:30
  • @Jonathan: I've spent more time in that datasheet than I care to think about. The 10-bit DAC is indeed handled with multiple operations; two 8 bit registers are used to store the result. According to the block diagram there is no divide hardware, hence why I am hesitant about divides. I will be trying multiple implementations to see which gets me the best result, starting with Vatine's suggestion. – John Moffitt Sep 14 '10 at 08:31
1

Is there some reason that you're particularly concerned about this?

If your compiler and C library provide an itoa() function, use that, and then worry about writing this code (and associated tests and so forth to make sure you got it right!) if for some reason that turns out to be too slow or doesn't fit into RAM or something.

Brooks Moses
  • 9,267
  • 2
  • 33
  • 57
1

I've replaced my previous answer with a better one. This code creates a 4-character string in the proper order, most significant digit in output[0] to least significant in output[3] with a zero terminator in output[4]. I don't know anything about your PIC controller or C compiler, but this code doesn't require anything more than 16-bit integers, addition/subtraction, and shifting.

int x;
char output[5];
output[4] = 0;
x = 1023;
output[3] = '0' + DivideByTenReturnRemainder(&x);
output[2] = '0' + DivideByTenReturnRemainder(&x);
output[1] = '0' + DivideByTenReturnRemainder(&x);
output[0] = '0' + x;

The key to this is the magical function DivideByTenReturnRemainder. Without using division explicitly it's still possible to divide by powers of 2 by shifting right; the problem is that 10 isn't a power of 2. I've sidestepped that problem by multiplying the value by 25.625 before dividing by 256, letting integer truncation round down to the proper value. Why 25.625? Because it's easily represented by powers of 2. 25.625 = 16 + 8 + 1 + 1/2 + 1/8. Again, multiplying by 1/2 is the same as shifting right one bit, and multiplying by 1/8 is shifting right by 3 bits. To get the remainder, multiply the result by 10 (8+2) and subtract it from the original value.

int DivideByTenReturnRemainder(int * p)
{
    /* This code has been tested for an input range of 0 to 1023. */
    int x;
    x = *p;
    *p = ((x << 4) + (x << 3) + x + (x >> 1) + (x >> 3)) >> 8;
    return x - ((*p << 3) + (*p << 1));
}
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • Looks like this should be faster than Vatine's solution on average. I'm almost done with my code for this part, but once I am and I verify that it's working I will give this method a try. – John Moffitt Sep 16 '10 at 12:42
1

Are you required to use an ASCII string of the decimal representation? It would be much easier to store it in hexadecimal format. No division required, only (relatively cheap) shift operations. Excel should be able to read it if you prepend a '0x' to each number.

bta
  • 43,959
  • 6
  • 69
  • 99