3

I'm working on a microcontroller that does not support floating point math. Integer math only. As such, there is no sqrt() function and I can't import any math modules. The MCU is running a subset of python that supports eight Python data types: None, integer, Boolean, string, function, tuple, byte list, and iterator. Also, the MCU can't do floor division (//).

My problem is that I need to calculate the magnitude of 3 signed integers.

mag = sqrt(x**2+y**2+z**2)

FWIW, the values can only be in the range of +/-1024 and I just need a close approximation. Does anyone have a pattern for solving this problem?

cce1911
  • 363
  • 3
  • 20
  • so you need the closest int to the `sqrt(x**2+y**2+z**2)`? – Joran Beasley Jul 04 '16 at 04:48
  • 3
    What - precisely - does "close" mean? How important is speed? Can you use a table of pre-computed constants, and if so how many bytes of pre-computed data can you afford? What _does_ the MCU support - Integer addition? Right shift? Integer multiplication? And on what size integers? – Tim Peters Jul 04 '16 at 04:48
  • 5
    Do you *really* need the magnitude, or does using the magnitude-squared directly work? – o11c Jul 04 '16 at 05:03
  • ahhh good question @o11c ... – Joran Beasley Jul 04 '16 at 05:04
  • 1
    Does it support division at all? The floor division operator is a recent addition to distinguish from real division, a pointless distinction if only integers are supported. But division is frequently expensive in the first place; it's not unlikely a successive approximation using shifts, additions and multiplications is cheaper, which might in turn benefit from operations like ctz or clrsb to find the most significant 1 bit. I'm rather curious what Python-like system you're using. PyMite perhaps? – Yann Vernier Jul 04 '16 at 05:27
  • The device is a Synapse Wireless radio with a ATMEL ATmega128 chip as the MCU. In order to keep the footprint as small as possible only a subset of python is supported. this leaves more room for application code. I'm doing tap detection using an ADXL345 3 axis accelerometer. I'm trying to quantify/compare the magnitude of a remote impact event. When a "tap" is detected, I read the adxl345 to get x,y and z values. – cce1911 Jul 04 '16 at 12:48
  • @TimPeters, I don't need precision since I'm simply trying to compare the accel readings to the "tap threshold". Speed is not important, these events take place seconds apart. The MCU supports normal python mathematical operations that apply to integers. From the manual "Note that the division is integer division, taking the “floor” value of the division. 999 / 1000 = 0. The Python “floor” operator (//) is not implemented." and "The python power operator (**) is not implemented". It does support shifting left and right. – cce1911 Jul 04 '16 at 12:57

2 Answers2

3

Note that the largest possible sum is 3*1024**2, so the largest possible square root is 1773 (floor - or 1774 rounded).

So you could simply take 0 as a starting guess, and repeatedly add 1 until the square exceeds the sum. That can't take more than about 1770 iterations.

Of course that's probably too slow. A straightforward binary search can cut that to 11 iterations, and doesn't require division (I'm assuming the MCU can shift right by 1 bit, which is the same as floor-division by 2).

EDIT

Here's some code, for a binary search returning the floor of the true square root:

def isqrt(n):
    if n <= 1:
        return n
    lo = 0
    hi = n >> 1
    while lo <= hi:
        mid = (lo + hi) >> 1
        sq = mid * mid
        if sq == n:
            return mid
        elif sq < n:
            lo = mid + 1
            result = mid
        else:
            hi = mid - 1
    return result

To check, run:

from math import sqrt
assert all(isqrt(i) == int(sqrt(i)) for i in range(3*1024**2 + 1))

That checks all possible inputs given what you said - and since binary search is notoriously tricky to get right in all cases, it's good to check every case! It doesn't take long on a "real" machine ;-)

PROBABLY IMPORTANT

To guard against possible overflow, and speed it significantly, change the initialization of lo and hi to this:

    hi = 1
    while hi * hi <= n:
        hi <<= 1
    lo = hi >> 1

Then the runtime becomes proportional to the number of bits in the result, greatly speeding smaller results. Indeed, for sloppy enough definitions of "close", you could stop right there.

FOR POSTERITY ;-)

Looks like the OP doesn't actually need square roots at all. But for someone who may, and can't afford division, here's a simplified version of the code, also removing multiplications from the initialization. Note: I'm not using .bit_length() because lots of deployed Python versions don't support that.

def isqrt(n):
    if n <= 1:
        return n
    hi, hisq = 2, 4
    while hisq <= n:
        hi <<= 1
        hisq <<= 2
    lo = hi >> 1
    while hi - lo > 1:
        mid = (lo + hi) >> 1
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    assert lo + 1 == hi
    assert lo**2 <= n < hi**2
    return lo

from math import sqrt
assert all(isqrt(i) == int(sqrt(i)) for i in range(3*1024**2 + 1))
Tim Peters
  • 67,464
  • 13
  • 126
  • 132
  • this suffers the same as mine ... except it always rounds up instead of down ... (ie 10.01 -> 11) but +1 as this is plenty fast – Joran Beasley Jul 04 '16 at 05:31
  • In the absence of a definition for "close" ... ;-) But a binary search can be coded to compute the floor of the true sqrt, or the ceiling, or to pick the `i` such that `i**2` is closest to the input `n`. For example, if it "naturally" produces the floor, then to get the ceiling instead just add `if i*i != n: i += 1` at the end. – Tim Peters Jul 04 '16 at 05:41
  • @TimPeters, I was wrong about the range. The adxl345 stores the accel data in 2 eight bit registers for each axis. I have to shift the msbyte<<8 then | the lsbyte. The MCU only supports signed 16 bit integers, (+/-32768) – cce1911 Jul 04 '16 at 13:49
  • Thanks for taking the time to help me. I'll mark the answer as correct. – cce1911 Jul 05 '16 at 13:46
  • @cce1911, if you missed the comment on a deleted answer: checking whether `sqrt(n) > t` is the same as checking whether `n > t*t`. So you don't need square roots at all. If your threshold is `t`, just compare `x**2 + y**2 + z**2` to `t**2`. – Tim Peters Jul 05 '16 at 17:09
1

there is a algorithm to calculate it, but it use floor division, without that this is what come to my mind

def isqrt_linel(n):
    x = 0
    while (x+1)**2 <= n:
        x+=1
    return x

by the way, the algorithm that I know use the Newton method:

def isqrt(n):
    #https://en.wikipedia.org/wiki/Integer_square_root
    #https://gist.github.com/bnlucas/5879594
    if n>=0:
        if n == 0:
            return 0
        a, b = divmod(n.bit_length(), 2)
        x = 2 ** (a + b)
        while True:
            y = (x + n // x) >> 1
            if y >= x:
                return x
            x = y
    else:
        raise ValueError("negative number")
Copperfield
  • 8,131
  • 3
  • 23
  • 29