2

I am looking for an efficient Python implementation of a function that takes a decimal-formatted string, e.g.

2.05000
200
0.012

and returns a tuple of two integers representing the significand and exponent of the input in base-10 floating point format, e.g.

(205,-2)
(2,2)
(12,-3)

List comprehension would be a nice bonus.

I have a gut feeling that there exists an efficient (and possibly Pythonic) way of doing this but it eludes me...


Solution applied to pandas

import pandas as pd
import numpy as np
ser1 = pd.Series(['2.05000', '- 2.05000', '00 205', '-205', '-0', '-0.0', '0.00205', '0', np.nan])

# remove all white spaces
ser1 = ser1.str.replace(' ', '')
parts = ser1.str.split('.').apply(pd.Series)

# strip leading zeros (even those after a minus sign)
parts.ix[:,0] = '-'*parts.ix[:,0].str.startswith('-') + parts.ix[:,0].str.lstrip('-').str.lstrip('0')

parts.ix[:,1] = parts.ix[:,1].fillna('')        # fill non-existent decimal places
exponents = -parts.ix[:,1].str.len()
parts.ix[:,0] += parts.ix[:,1]                  # append decimal places to digits before decimal point

parts.ix[:,1] = parts.ix[:,0].str.rstrip('0')   # strip trailing zeros

exponents += parts.ix[:,0].str.len() - parts.ix[:,1].str.len()

parts.ix[:,1][(parts.ix[:,1] == '') | (parts.ix[:,1] == '-')] = '0'
significands = parts.ix[:,1].astype(float)

df2 = pd.DataFrame({'exponent': exponents, 'significand': significands})
df2

Input:

0      2.05000
1    - 2.05000
2       00 205
3         -205
4           -0
5         -0.0
6      0.00205
7            0
8          NaN
dtype: object

Output:

   exponent  significand
0        -2          205
1        -2         -205
2         0          205
3         0         -205
4         0            0
5         0            0
6        -5          205
7         0            0
8       NaN          NaN

[9 rows x 2 columns]
ARF
  • wouldn't the first one be `(2.05, 0)` and the third one be `(1.2, -2)`? Why does `2.05` become `205` and `200` become `2`? – GP89 Nov 12 '14 at 14:49
  • @GP89 While (1.2,-2) is mathematically an equivalent representation, the whole idea of storing significand and exponent separately is that both can be stored as integers. Thus, for this application (12,-3) is the correct representation. Maybe http://en.wikipedia.org/wiki/Floating_point helps you understand this representation if I am doing a poor job of explaining. – ARF Nov 12 '14 at 14:52
  • Ok thanks, so is a requirement that it be stored as the smallest integer representation? (ie why `200` becomes `2`) – GP89 Nov 12 '14 at 14:53
  • That's the reason I was confused. In that representation the result isn't an integer, but a number `1 <= x < 10` – GP89 Nov 12 '14 at 14:55
  • @GP89 I think you are confused by scientific notation. In science 205 would often be expressed as 2.05*10^2; in floating point storage, however, 205 needs to be thought of as 205*10^0 so that significand and exponent are integers that can be stored efficiently. Does this help? – ARF Nov 12 '14 at 14:59
  • Ah I was being a bit dense. Sorry for that! – GP89 Nov 12 '14 at 15:14

4 Answers

3

Take a look at decimal.Decimal:

>>> from decimal import Decimal
>>> s = '2.05000'
>>> x = Decimal(s)
>>> x
Decimal('2.05000')
>>> x.as_tuple()
DecimalTuple(sign=0, digits=(2, 0, 5, 0, 0, 0), exponent=-5)

This does almost what you need; just convert the DecimalTuple to your desired format, for example:

>>> t = Decimal('2.05000').as_tuple()
>>> (''.join(str(x) for i,x in enumerate(t.digits) if any(t.digits[i:])),
... t.exponent + sum(1 for i,x in enumerate(t.digits) if not 
... any (t.digits[i:])))
('205', -2)

This is just a sketch, but it satisfies your three test cases.

You might want to .normalize() your Decimal before calling .as_tuple() on it (thanks @georg); this takes care of the trailing zeros, so you won't need to do that much formatting:

>>> Decimal('2.05000').normalize().as_tuple()
DecimalTuple(sign=0, digits=(2, 0, 5), exponent=-2)

So your function can be written as:

>>> def decimal_str_to_sci_tuple(s):
...  t = Decimal(s).normalize().as_tuple()
...  return (int(''.join(map(str,t.digits))), t.exponent)
... 
>>> decimal_str_to_sci_tuple('2.05000')
(205, -2)
>>> decimal_str_to_sci_tuple('200')
(2, 2)
>>> decimal_str_to_sci_tuple('0.012')
(12, -3)

(be sure to add t.sign when supporting negative numbers though).
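
A minimal sketch of that sign handling, assuming the input is always a finite decimal string. The name `decimal_str_to_signed_tuple` is made up for illustration. `DecimalTuple.sign` is 0 for non-negative and 1 for negative values, so it can be used to flip the significand. Note that `.normalize()` rounds to the current decimal context precision (28 digits by default), so very long inputs may lose digits.

```python
from decimal import Decimal

def decimal_str_to_signed_tuple(s):
    # Hypothetical helper: like decimal_str_to_sci_tuple, but keeps the sign.
    t = Decimal(s).normalize().as_tuple()
    significand = int(''.join(map(str, t.digits)))
    if t.sign:  # sign is 1 for negative numbers, 0 otherwise
        significand = -significand
    return (significand, t.exponent)

# The list comprehension asked for in the question:
values = ['2.05000', '-200', '0.012']
print([decimal_str_to_signed_tuple(v) for v in values])
```

Negative zero such as '-0.0' comes out as (0, 0) here, since an integer significand cannot carry a sign on zero.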

ch3ka
2

If you are looking for scientific notation, you could use the decimal module with a format string:

from decimal import Decimal
numbers = ['2.05000','200','0.01','111']
print(["{:.2E}".format(Decimal(n)) for n in numbers])

output:

['2.05E+0', '2.00E+2', '1.00E-2', '1.11E+2']

If instead you want to:

  1. Find the last nonzero digit on the right-hand side
  2. Express the number as a significand and exponent up to that digit

    from decimal import Decimal
    numbers = ['2.05000','200','0.01','111']
    numbers = [ n.rstrip('0') if '.' in n else n  for n in numbers ] # strip trailing zeros if found after '.'
    for n in numbers:
        if '.' in n:
            num = n.split('.')[0]
            dec = n.split('.')[1]
            tenthNumber = len(dec)
            print (Decimal(num+dec), -1 * tenthNumber)
        elif n.endswith('0'): 
            tenthNumber = 0
            revN = n[::-1]
            for i in range(len(revN)):
                if revN[i]=='0':
                    tenthNumber = tenthNumber + 1
                else:
                    break
            print (n[:(len(n)-tenthNumber)], str(tenthNumber))
    
        else:
            print (n,0)
    

Output:

(Decimal('205'), -2)
('2', '2')
(Decimal('1'), -2)
('111', 0)
venpa
2

Here's a straightforward string processing solution.

def sig_exp(num_str):
    parts = num_str.split('.', 2)
    decimal = parts[1] if len(parts) > 1 else ''
    exp = -len(decimal)
    digits = parts[0].lstrip('0') + decimal
    trimmed = digits.rstrip('0')
    exp += len(digits) - len(trimmed)
    sig = int(trimmed) if trimmed else 0
    return sig, exp

>>> for x in ['2.05000', '200', '0.012', '0.0']:
...     print(sig_exp(x))
...
(205, -2)
(2, 2)
(12, -3)
(0, 0)

I'll leave the handling of negative numbers as an exercise for the reader.
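
One possible way to do that exercise, as a sketch rather than a definitive answer: remember the sign, strip it, and then apply the same trimming logic as above. The name `sig_exp_signed` is hypothetical, and the code assumes the input is a plain decimal string with at most one leading sign and no exponent part.

```python
def sig_exp_signed(num_str):
    # Hypothetical variant of sig_exp that also accepts a leading '-' or '+'.
    num_str = num_str.strip()
    negative = num_str.startswith('-')
    unsigned = num_str.lstrip('+-')
    # partition returns empty strings when there is no decimal point
    int_part, _, frac_part = unsigned.partition('.')
    exp = -len(frac_part)
    digits = int_part.lstrip('0') + frac_part
    trimmed = digits.rstrip('0')
    # every trailing zero removed raises the exponent by one
    exp += len(digits) - len(trimmed)
    sig = int(trimmed) if trimmed else 0
    return (-sig if negative else sig), exp

print([sig_exp_signed(x) for x in ['-2.05000', '200', '-0.012', '-0.0']])
```

Because the significand is an integer, a negative zero input simply yields (0, 0).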

Mark Ransom
  • Thanks for the answer. I love this solution if for no other reason than that it took me a second read-through to understand the ingenious copying, trimming & correction of the exponent resulting from the trailing digits of 2.05000... – ARF Nov 12 '14 at 17:58
  • Answer accepted as it can be implemented in vectorized form for pandas columns. Thanks again very much! – ARF Nov 17 '14 at 22:49
0

Here's one method that uses venpa's format string (all credit goes to him) but starts with numbers instead of strings. If you can afford to round the significand (e.g. to 2 decimal places), you could simply write:

def scd_exp(scnum):
    # split on 'e' so negative numbers and 3-digit exponents parse correctly
    significand, exponent = "{:.2e}".format(scnum).split('e')
    return (float(significand), int(exponent))


numbers = [2.05, 205, 0.0001576, 111]
for number in numbers:
    print(scd_exp(number))

result is

(2.05, 0)
(2.05, 2)
(1.58, -4)
(1.11, 2)

If you want to choose the rounding each time you call the function (say, 6 decimal places for this example), you could write:

def scd_exp(scnum, roundafter):
    # roundafter is the number of digits kept after the decimal point
    significand, exponent = "{:.{}e}".format(scnum, roundafter).split('e')
    return (float(significand), int(exponent))


numbers = [2.05, 205, 0.000157595678, 111]
for number in numbers:
    print(scd_exp(number, 6))

which gives back

(2.05, 0)
(2.05, 2)
(1.575957, -4)
(1.11, 2)
Marc Steffen