3

As a workaround for aligning floats on the decimal separator in tabular numeric data, I tried to find a regex that replaces trailing zeros with spaces (globally, after formatting), subject to the following rules:

  1. no trailing zeros after a decimal digit
  2. if the first digit after the decimal separator is zero, keep it

Partly because Python's regex engine requires look-behind patterns to be fixed-width, I wasn't able to find a satisfactory solution. Here is a working example of my attempts (Python 3.x). Do not rely on the vertical bars in your solution; they are in the example only for clarity:

import re
# formatmany is just a way to speed up building of multiline string of tabular data
formatmany=lambda f:lambda *s:'\n'.join(f.format(*x) for x in s)

my_list = [[12345, 12.345, 12.345, 12.345],
           [12340, 12.34 , 12.34 , 12.34 ],
           [12345, 12.005, 12.005, 12.005],
           [12340, 12.04 , 12.04 , 12.04 ],
           [12300, 12.3  , 12.3  , 12.3  ],
           [12000, 12.0  , 12.0  , 12    ]]
my_format = formatmany('|{:8d}|{:8.2f}|{:8.3f}|{:8.4f}|')
my_string = my_format(*my_list) # this is the formatted multiline string with trailing zeros

print('\nOriginal string:\n')
print(my_string)
print('\nTry 1:\n')
print(re.sub(r'(?<!\.)0+(?=[^0-9\.]|$)',lambda m:' '*len(m.group()),my_string))
print('\nTry 2:\n')
print(re.sub(r'(\d)0+(?=[^\d]|$)',r'\1',my_string))

which prints

Original string:

|   12345|   12.35|  12.345| 12.3450|
|   12340|   12.34|  12.340| 12.3400|
|   12345|   12.01|  12.005| 12.0050|
|   12340|   12.04|  12.040| 12.0400|
|   12300|   12.30|  12.300| 12.3000|
|   12000|   12.00|  12.000| 12.0000|

Try 1:

|   12345|   12.35|  12.345| 12.345 |
|   1234 |   12.34|  12.34 | 12.34  |
|   12345|   12.01|  12.005| 12.005 |
|   1234 |   12.04|  12.04 | 12.04  |
|   123  |   12.3 |  12.3  | 12.3   |
|   12   |   12.0 |  12.0  | 12.0   |

Try 2:

|   12345|   12.35|  12.345| 12.345|
|   1234|   12.34|  12.34| 12.34|
|   12345|   12.01|  12.005| 12.005|
|   1234|   12.04|  12.04| 12.04|
|   123|   12.3|  12.3| 12.3|
|   12|   12.0|  12.0| 12.0|

Try 1 also replaces trailing zeros in integers; Try 2 was taken from another solution that strips trailing zeros from a single float. Both are unsatisfactory, since the desired output is:

|   12345|   12.35|  12.345| 12.345 |
|   12340|   12.34|  12.34 | 12.34  |
|   12345|   12.01|  12.005| 12.005 |
|   12340|   12.04|  12.04 | 12.04  |
|   12300|   12.3 |  12.3  | 12.3   |
|   12000|   12.0 |  12.0  | 12.0   |

Why this is not a duplicate question

  1. Python's regex engine differs slightly from the engines of other languages, so solutions given for other languages do not automatically apply
  2. Trailing zeros are to be replaced, not stripped
  3. This is about global replacement of many occurrences in a multiline string, not just a single occurrence
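
The fixed-width look-behind limitation mentioned above can be reproduced with a short sketch. The variable-width pattern below is a hypothetical attempt, not one taken from the question, and `compiles` is an illustrative helper name:

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        print('rejected {!r}: {}'.format(pattern, exc))
        return False

# Hypothetical variable-width look-behind: "a dot followed by one or more digits".
variable_width_ok = compiles(r'(?<=\.\d+)0+')

# The fixed-width negative look-behind used in Try 1 is accepted.
fixed_width_ok = compiles(r'(?<!\.)0+')

print(variable_width_ok, fixed_width_ok)
```

This is why a look-behind alone cannot express "zeros that follow a dot and some digits"; the digits have to be matched (and put back) instead.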
mmj

5 Answers

5

stribizhev's (previous but unsatisfactory) answer gave me the idea to get to a general solution:

re.sub(r'(?<=\.)(\d+?)(0+)(?=[^\d]|$)', lambda m: m.group(1) + ' '*len(m.group(2)), my_string)
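
As a quick check, here is a minimal sketch that wraps the substitution in a function and applies it to a small two-line sample without vertical bars. The name `blank_trailing_zeros` and the sample string are illustrative assumptions, not part of the original data:

```python
import re

TRAILING_ZEROS = re.compile(r'(?<=\.)(\d+?)(0+)(?=[^\d]|$)')

def blank_trailing_zeros(text):
    """Replace trailing decimal zeros with the same number of spaces."""
    return TRAILING_ZEROS.sub(
        lambda m: m.group(1) + ' ' * len(m.group(2)), text)

# Hypothetical sample: two rows, fields separated by two spaces, no bars.
sample = ('12340  12.30  12.005  12.0000\n'
          '12000  12.00  12.340  12.3400')
result = blank_trailing_zeros(sample)
print(result)
```

Integers are left alone because the match must start right after a dot, and the lazy `\d+?` always keeps at least one decimal digit. The total length of the string is unchanged, so the columns stay aligned.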
mmj
3

You need to change the sub as follows:

print(re.sub(r'(?<=\.)([0-9]+?)(0+)(?=\D|$)',lambda m:m.group(1)+' '*len(m.group(2)), my_string))

See IDEONE demo

Here is a demo of what (?<=\.)([0-9]+?)(0+)(?=\D|$) regex matches.

The regex matches:

  • (?<=\.)([0-9]+?) - one or more digits, as few as possible, immediately preceded by a literal . (the decimal separator)
  • (0+) - one or more zeros ...
  • (?=\D|$) - ... up to, but not including, a non-digit \D or the end of the string $.
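
To make the roles of the two capture groups concrete, here is a small sketch that prints what each group captures for a few single values. The helper name `split_decimals` and the test values are illustrative assumptions:

```python
import re

PATTERN = re.compile(r'(?<=\.)([0-9]+?)(0+)(?=\D|$)')

def split_decimals(number_text):
    """Return (kept_digits, trailing_zeros), or None when nothing matches."""
    m = PATTERN.search(number_text)
    return m.groups() if m else None

for text in ['12.3400', '12.000', '12.005', '12300']:
    print(text, '->', split_decimals(text))
```

Group 1 holds the digits to keep and group 2 the zeros to blank out. Values such as 12.005 (zeros not at the end) and 12300 (no decimal separator) produce no match at all.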
Wiktor Stribiżew
  • After some tries your solution works as expected. I appreciate your efforts and thank you for the contribution, although I can't help noticing that it is identical to the solution I had already given. – mmj Sep 02 '15 at 15:33
2

Here is another approach:

import re

my_list = [[12345, 12.345, 12.345, 12.345],
           [12340, 12.340, 12.340, 12.340],
           [12300, 12.300, 12.300, 12.300],
           [12000, 12.000, 12.000, 12.000]]

format_list = ["{:8d}", "{:8.2f}", "{:8.3f}", "{:8.4f}"]

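for row in my_list:
    # Anchor with $ so that only zeros at the very end of a field are stripped
    # (without it, '12.005' would become '12.05'), then left-justify to restore the width.
    line = ["{:<8}".format(re.sub(r'(\.\d+?)0+$', r'\1', y.format(x))) for x, y in zip(row, format_list)]
    print("|{}|".format("|".join(line)))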

Giving the output:

|   12345|   12.35|  12.345| 12.345 |
|   12340|   12.34|  12.34 | 12.34  |
|   12300|   12.3 |  12.3  | 12.3   |
|   12000|   12.0 |  12.0  | 12.0   |
Martin Evans
  • Thanks, the result is as expected, but it doesn't comply with the global replacement requirement, so the solution won't be eligible for acceptance. – mmj Sep 02 '15 at 10:03
0

Can you try this and see whether it works? ([0-9]+(\.[0-9]+[1-9])?)(\.?0+$)

Joseph Song
0

I'd suggest using string format instead of regex:

int_fmt = '{:>8d}'
general_fmt = '{:>8.5g}'
float_fmt = '{:>8.1f}'
for l in my_list:
    print('|'.join([int_fmt.format(l[0])] + [(float_fmt if int(x)==x else general_fmt).format(x) for x in l[1:]]))

output:

   12345|  12.345|  12.345|  12.345
   12340|   12.34|   12.34|   12.34
   12300|    12.3|    12.3|    12.3
   12000|    12.0|    12.0|    12.0
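
The reason for switching between two float formats can be seen in a short sketch comparing the `g` and `f` presentation types on a few illustrative values (the list `values` is an assumption made for the example):

```python
values = [12.345, 12.34, 12.3, 12.0]

# 'g' drops trailing zeros, and for whole numbers the decimal point as well.
general = ['{:>8.5g}'.format(v) for v in values]

# '.1f' always keeps exactly one decimal digit.
fixed = ['{:>8.1f}'.format(v) for v in values]

for v, g, f in zip(values, general, fixed):
    print(repr(v), '|' + g + '|', '|' + f + '|')
```

With `g` alone, 12.0 would be rendered as 12 with no decimal separator, which is why whole-number floats fall back to the `.1f` format. Note that both formats right-align the text, so the decimal separators do not line up across rows.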
yurib
  • Looks nice, it shouldn't be downvoted. In my opinion it just suffers from a few small issues: first, the formatting definitions are too mixed in with the code (I believe this can be improved); second, if I want a column to be formatted with a decimal point I must take care to pass floats, since an integer will be formatted without a decimal separator. – mmj Sep 02 '15 at 09:26
  • @mmj the int/float issue is exactly what the OP asked for. and of course it can be refactored to be more flexible, this is just a POC – yurib Sep 02 '15 at 09:27
  • The question was updated to reflect that final formatting should not rely on value being integer or float (see the behaviour on the very last value). Nonetheless I think this is an interesting solution if you don't need that feature. – mmj Sep 02 '15 at 09:40
  • @mmj so why does 12000 remain 12000 but 12 turn into 12.0 ? – yurib Sep 02 '15 at 09:45
  • Because it is specified in the format string: `'|{:8d}|{:8.2f}|{:8.3f}|{:8.4f}|'`. – mmj Sep 02 '15 at 09:56
  • @mmj so just the first value in each row is always an integer? – yurib Sep 02 '15 at 10:04
  • @mmj updated to reflect the fact that the first column is an integer, this solution has fixed precision for each other column but it's easy enough to use different precision for each column, this is just an example that shows it can be done. the main point is that IMO using string formats is cleaner than a regex. – yurib Sep 02 '15 at 10:17
  • I appreciate your contribution, thanks, even if I still prefer the regex solution because IMO it allows the code to be more modular. – mmj Sep 02 '15 at 14:18