
In Python, I'd like a good way to convert its complex number string output into an equivalent string representation which, when interpreted by Python, gives the same value.

Basically I'd like a function complexStr2str(s: str) -> str with the property that eval(complexStr2str(str(c))) is indistinguishable from c, for any c of type complex. However, complexStr2str() only has to deal with the kinds of string patterns that str() or repr() output for complex values. Note that for complex values str() and repr() do the same thing.

By "indistinguishable" I don't mean == in the Python sense; you can define (or redefine) that to mean anything you want. "Indistinguishable" means that if you have a string a in a program which represents some value, and you replace it in the program with a string b (which could be exactly a), then there is no way to tell the difference between running the original program and running the replacement program, short of introspection of the program.
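
One way to make "indistinguishable" testable, as a minimal sketch: compare the IEEE 754 bit patterns of the real and imaginary parts instead of using ==. The helper names same_bits and indistinguishable are made up for this illustration, and the check assumes that NaNs with different payloads should count as different values.

```python
import struct


def same_bits(x: float, y: float) -> bool:
    # Compare the raw 8-byte encodings, so -0.0 differs from 0.0
    # and a NaN matches a NaN with the same bit pattern.
    return struct.pack("<d", x) == struct.pack("<d", y)


def indistinguishable(a: complex, b: complex) -> bool:
    # Hypothetical helper: both parts must match bit for bit.
    return same_bits(a.real, b.real) and same_bits(a.imag, b.imag)


neg_zero = complex(-0.0, -0.0)
pos_zero = complex(0.0, 0.0)
print(neg_zero == pos_zero)                   # True: == ignores the sign of zero
print(indistinguishable(neg_zero, pos_zero))  # False

nan_c = complex(float("nan"), 0.0)
print(nan_c == nan_c)                         # False: NaN never compares equal
print(indistinguishable(nan_c, nan_c))        # True
```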

Note that (-0-0j) is not the same thing as -0j although the former is what Python will output for str(-0j) or repr(-0j). As shown in the interactive session below, -0j has real and imaginary float parts -0.0 while -0-0j has real and imaginary float parts positive 0.0.

The problem is made even more difficult in the presence of values like nan and inf. Although in Python 3.5 and later you can import these values from math, for various reasons I'd like to avoid having to do that. However, using float("nan") is okay.

Consider this Python session:

>>> -0j
(-0-0j)
>>> -0j.imag
-0.0
>>> -0j.real
-0.0
>>> (-0-0j).imag
0.0  # this is not -0.0
>>> (-0-0j).real
0.0  # this is also not -0.0
>>> eval("-0-0j")
0j  # and so this is not the same value as -0j
>>> from math import atan2
>>> atan2(-0.0, -1.0)
-3.141592653589793
>>> atan2((-0-0j).imag, -1.0)
3.141592653589793
>>> -1e500j
(-0-infj)
>>> (-0-infj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'infj' is not defined

Addendum:

This question has generated something of a stir (e.g. there are a number of downvotes for this question and its accepted solution). And there have been a lot of edits to the question, so some of the comments might be out of date.

The main thrust of the criticism is that one shouldn't want to do this. Parsing data from text from some existing program is a thing that happens all the time, and sometimes you just can't control the program that generated the data.

A related problem, where one can control the program producing the output but still needs the values to appear as text, is to write a better repr() function that works properly for floats and complex numbers and follows the principle described at the end. That is straightforward, if a little ugly, because to do it fully you also need to handle float and complex values inside composite types like lists, tuples, sets, and dictionaries.
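
A rough sketch of what such a replacement could look like, under stated assumptions: the name better_repr is hypothetical, only lists, tuples, and dictionaries are handled (sets and user-defined containers are left to the built-in repr()), and math is used only while producing the string, so the resulting text needs no imports when it is eval'd.

```python
import math


def better_repr(obj) -> str:
    # Hypothetical helper: like repr(), but floats and complex numbers
    # are written so that eval() restores signed zeros, inf, and nan.
    if isinstance(obj, float):
        if math.isnan(obj):
            return "float('nan')"
        if math.isinf(obj):
            return "float('inf')" if obj > 0 else "float('-inf')"
        return repr(obj)  # repr(-0.0) is '-0.0', which evals correctly
    if isinstance(obj, complex):
        return f"complex({better_repr(obj.real)}, {better_repr(obj.imag)})"
    if isinstance(obj, list):
        return "[" + ", ".join(better_repr(x) for x in obj) + "]"
    if isinstance(obj, tuple):
        items = ", ".join(better_repr(x) for x in obj)
        return "(" + items + ("," if len(obj) == 1 else "") + ")"
    if isinstance(obj, dict):
        pairs = (f"{better_repr(k)}: {better_repr(v)}" for k, v in obj.items())
        return "{" + ", ".join(pairs) + "}"
    return repr(obj)  # everything else is left to the built-in repr()


print(better_repr([complex(-0.0, -0.0), (float("inf"),)]))
# [complex(-0.0, -0.0), (float('inf'),)]
```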

Finally, I'll say that it appears that Python's str() or repr() output for complex values is unhelpful, which is why this problem is more specific to Python than other languages that support complex numbers as a primitive datatype or via a library.

Here is a session that shows this:

>>> complex(-0.0, -0.0)
(-0-0j)  # confusing and can lead to problems if eval'd
>>> repr(complex(-0.0, -0.0))
'(-0-0j)' # 'complex(-0.0, -0.0)' would be the simplest, clearest, and most useful

Note that str() gets called when doing output such as via print(). repr() is the preferred method for this kind of use but here it is the same as str() and both have problems with things like inf and nan.

For any built-in type, eval(repr(c)) should be indistinguishable from c.

rocky
  • have you tried `-0.` so it's a float literal instead of an int? there isn't a `-0` integer – anthony sottile Dec 18 '19 at 19:37
  • @AnthonySottile, BTW, `(-0. - 0.j)` is printed as `(-0+0j)` (with a plus sign!), which is even weirder... – ForceBru Dec 18 '19 at 19:38
  • @AnthonySottile I am not sure how that can be used in a solution to the underlying problem of writing a function `complexStr2str(s: str) -> str` that has the property that `eval(complexStr2str(str(c))) == c`. @ForceBru doesn't this suggest even more brokenness of Python's `complex.__str__()`? – rocky Dec 18 '19 at 21:09
  • See [Bogus parsing/eval of complex literals](https://stackoverflow.com/q/36603561/674039) – wim Dec 18 '19 at 21:25
  • *"For any primitive datatype, (eval(str(c)) == c should be true"* Not so; strings are primitive in Python, and `str(c) == c` for any string `c`. You are thinking of the `repr` function. – kaya3 Dec 18 '19 at 22:39
  • Another counterexample is `float('nan')`, which is a `float`, but `str` of it is the string `'nan'`, which when `eval`'d will give a NameError unless you've defined it as a name. – kaya3 Dec 18 '19 at 22:40
  • That said, `(-0-0j) == -0j` is `True`, so your example *isn't* a counterexample. – kaya3 Dec 18 '19 at 22:42
  • @kaya I've changed the `str(c)` to `repr(c)`. Thanks for the clarification. As for the equality of `(-0-0j) == -0j`, it is `True` according to `__eq__()`, but the two are not really equal in terms of what you get when you read `.imag` on the underlying complex number. I really mean equal in that sense, sort of like "deepequal", rather than "==". – rocky Dec 18 '19 at 23:08
  • There are so many numbers there... can you clearly point out some complex number `c` where `eval(repr(c)) == c` or `eval(str(c)) == c` **aren't** true? – Stefan Pochmann Dec 18 '19 at 23:11
  • @StefanPochmann : `eval(repr(complex(1e1000, 0))) == complex(1e1000, 0)`. But also see the recent edit where I clarify that `==` isn't the same thing as being indistinguishable. – rocky Dec 18 '19 at 23:18
  • Do you have a source for these repeated claims that `eval(repr(c)) == c` should hold for any built-in type? That's certainly not a guarantee that the language makes, the docs only say *"for many types, this function makes an attempt to [do that]"*. i.e. it's a convention - it's a "nice to have" when it's convenient to do so, but it's not an invariant that must be strictly enforced. – wim Dec 20 '19 at 07:35
  • @wim As I mentioned to @StefanPochmann, see the correction. I don't mean `==`, I mean indistinguishable in the sense that is currently described. Writing a `repr()` and `str()` that meets this property isn't all that hard. I am sure one can find many references that say that a programming language should try to be helpful and not lead to confusion and programmer error. It is true, however, that some programming languages opt instead for simplifying the normal case at the expense of causing subtle bugs in the not-so-normal case. I fail to see how `str()`'s output for complex simplifies anything. – rocky Dec 20 '19 at 18:19

3 Answers


This question is based on a false premise. To correctly preserve signed zeros, nan, and infinity when using complex numbers, you should use the function call rather than binary operators:

complex(real, imag)

It should be called with two floats:

>>> complex(-0., -0.)  # correct usage
(-0-0j)
>>> complex(-0, -0j)  # incorrect usage
-0j

Your problem with attempting to eval the literals is that -0-0j is not actually a complex literal. It is a binary op: subtraction of a complex 0j from an integer 0. The integer first had a unary minus applied, but that was a no-op for the integer zero.

The parser will reveal this:

>>> import ast
>>> ast.dump(ast.parse("-0-0j"))
'Module(body=[Expr(value=BinOp(left=UnaryOp(op=USub(), operand=Constant(value=0, kind=None)), op=Sub(), right=Constant(value=0j, kind=None)))], type_ignores=[])'

Python's choices here will make more sense if you understand how the tokenizer works. It does not want to backtrack:

$ echo "-0-0j" > wtf.py
$ python -m tokenize wtf.py
0,0-0,0:            ENCODING       'utf-8'        
1,0-1,1:            OP             '-'            
1,1-1,2:            NUMBER         '0'            
1,2-1,3:            OP             '-'            
1,3-1,5:            NUMBER         '0j'           
1,5-1,6:            NEWLINE        '\n'           
2,0-2,0:            ENDMARKER      ''
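
The same token stream can be inspected from inside Python with the standard tokenize module. A small sketch (the helper name op_and_number_tokens is hypothetical) that keeps only the operator and number tokens:

```python
import io
import tokenize


def op_and_number_tokens(source: str) -> list:
    # Tokenize a source string and keep only OP and NUMBER tokens.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.type in (tokenize.OP, tokenize.NUMBER)
    ]


print(op_and_number_tokens("-0-0j\n"))
# [('OP', '-'), ('NUMBER', '0'), ('OP', '-'), ('NUMBER', '0j')]
```

Neither minus sign is part of a NUMBER token, so the signs are only applied later as operators.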

But you can easily reason it out yourself too, from the datamodel hooks and operator precedence:

>>> -0-0j  # this result seems weird at first
0j
>>> -(0) - (0j)  # but it's parsed like this
0j
>>> (0) - (0j)  # unary op (0).__neg__() applies first, does nothing
0j
>>> (0).__sub__(0j)  # left-hand side asked to handle first, but opts out
NotImplemented
>>> (0j).__rsub__(0)  # right-hand side gets second shot, reflected op works
0j

The same reasoning applies to -0j: it's actually a negation, and the real part is implicitly negated too:

>>> -0j  # where did the negative zero real part come from?
(-0-0j)
>>> -(0j)  # actually parsed like this
(-0-0j)
>>> (0j).__neg__()  # so *both* real and imag parts are negated
(-0-0j)
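
To check how a given expression is parsed without reading a full ast.dump(), a short sketch like the following reports only the outermost node type. The helper name top_level_node is made up for this illustration, and the node name Constant assumes Python 3.8 or later.

```python
import ast


def top_level_node(expr: str) -> str:
    # Parse a single expression and return the class name of its outermost node.
    return type(ast.parse(expr, mode="eval").body).__name__


print(top_level_node("0j"))     # Constant: the only true imaginary literal here
print(top_level_node("-0j"))    # UnaryOp: negation applied to 0j
print(top_level_node("-0-0j"))  # BinOp: subtraction of 0j from (-0)
```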

Let's talk about this part, which points the blame in the wrong direction:

Python's str() representation for complex numbers with negative real and imaginary parts is unhelpful

No, there is nothing incorrect about the implementation of __str__ here, and your use of complex(-0,-0j) makes me suspect you didn't fully understand what's going on in the first place. Firstly, there is never a reason to write -0, because there is no signed zero for integers, only for floats. And that imaginary part -0j is still parsed as a USub on a complex, as I've explained above. Usually you wouldn't pass an imaginary number itself as the imaginary part here; the right way to call complex is with two floats: complex(-0., -0.). No surprises here.

Whilst I'll agree that the parsing/eval of complex expressions is counter-intuitive, I have to disagree that there is anything amiss in their string representation. It may be possible to "improve" the eval of expressions, with the goal of making eval(repr(c)) round-trip exactly, but it would mean that you cannot use Python's left-to-right munching parser any more. That parser is fast, simple, and easy to explain. It is not a fair trade-off to greatly complicate the parse trees for the purpose of making expressions involving complex zeros behave less strangely, when nobody who needs to care about such details should be choosing repr(c) as their serialization format in the first place.

Note that ast.literal_eval only allows it as a convenience. ast.literal_eval("0+0j") will work despite not being a literal, and the other way around will fail:

>>> ast.literal_eval("0+0j")
0j
>>> ast.literal_eval("0j+0")
ValueError: malformed node or string: <_ast.BinOp object at 0xcafeb4be>

In conclusion, the string representation of complex numbers is fine. It's the way that you create the numbers that matters. str(c) is intended for human-readable output; use a machine-friendly serialization format if you care about preserving signed zeros, nan, and infinities.
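
As one illustration of a machine-friendly text format (a sketch, not the only option): each part can be written with float.hex() and read back with float.fromhex(), which keeps signed zeros, infinities, and nan intact. The function names dump_complex and load_complex and the space-separated layout are assumptions made for this example.

```python
import math


def dump_complex(c: complex) -> str:
    # Hypothetical text format: "<real hex> <imag hex>".
    return f"{c.real.hex()} {c.imag.hex()}"


def load_complex(text: str) -> complex:
    real_hex, imag_hex = text.split()
    # Building the value with complex(real, imag) avoids any arithmetic
    # that could change the sign of a zero.
    return complex(float.fromhex(real_hex), float.fromhex(imag_hex))


original = complex(-0.0, float("-inf"))
restored = load_complex(dump_complex(original))
print(math.copysign(1.0, restored.real))  # -1.0: the negative zero survived
print(restored.imag)                      # -inf
```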

wim
  • Thanks for the explanation. However it doesn't answer the question, which is a real problem I have: I start out with input that was produced by a program that called `str(c)` for some complex `c`. Except for those people who know about the complexities of complex and want to take pains to avoid ambiguity, in effect that's what happens most of the time. Now I want to read in that string value and produce another string value such that when *it* is evaluated, it will evaluate to the same complex number that was `str()`'d in the first place. I can't control the programs that call `str(c)`. – rocky Dec 18 '19 at 22:01
  • `str(c)` is a terrible way to serialize complex numbers. Information was already lost, so what you have asked for is impossible. It's unclear what sort of "answer" you're actually hoping for, given that the problem lies in the other program and must be resolved there? – wim Dec 18 '19 at 22:16
  • Ok. Then this suggests brokenness in Python. There are innumerable programs that call some print function or other without bothering to look at the underlying types and take special pains to deal with that. For any primitive datatype, `eval(str(c)) == c` should hold true. And I suspect that's true for most programming languages that support a complex type. – rocky Dec 18 '19 at 22:24
  • @rocky you mean `eval(repr(c)) == c`, although that's still broken here. But there are plenty of types with weird `repr`s anyway; you should use some kind of real serialization. – o11c Dec 18 '19 at 22:43
  • @o11c Yes, you are correct, I meant `repr` and have corrected that. As for the serialization, yes, in an ideal world people wouldn't ever use `print` and would serialize data instead. As I said, I can't always control the program producing the output. (Here, though, I can and will, with a bit of programming effort.) Or hey, imagine a world and programming language where `str(c)` and `repr(c)` produced string representations that could be eval'd! Too often in the Python community, when I ask for something difficult, instead of a solution I get a prescription for how I should work instead. – rocky Dec 18 '19 at 23:26
  • @rocky re: "For any primitive datatype" Python has no primitives. Everything is an object, so the assumption doesn't make sense. If the intent is to transfer objects from one program to another via serialization, the proper method is to use `pickle.dumps`/`pickle.loads`, not `str` – ParkerD Dec 18 '19 at 23:29
  • @ParkerD "Everything is an object, so the assumption doesn't make sense." If you prefer "built-in type" https://docs.python.org/3/library/stdtypes.html then okay. Python ships with `complex` and implements the functions for `__repr__` and `__str__`. It could do it in a more helpful way which won't lead to problems. It isn't an intractable problem to write a `repr()` for complex that always eval's properly. In fact, that's what this question is really about. – rocky Dec 18 '19 at 23:34
  • _"For any primitive datatype_ `(eval(repr(c))` _should be indistinguishable from_ `c` _."_, and _Too often in the Python community, often when I ask for something difficult, instead of a solution I get a proscription for how I should work instead._ I find these two statements together ironic given that one does exactly what the other laments. – keithpjolley Dec 18 '19 at 23:47
  • @rocky Okay the `str`/`repr` difference flew over my head initially. Seeing as how `repr(Decimal("0.1"))` returns `"Decimal('0.1')"`, `complex` objects should probably return a string that uses the `complex()` constructor method so they can be properly formed again. Sounds like something for the python devs to consider changing – ParkerD Dec 18 '19 at 23:48
  • @rocky It is not that you are asking for something *difficult*, it is that you are asking for something hacky and lame. If you really want to parse printed-for-human complex instances as the inverse of how Python has formatted it, that won't be difficult - just [look at the CPython representation of complex](https://github.com/python/cpython/blob/673c39331f844a80c465efd7cff88ac55c432bfb/Objects/complexobject.c#L353-L407) and undo what it did. My point is that such an approach is bad, since serializing to a machine-friendly format instead of a human-readable format is the obvious choice here. – wim Dec 18 '19 at 23:50
  • @keithpjolley I see no contradiction. I would prefer the hard work (and here I don't think it will be too hard) done by the programming language, not all programmers. When it is done inside the programming language, more people benefit than when each individual solves the problem in her own way. This is partly what Larry Wall was referring to by the virtues "laziness, impatience, and hubris". – rocky Dec 18 '19 at 23:53
  • @wim I agree that it is hacky or crippled, but that's because Python has dealt us a bad hand here by not implementing `__str__()` in a more helpful way. Given the current situation, the question becomes what's the best way to _cope_ with it. When one can avoid the problem, do it. In reality, there are lots of programs that dump data, say to a CSV, using `print` or some other `str`-oriented way. Lots of other programs read, for example, CSVs dumped this way, whether or not they contain properly converted complex values. The question then is: can the "wrong" approach be made better? I think yes. – rocky Dec 19 '19 at 03:06
  • @rocky Pointing the finger at `__str__` makes me think you're still not really understanding what's going on here. It's all about the initial parsing, it's not about the implementation of the str/repr on complex. I've expanded again my answer to try and explain that, by the time you've called `str(c)`, the signs of `c` are already all determined and `str` **is** representing them correctly. If you grok how the expressions are parsed, there are no surprises left to see in your example Python session, and you'll agree it could not be any other way without a rewrite of the parser grammar! – wim Dec 19 '19 at 18:32
  • If you are saying that `eval(str(c))` is indistinguishable from `c` when `nan` and `inf` are defined, then okay. The problem then is handling just the `nan` and `inf` and possible negative 0.0 cases. But these are really a problem inherited from `str` and `repr` of a float value. And it may mean that repr and str should just be beefed up which is a different problem, but still a problem. Let me think about this. If the problem changes, I'll probably delete this question. – rocky Dec 19 '19 at 21:16
  • @rocky I'm not saying that `eval(str(c))` is indistinguishable from complex `c` (and that's plainly false, even without nan and inf thrown in the mix). I'm saying that the `str(c)` was never intended to round-trip correctly, and to make it round trip correctly would necessitate changes in the parser, not changes in `complex.__str__`. – wim Dec 19 '19 at 21:21
  • Ok. Then the problem still is a problem. If you want to substitute repr() for str() for the language purists, then that's okay too. In the case of complex types, I think str and repr do exactly the same thing. I had also thought that maybe the thing to do is to change eval. – rocky Dec 20 '19 at 04:11
  • @rocky Yes, for the complex type, repr and str are the same, so you can use them interchangeably. Changing eval is effectively the same as changing the interpreter, because all it does is interpret the input expression as source code. Your requirement for this function, which acts like "*`eval(complexStr2str(str(c))) == c`, for any c of type complex*", is not satisfiable, because nan is a valid value of the real and imag parts in complex. Even `nan != nan`, by definition, so it will never work for the general case with complex either (a complex is stored just like a pair of floats, essentially). – wim Dec 20 '19 at 06:43
  • @wim As I wrote somewhere else, I don't and didn't mean `==` in the Python sense but "indistinguishable", which means that if you have string `a` in a program and replace it with string `b` (which could be exactly `a`), then there is no way that using one rather than the other would change the running behavior of the program, short of introspection of the program. I corrected that in the one place where it first came up but forgot to fix it in the problem description, so I have just done that. At some other point I'll probably do a larger explanation and correction of the problem. – rocky Dec 20 '19 at 10:54

Because the eval(repr(c)) method doesn't work for complex types, using pickle is the most reliable way to serialize the data:

import pickle


numbers = [
    complex(0.0, 0.0),
    complex(-0.0, 0.0),
    complex(0.0, -0.0),
    complex(-0.0, -0.0),
]
serialized = [pickle.dumps(n) for n in numbers]

for n, s in zip(numbers, serialized):
    print(n, pickle.loads(s))

Output:

0j 0j
(-0+0j) (-0+0j)
-0j -0j
(-0-0j) (-0-0j)
ParkerD

As @wim has noted in the comments, this is probably not the right solution to the real problem; it would be better to not have converted those complex numbers to strings via str in the first place. It's also quite unusual to care about the difference between positive and negative zero. But I can imagine rare situations where you do care about that difference, and getting access to the complex numbers before they get str()'d isn't an option; so here's a direct answer.

We can match the parts with a regex; [+-]?(?:(?:[0-9.]|[eE][+-]?)+|nan|inf) is a bit loose for matching floating point numbers, but it will do. We need to use str(float(...)) on the matched parts to make sure they are safe as floating point strings; so e.g. '-0' gets mapped to '-0.0'. We also need special cases for infinity and NaN, so they are mapped to the executable Python code "float('...')" which will produce the right values.

import re

FLOAT_REGEX = r'[+-]?(?:(?:[0-9.]|[eE][+-]?)+|nan|inf)'
COMPLEX_PATTERN = re.compile(r'^\(?(' + FLOAT_REGEX + r'\b)?(?:(' + FLOAT_REGEX + r')j)?\)?$')

def complexStr2str(s):
    m = COMPLEX_PATTERN.match(s)
    if not m:
        raise ValueError('Invalid complex literal: ' + s)

    def safe_float(t):
        t = str(float(0 if t is None else t))
        if t in ('inf', '-inf', 'nan'):
            t = "float('" + t + "')"
        return t

    real, imag = m.group(1), m.group(2)
    return 'complex({0}, {1})'.format(safe_float(real), safe_float(imag))

Example:

>>> complexStr2str(str(complex(0.0, 0.0)))
'complex(0.0, 0.0)'
>>> complexStr2str(str(complex(-0.0, 0.0)))
'complex(-0.0, 0.0)'
>>> complexStr2str(str(complex(0.0, -0.0)))
'complex(0.0, -0.0)'
>>> complexStr2str(str(complex(-0.0, -0.0)))
'complex(-0.0, -0.0)'
>>> complexStr2str(str(complex(float('inf'), float('-inf'))))
"complex(float('inf'), float('-inf'))"
>>> complexStr2str(str(complex(float('nan'), float('nan'))))
"complex(float('nan'), float('nan'))"
>>> complexStr2str(str(complex(1e100, 1e-200)))
'complex(1e+100, 1e-200)'
>>> complexStr2str(str(complex(1e-100, 1e200)))
'complex(1e-100, 1e+200)'

Examples for string inputs:

>>> complexStr2str('100')
'complex(100.0, 0.0)'
>>> complexStr2str('100j')
'complex(0.0, 100.0)'
>>> complexStr2str('-0')
'complex(-0.0, 0.0)'
>>> complexStr2str('-0j')
'complex(0.0, -0.0)'
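
An alternative sketch, assuming the input really is the text that str() or repr() produced for a complex value: the complex() constructor itself accepts such strings (with or without the parentheses) and parses each part as a float, so signed zeros, inf, and nan come through without a hand-written regex. The function name complex_str_to_expr is hypothetical, and math is only needed while building the string, not when the result is eval'd.

```python
import math


def complex_str_to_expr(s: str) -> str:
    # Let the constructor do the parsing; '-0' becomes -0.0, 'inf' becomes inf.
    c = complex(s)

    def part(x: float) -> str:
        # Write each part as text that eval() can read without imports.
        if math.isnan(x):
            return "float('nan')"
        if math.isinf(x):
            return "float('inf')" if x > 0 else "float('-inf')"
        return repr(x)

    return "complex({}, {})".format(part(c.real), part(c.imag))


print(complex_str_to_expr("(-0-0j)"))    # complex(-0.0, -0.0)
print(complex_str_to_expr("(-0-infj)"))  # complex(-0.0, float('-inf'))
```

Unlike the regex version, this raises ValueError for anything the constructor rejects, which may or may not be the error handling you want.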
kaya3
  • Thanks! I'll try this for a bit and see how it works. Will probably accept this in a day or so. – rocky Dec 19 '19 at 03:16
  • Can this be expanded to handle "-0j" since there are some python programs that contain this. See for example https://github.com/python/cpython/blob/master/Lib/test/test_complex.py#L530 – rocky Dec 19 '19 at 03:56
  • I am getting `complexStr2str("-0j") == 'complex(0.0, -0.0)'` when I believe that should be `complex(-0.0, -0.0)`. I am now thinking that a better way to do this is to eval the string twice, with both `.imag` and `.real` extractions, and then use those in the `complex()` call. That would also handle `complexStr2str("-0.-0.j")` mentioned by @ForceBru. Right now this code gives a `ValueError: Invalid complex literal: -0.-0.j`. Currently the approach I am using is to write a better `repr()` that has the desirable property mentioned. However that is an answer to a different problem. – rocky Dec 19 '19 at 17:20
  • @rocky Hang on a minute, your comment is not consistent with the problem requirements stated in the question, which are that `eval(complexStr2str(str(c)))` and `c` should be the same. If `c = -0j` then `c.real` and `c.imag` are both `-0.0`, the result of `str(c)` is `'(-0-0j)'`, my function converts that to `'complex(-0.0, -0.0)'`, which when eval'd has the same real and imag parts as `c`, so my function's behaviour is correct. `"-0.-0.j"` is not the output of `str` for any complex number, so it is not supposed to be an acceptable input to this function. – kaya3 Dec 20 '19 at 07:12
  • I don't think '100' or '-0' can ever be the repr of a complex number. – wim Dec 20 '19 at 20:03
  • That seems to be true, @wim, so I guess the function can be simplified. But the question doesn't say that impossible inputs need to be rejected, so it's not really a problem. – kaya3 Dec 20 '19 at 20:04