PyUnicode_FromFormat with (not-unicode) strings

Question

I try to create a representation function for a class and I want it to be python-2.x and python-3.x compatible. However I noticed that normal strings when passed to PyUnicode_FromFormat as %U will segfault. The only viable workaround that I found was to convert it to a unicode object myself with PyUnicode_FromObject and then pass the result to the PyUnicode_FromFormat:

/* key and value are arguments for the function. */
PyObject *repr;
if (PyUnicode_CheckExact(key)) {
    repr = PyUnicode_FromFormat("%U=%R", key, value);
} 
else {
    PyObject *tmp = PyUnicode_FromObject(key);
    if (tmp == NULL) {
        return NULL;
    }
    repr = PyUnicode_FromFormat("%U=%R", tmp, value);
    Py_DECREF(tmp);
}

The point is that I want the representation to be without the "" (or '') that would be added if I use %R or %S.

I only recently found the issue and I'm using PyUnicode_FromFormat("%U", something); all over the place so the question I have is: Can this be simplified while keeping it Python 2.x and 3.x compatible?

I don't have a good answer for this (and I don't think one exists) but I'd be tempted to simplify it to remove the if statements and just follow the `else` path every time. Calling `PyUnicode_FromObject` on something that's already a unicode object just does an `incref` and returns the object right back so it doesn't cost you a lot to always do it. — DavidW, Mar 12 '17 at 21:51
@DavidW Not exactly as simple as I would like but it definetly makes it shorter and simpler. Would you mind adding it as an answer? I wouldn't accept it right away (still hoping for an even simpler way) but it's definetly helpful. — MSeifert, Mar 12 '17 at 23:36
Is there a problem with this code? If not (i.e. it works as expected), this question might belong on Code Review, not Stack Overflow. — Nic, Mar 13 '17 at 02:20
@QPaysTaxes CodeReview wants "real and working code" not artificial code. I'm actually asking about this one code-snippet and I don't want an overall review of the code; so this would be [off-topic on code-review](http://codereview.stackexchange.com/help/on-topic). — MSeifert, Mar 13 '17 at 02:28
(1) This code is _real_, because it's what would actually be used, and isn't pseudo- or truncated code. That you're intending to replace other code with it is irrelevant. (2) I don't see how it's broken; if there's a problem, please emphasize it. (Also, I've got 4k rep there; I'd like to think I know what's on-topic by now :) ) — Nic, Mar 13 '17 at 02:31
@QPaysTaxes Well, I only disagree because the [actual code](https://github.com/MSeifert04/iteration_utilities/blob/master/src/partial.c#L587) which triggered this question is different (and not ready for any kind of code review)! :-) — MSeifert, Mar 13 '17 at 02:39
@MSeifert I mean, I can see where you're coming from, but there are plenty of on-topic -- and very good -- questions about... well, not _snippets_, but individual functions (which this is -- unless you want to copy/paste this everywhere you were using the one-liner). Just wrap this in an informative name and give some context and you're good to go. People will restrict their reviews to the code in the question itself. — Nic, Mar 13 '17 at 02:58

score 1 · Accepted Answer · answered Mar 13 '17 at 09:39

I don't think a very simplified way of doing what you want exists. The best I can see is to eliminate the if statement by just using your else case and thus always calling PyUnicode_FromObject:

PyObject *tmp = PyUnicode_FromObject(key);
if (tmp == NULL) {
    return NULL;
}
repr = PyUnicode_FromFormat("%U=%R", tmp, value);
Py_DECREF(tmp);

If you look at the implementation of PyUnicode_FromObject you'll see the first thing it does is PyUnicode_CheckExact and in that case it returns an increfed version of the original object. Therefore the extra work done is pretty small (for the case where key is already unicode) and it should be slightly more efficient for the case where key isn't unicode since you avoid a branch.

PyUnicode_FromFormat with (not-unicode) strings

1 Answers1