26

I realise that if you have an iterable you should always use .join(iterable) instead of for x in y: str += x. But if there's only a fixed number of variables that aren't already in an iterable, is using .join() still the recommended way?

For example I have

user = 'username'
host = 'host'

should I do

ret = user + '@' + host

or

ret = '@'.join([user, host])

I'm not so much asking from a performance point of view, since both will be pretty trivial. But I've read people on here say always use .join() and I was wondering if there's any particular reason for that or if it's just generally a good idea to use .join().

Meredith
Falmarri

6 Answers

32

If you're creating a string like that, you normally want to use string formatting:

>>> user = 'username'
>>> host = 'host'
>>> '%s@%s' % (user, host)
'username@host'

Python 2.6 added another form, which doesn't rely on operator overloading and has some extra features:

>>> '{0}@{1}'.format(user, host)
'username@host'

As a general guideline, most people will use + on strings only if they're adding two strings right there. For more parts or more complex strings, they either use string formatting, like above, or assemble elements in a list and join them together (especially if there's any form of looping involved.) The reason for using str.join() is that adding strings together means creating a new string (and potentially destroying the old ones) for each addition. Python can sometimes optimize this away, but str.join() quickly becomes clearer, more obvious and significantly faster.
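
To illustrate the "assemble elements in a list and join them" pattern next to repeated addition, here is a minimal sketch. The function names and sample words are made up for the example; both versions return the same string, they only differ in how many intermediate strings may get created along the way.

```python
def build_with_plus(words):
    # Repeated addition: each step may create a brand-new string
    # that copies everything built so far.
    line = ''
    for word in words:
        if line:
            line += ' '
        line += word
    return line


def build_with_join(words):
    # Collect the pieces first, then build the final string in one call.
    pieces = []
    for word in words:
        pieces.append(word)
    return ' '.join(pieces)


words = ['spam', 'eggs', 'ham']
assert build_with_plus(words) == build_with_join(words) == 'spam eggs ham'
```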

Thomas Wouters
  • 11
    It's worth noting that the `%` notation is deprecated and that the `.format()` method is *The Way Of The Future*. Relevant documentation: http://docs.python.org/library/string.html#formatstrings – asthasr Nov 12 '10 at 16:22
  • 2
    `%`-formatting operations aren't deprecated yet. They're considered *obsolete*, but they're still available in all Python versions, haven't been scheduled for actual removal yet and don't trigger any kind of warning. – Thomas Wouters Nov 12 '10 at 16:28
  • 2
    As an aside, I find this more than a little sad, because string formatting *as an operator* is one of the endearing Python quirks that initially drew me to the language. – kindall Nov 12 '10 at 16:28
  • 1
    Standard string formatting is universal, a mechanism that all programmers understand immediately and intuitively. Python's string formatting is Python-specific, used by nothing else; a whole lot of people don't understand it immediately and have to look up its documentation frequently. I strongly advise using standard, "traditional" string formatting unless there's a specific reason to use Python's formatting. There are reasons to use it, of course, but very often the costs don't come close to outweighing the benefits, especially for trivial, constant formatting strings like this. – Glenn Maynard Nov 12 '10 at 16:32
  • I understood Python's use of the `%` operator pretty much instantly just from looking at sample code, and the specifiers are similar to those used in C `printf`. The things I've had to look up about `%` I would also have had to look up if I were using `str.format()`. Plus the curly braces for placeholders just look weird (and nonstandard) to me. Like the Perl (and Ruby) `=~` for regexes, Python's `%` made me wonder why all languages don't do it the same way. The more Python does things like every other language, the less reason there is to use Python. I'll shed a tear when it's gone. – kindall Nov 12 '10 at 18:58
  • @syrion, I keep seeing people make the claim that .format() is preferred - is there a PEP or other official statement somewhere to that effect, possibly with a rationale? @Glenn, by "Python's string formatting is Python-specific" are you speaking of the % notation or the new .format()? – Russell Borogove Nov 12 '10 at 22:21
  • 2
    http://docs.python.org/library/stdtypes.html: "This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code." – asthasr Nov 13 '10 at 01:24
  • Also, since Python 2.7, it is also possible to do `'{}@{}'.format(user, host)`. Still prefer `+` in this case, though. – Jasmijn Jun 06 '11 at 20:20
  • ... and you totally answered some other question that you imagined, "How should i assemble email strings?". The question was, is there real justification for using `join` over small number of `+` in expressions? – Nas Banov May 10 '12 at 18:23
  • Future people: Python 3.6 adds the new format string literal, which would mean that you can use `f'{user}@{host}'` where `user` and `host` are the variable names. This may become the de-facto way to do this as the variables are in the string and I find it the most readable. – Artyer Dec 05 '16 at 23:05
14

I take the question to mean: "Is it ok to do this:"

ret = user + '@' + host

..and the answer is yes. That is perfectly fine.

You should, of course, be aware of the cool formatting stuff you can do in Python, and you should be aware that for long lists, "join" is the way to go, but for a simple situation like this, what you have is exactly right. It's simple and clear, and performance will not be an issue.
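
For the two-variable case in the question, the spellings mentioned on this page all give the same result, so the choice comes down to readability. A quick check, using the question's own variables (the `by_*` names are just labels for this sketch, and the f-string line needs Python 3.6 or later):

```python
user = 'username'
host = 'host'

by_plus = user + '@' + host
by_join = '@'.join([user, host])
by_percent = '%s@%s' % (user, host)
by_format = '{0}@{1}'.format(user, host)
by_fstring = f'{user}@{host}'  # Python 3.6+

# Every spelling yields the same string.
assert by_plus == by_join == by_percent == by_format == by_fstring == 'username@host'
```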

Nick Perkins
10

(I'm pretty sure all of the people pointing at string formatting are missing the question entirely.)

Creating a string by constructing an array and joining it is for performance reasons only. Unless you need that performance, or unless it happens to be the natural way to implement it anyway, there's no benefit to doing that rather than simple string concatenation.

Saying '@'.join([user, host]) is unintuitive. It makes me wonder: why is he doing this? Are there any subtleties to it; is there any case where there might be more than one '@'? The answer is no, of course, but it takes more time to come to that conclusion than if it was written in a natural way.

Don't contort your code merely to avoid string concatenation; there's nothing inherently wrong with it. Joining arrays is just an optimization.

Glenn Maynard
  • 2
    I'm not sure how I "missed the point" in my answer. Also, 'array' and 'list' are not the same thing. – Thomas Wouters Nov 12 '10 at 16:31
  • @Thomas: With a couple decades of habit calling arrays arrays, I don't always go to the effort of calling them by Python's less common name. I think your answer missed the point because his question was specifically comparing `[].join` to string concatenation and asking whether to avoid string concatenation even in simple cases; *not* asking for the ideal way to format that particular, contrived example. – Glenn Maynard Nov 12 '10 at 16:39
  • 3
    The question was about "a fixed number of variables", which is a common situation where string formatting -- with `%` or with `str.format` -- is used. My answer does explain why people go for `str.join()`. As for lists versus arrays, I don't think using the wrong name is a particularly good idea, considering Python *does* have arrays, and they're quite different things. (And for the same reason, to make sure readers don't get confused, I'll note that `[].join` doesn't exist.) – Thomas Wouters Nov 12 '10 at 16:43
  • @Thomas: The question is clear, both in the title: "python .join or string concatination" and in the text; he wasn't asking for a third alternative, he was asking whether there's some reason to prefer joining over concatenation. – Glenn Maynard Nov 12 '10 at 16:54
  • I will generally use concatenation for up to three and perhaps four items if it's the clearest way to write what I mean. Often it's a tossup between that and the string formatting operator. – Steven Rumbalski Nov 12 '10 at 17:05
8

I'll just note that I always tended to use in-place concatenation until I reread a portion of PEP 8, the Style Guide for Python Code:

  • Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Pyrex, Psyco, and such). For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a+=b or a=a+b. Those statements run more slowly in Jython. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

Going by this, I have been converting to the practice of using joins so that I may retain the habit as a more automatic practice when efficiency is extra critical.
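
If you want to see how your own interpreter handles the two forms PEP 8 contrasts, a rough sketch like this can help. The workload size (10,000 one-character pieces) and the number of runs are arbitrary choices for the example, and the timings will differ between machines and Python implementations, so treat the output as indicative only.

```python
import timeit

PIECES = ['x'] * 10_000  # arbitrary workload, chosen only for this sketch


def concat_in_place():
    # The a += b form that PEP 8 says not to rely on being fast.
    s = ''
    for piece in PIECES:
        s += piece
    return s


def concat_join():
    # The ''.join() form recommended for performance-sensitive code.
    return ''.join(PIECES)


# Both forms must produce the same string.
assert concat_in_place() == concat_join()

for fn in (concat_in_place, concat_join):
    seconds = timeit.timeit(fn, number=100)
    print(f'{fn.__name__}: {seconds:.4f} s for 100 runs')
```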

So I'll put in my vote for:

ret = '@'.join([user, host])
Matthew
1

I use the following:

ret = '%s@%s' % (user, host)
anti_social
1

I recommend join() over concatenation, based on two aspects:

  1. Faster.
  2. More elegant.

Regarding the first aspect, here's an example:

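import timeit

s1 = "Flowers"
s2 = "of"
s3 = "War"

def join_concat():
    # Build the string with the + operator.
    return s1 + " " + s2 + " " + s3

def join_builtin():
    # Build the string with str.join() on a tuple.
    return " ".join((s1, s2, s3))

# timeit.timeit() calls each function 1,000,000 times by default
# and returns the total time in seconds.
print("Join Concatenation: ", timeit.timeit(join_concat))
print("Join Builtin:       ", timeit.timeit(join_builtin))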

The output:

$ python3 join_test.py
Join Concatenation:  0.40386943198973313
Join Builtin:        0.2666833929979475

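Note that timeit.timeit() runs each function 1,000,000 times by default, so these figures are totals: join() saves about 0.137 seconds per million calls, or roughly 0.14 microseconds per call. That only becomes noticeable when processing a huge dataset (millions of lines).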
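
Since timeit.timeit() returns the total time for `number` executions, dividing by that count gives a per-call figure. A rough sketch using the same two functions (the `NUMBER` constant and the nanosecond conversion are additions for this example; actual timings depend on your machine):

```python
import timeit

s1, s2, s3 = "Flowers", "of", "War"


def join_concat():
    return s1 + " " + s2 + " " + s3


def join_builtin():
    return " ".join((s1, s2, s3))


NUMBER = 1_000_000  # same as timeit's default number of executions

for fn in (join_concat, join_builtin):
    total = timeit.timeit(fn, number=NUMBER)  # total seconds for all calls
    per_call_ns = total / NUMBER * 1e9        # seconds -> nanoseconds per call
    print(f"{fn.__name__}: {total:.3f} s total, about {per_call_ns:.0f} ns per call")
```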

As for the second aspect, join() is, in my opinion, simply more elegant.

ivanleoncz