Most pythonic way to interleave two strings

Question

What's the most pythonic way to mesh two strings together?

For example:

Input:

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Answers here have largely assumed that your two input strings will be the same length. Is that a safe assumption or do you need that to be handled? — SuperBiasedMan, Jan 13 '16 at 09:37
@SuperBiasedMan It may be helpful to see how to handle all conditions if you have a solution. It's relevant to the question, but not my case specifically. — Brandon Deo, Jan 13 '16 at 15:15
@drexx The top answerer commented with a solution for it anyway, so I just edited it into their post so it's comprehensive. — SuperBiasedMan, Jan 13 '16 at 15:17

Dimitris Fasarakis Hilliard · Accepted Answer · 2016-01-15T08:29:45.487

129

For me, the most pythonic* way is the following, which does much the same thing as the two-join() approach but uses the + operator to concatenate each pair of characters:

res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

It is also faster than using two join() calls:

In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000

In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop

In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop

Faster approaches exist, but they often obfuscate the code.

Note: If the two input strings are not the same length then the longer one will be truncated as zip stops iterating at the end of the shorter string. In this case instead of zip one should use zip_longest (izip_longest in Python 2) from the itertools module to ensure that both strings are fully exhausted.
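As a minimal sketch of the zip_longest variant described in the note (Python 3; the function name interleave_longest is only an illustrative choice, not something from the answer), passing fillvalue="" lets the leftover characters of the longer string pass through unchanged:

```python
from itertools import zip_longest


def interleave_longest(a, b):
    """Interleave a and b, keeping the tail of whichever string is longer."""
    # fillvalue="" matters: the default fill is None, and i + j would then fail.
    return "".join(i + j for i, j in zip_longest(a, b, fillvalue=""))


print(interleave_longest("ABC", "abc"))     # AaBbCc
print(interleave_longest("foobar", "baz"))  # fboaozbar
```

With plain zip the second call would stop after the three paired characters and return 'fboaoz'.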

*To quote the Zen of Python: Readability counts.

Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.

edited Jan 15 '16 at 08:29

answered Jan 13 '16 at 00:13

Dimitris Fasarakis Hilliard


1

Coding effort for n strings is O(n), though. Still, it's good as long as n is small. – TigerhawkT3 Jan 13 '16 at 00:27
Your generator is probably causing more overhead than the join. – Padraic Cunningham Jan 13 '16 at 00:32
5

run `"".join([i + j for i, j in zip(l1, l2)])` and it will definitely be the fastest – Padraic Cunningham Jan 13 '16 at 00:39
6

`"".join(map("".join, zip(l1, l2)))` is even faster, although not necessarily more pythonic. – Aleksi Torhamo Jan 13 '16 at 15:31

Mike Müller · Answer 2 · 2016-01-19T20:59:08.960

Faster Alternative

Another way:

res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Speed

Looks like it is faster:

%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)

100000 loops, best of 3: 4.75 µs per loop

than the fastest solution so far:

from itertools import chain
%timeit "".join(list(chain.from_iterable(zip(u, l))))

100000 loops, best of 3: 6.52 µs per loop

Also for the larger strings:

l1 = 'A' * 1000000; l2 = 'a' * 1000000

%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop


%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)

10 loops, best of 3: 92 ms per loop

Python 3.5.1.

Variation for strings with different lengths

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'

Shorter one determines length (`zip()` equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLl

Longer one determines length (`itertools.zip_longest(fillvalue='')` equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
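The two variations can be folded into one helper. This is only a sketch: the name interleave_slices and the keep_tail flag are made up for illustration, and the list is allocated in one step with `[''] * (n * 2)` rather than `[''] * n * 2`.

```python
def interleave_slices(a, b, keep_tail=True):
    """Interleave a and b using slice assignment on a preallocated list."""
    n = min(len(a), len(b))
    res = [""] * (n * 2)   # one allocation of the final size
    res[::2] = a[:n]       # even positions take characters from a
    res[1::2] = b[:n]      # odd positions take characters from b
    merged = "".join(res)
    if keep_tail:
        # At most one of these tails is non-empty.
        merged += a[n:] + b[n:]
    return merged


u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'
print(interleave_slices(u, l, keep_tail=False))  # AaBbCcDdEeFfGgHhIiJjKkLl
print(interleave_slices(u, l))                   # AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
```

keep_tail=False behaves like zip(), and keep_tail=True like itertools.zip_longest(fillvalue='').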

This builds the list `[''] * len(u)` and then throws it away. Better do `[''] * (len(u) * 2)`. — Kelly Bundy, May 12 '22 at 10:55
Makes the solution [~10% faster](https://tio.run/##lVC7bsMwDNz1FVwCSoEQRPZSOMjQ7zCEIG3kVoVekJWhX@@ISoIa7VQu4t2RR4rpu3zG0L@kvCyzKdcER0BE5hQlrwhbUPsWB3AdcecVx6iUvceLmas0NpjvOaKudc4E7pSoWUfCOAydrqpTDakn7Grn7ivawCstyEb@9eIrM/E/N6YZm3L0UKw3toD1KebyQIyeNueOOf1HQruGhHD1byYflWDwjClmOIENkM/hw3C1/6VRP8ntLpoRZX/KOzG08pRtKHzLcbNTE/gZEDbACx3X9CvHh2khh7lubS68LTxa@q8Yh14LCUjDUIIVy3ID) in my test. — Kelly Bundy, May 12 '22 at 11:07

score 49 · Answer 3 · answered Jan 13 '16 at 00:08

49

With join() and zip().

>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

answered Jan 13 '16 at 00:08

TigerhawkT3


17

Or `''.join(itertools.chain.from_iterable(zip(u, l)))` – Blender Jan 13 '16 at 00:10
1

This will truncate a list if one is shorter than the other, as `zip` stops when the shorter list has been fully iterated over. – SuperBiasedMan Jan 13 '16 at 09:35
5

@SuperBiasedMan - Yep. `itertools.zip_longest` can be used if it becomes an issue. – TigerhawkT3 Jan 13 '16 at 09:47

Veedrac · Answer 4 · 2016-01-13T08:25:20.893

On Python 2, by far the fastest way to do things, at ~3x the speed of list slicing for small strings and ~30x for long ones, is

res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)

This wouldn't work on Python 3, though. You could implement something like

res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")

but by then you've already lost the gains over list slicing for small strings (it's still 20x the speed for long strings) and this doesn't even work for non-ASCII characters yet.
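One possible way to live with the ASCII restriction on Python 3 is to try the bytearray path and fall back to a join when encoding fails. This is a sketch under that assumption; the helper name interleave_bytes is hypothetical, and it requires equal-length inputs like the code above.

```python
def interleave_bytes(u, l):
    """Interleave two equal-length strings via a bytearray when both are ASCII."""
    if len(u) != len(l):
        raise ValueError("strings must have the same length")
    try:
        res = bytearray(len(u) * 2)
        res[::2] = u.encode("ascii")
        res[1::2] = l.encode("ascii")
        return res.decode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII input: fall back to the plain join-based version.
        return "".join(i + j for i, j in zip(u, l))


print(interleave_bytes("ABC", "abc"))  # AaBbCc
print(interleave_bytes("é", "a"))      # éa
```

Whether the extra try/except pays off depends on how often the inputs really are ASCII; I have not timed it.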

FWIW, if you are doing this on massive strings and need every cycle, and for some reason have to use Python strings... here's how to do it:

res = bytearray(len(u) * 4 * 2)

u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]

l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]

res.decode("utf_32_be")

Special-casing the common case of smaller types will help too. FWIW, this is only 3x the speed of list slicing for long strings and a factor of 4 to 5 slower for small strings.
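The eight slice assignments follow a regular pattern, so they can be written as a loop over the four byte lanes of each UTF-32 code unit. This is a compact restatement of the snippet above rather than a faster version; the function name interleave_utf32 is illustrative, and equal-length inputs are assumed.

```python
def interleave_utf32(u, l):
    """Interleave two equal-length strings through a UTF-32-BE byte buffer."""
    if len(u) != len(l):
        raise ValueError("strings must have the same length")
    u32 = u.encode("utf_32_be")  # 4 bytes per character, no BOM
    l32 = l.encode("utf_32_be")
    res = bytearray(len(u) * 4 * 2)
    for k in range(4):
        res[k::8] = u32[k::4]      # byte k of every character of u
        res[4 + k::8] = l32[k::4]  # byte k of every character of l
    return res.decode("utf_32_be")


print(interleave_utf32("hé€", "日本語"))  # h日é本€語
```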

Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.

Padraic Cunningham · Answer 5 · 2016-01-13T23:08:17.763

16

If you want the fastest way, you can combine itertools with operator.add:

In [36]: from operator import add

In [37]: from itertools import  starmap, izip

In [38]: timeit "".join([i + j for i, j in izip(l1, l2)])
1 loops, best of 3: 142 ms per loop

In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop

In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop

In [41]:  "".join(starmap(add, izip(l1,l2))) ==  "".join([i + j   for i, j in izip(l1, l2)]) ==  "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True

But combining izip and chain.from_iterable is faster again

In [2]: from itertools import  chain, izip

In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop

There is also a substantial difference between chain(*...) and chain.from_iterable(...):

In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop

Passing a generator to join gains nothing and is always going to be slower: CPython first builds a list from the contents, because join makes two passes over the data, one to work out the total size needed and one to actually do the join, which would not be possible with a one-shot generator:

join.h:

 /* Here is the general case.  Do a pre-pass to figure out the total
  * amount of space we'll need (sz), and see whether all arguments are
  * bytes-like.
   */
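The generator-versus-list claim is easy to check outside IPython with the timeit module. This is a rough sketch; the string size and repeat counts are arbitrary choices, and the numbers will vary by machine and Python version, so no timings are shown.

```python
import timeit

l1 = "A" * 100_000
l2 = "a" * 100_000


def with_generator():
    # join has to materialize the generator before it can size the result
    return "".join(i + j for i, j in zip(l1, l2))


def with_list():
    # the list comprehension hands join a ready-made sequence
    return "".join([i + j for i, j in zip(l1, l2)])


if __name__ == "__main__":
    for func in (with_generator, with_list):
        best = min(timeit.repeat(func, number=10, repeat=3))
        print(f"{func.__name__}: {best:.4f} s for 10 calls")
```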

Also if you have different length strings and you don't want to lose data you can use izip_longest :

In [22]: from itertools import izip_longest    
In [23]: a,b = "hlo","elworld"

In [24]:  "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'

For python 3 it is called zip_longest

But for Python 2, Veedrac's suggestion is by far the fastest:

In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
   ....: 
100 loops, best of 3: 2.68 ms per loop

edited Jan 13 '16 at 23:08

answered Jan 13 '16 at 00:53

Padraic Cunningham


2

why `list`?? is unneeded – Copperfield Jan 13 '16 at 01:10
1

Not according to my tests: you lose time building the intermediate list, and that defeats the purpose of using iterators. Timing `"".join(list(...))` gives me 6.715280318699769 and timing `"".join(starmap(...))` gives me 6.46332361384313 – Copperfield Jan 13 '16 at 01:22
1

Then what, is it machine dependent? No matter where I run the test I get exactly the same result: `"".join(list(starmap(add, izip(l1,l2))))` is slower than `"".join(starmap(add, izip(l1,l2)))`. I ran the test on my machine in Python 2.7.11 and in Python 3.5.1, and even in the virtual console of [www.python.org](https://www.python.org/) with Python 3.4.3, several times, and the outcome is always the same – Copperfield Jan 13 '16 at 13:55
I read the source, and what I see is that it always builds a list internally in its buffers variable regardless of what you pass to it, so that is all the more reason NOT to give it a list – Copperfield Jan 13 '16 at 15:29
@Copperfield, are you talking about the list call or passing a list? – Padraic Cunningham Jan 13 '16 at 15:42
the list call, in `list(starmap(...))` vs `starmap(...)` or similar with any of the itertools functions. In passing a list vs passing general generator like `join([ a+b for...])` vs `join( a+b for ...)` my tests agree with yours – Copperfield Jan 13 '16 at 15:54
There's always `map(add, l1, l2)` for prettiness. It seems to be slower than `starmap` though. That said, I can't repro the list comprehension being slower than `starmap`. – Veedrac Jan 13 '16 at 22:42
@PadraicCunningham wrt. `list(...)` being slower, manually calling `list` *won't* make things faster. The only reason `"".join([x for x in y])` is recommended over `"".join(x for x in y)` is that the latter creates a generator, which has pause-resume overhead. Doing `"".join(list(x for x in y))` wouldn't help things. – Veedrac Jan 13 '16 at 22:46
@Veedrac, I thought they were talking about a list vs a generator, the list call is not needed but it adds about 1 percent overhead so it does not have much of a bearing in either case. The only thing that makes a significant difference is using a generator vs a list comprehension – Padraic Cunningham Jan 13 '16 at 23:07
you don't need starmap here: [`''.join(map(add, a, b))`](https://repl.it/@zed1/interleaved-strings) – jfs Mar 13 '18 at 20:33

root · Answer 6 · 2016-01-22T08:50:20.310

13

You could also do this using map and operator.add:

from operator import add

u = 'AAAAA'
l = 'aaaaa'

s = "".join(map(add, u, l))

Output:

'AaAaAaAaAa'

map takes one element at a time from the first iterable u and the corresponding element from the second iterable l, and applies the function supplied as its first argument, add, to each pair. join then concatenates the results.
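To make the pairing visible, here is a small sketch (Python 3) that looks at the intermediate values map produces before join sees them, and at what happens when the inputs differ in length:

```python
from operator import add

pairs = list(map(add, "ABC", "abc"))
print(pairs)           # ['Aa', 'Bb', 'Cc']
print("".join(pairs))  # AaBbCc

# With unequal lengths, map() stops at the shorter input, just like zip().
print("".join(map(add, "ABCDE", "ab")))  # AaBb
```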

edited Jan 22 '16 at 08:50

answered Jan 14 '16 at 08:23

root


knite · Answer 7 · 2016-02-02T18:26:52.000

8

Jim's answer is great, but here's my favorite option, if you don't mind a couple of imports:

from functools import reduce
from operator import add

reduce(add, map(add, u, l))

edited Feb 02 '16 at 18:26

answered Feb 02 '16 at 07:52

knite


7

He said most Pythonic, not most Haskellic ;) – Curt Feb 05 '16 at 04:12

Christofer Ohlsson · Answer 8 · 2016-01-13T07:56:10.693

A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accommodate strings of differing lengths too. Or am I the only one thinking the mesh should work a bit like this:

u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"

One way to do this would be the following:

def mesh(a,b):
    minlen = min(len(a),len(b))
    return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])

score 5 · Answer 9 · answered Jan 13 '16 at 01:33

5

I like using two fors, the variable names can give a hint/reminder to what is going on:

"".join(char for pair in zip(u,l) for char in pair)

answered Jan 13 '16 at 01:33

Neal Fultz


score 4 · Answer 10 · answered Jan 13 '16 at 05:42

4

Just to add another, more basic approach:

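st = ""
for i, char in enumerate(u):
    # Use the loop position rather than u.index(char), which returns the first
    # match and so picks the wrong partner when u has repeated characters.
    st = "{0}{1}{2}".format(st, char, l[i])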

answered Jan 13 '16 at 05:42

WeRelic


score 4 · Answer 11 · answered Feb 06 '16 at 19:24

Feels a bit un-pythonic not to consider the double-list-comprehension answer here, to handle n strings with O(1) effort:

"".join(c for cs in itertools.zip_longest(*all_strings, fillvalue="") for c in cs)

where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full use example would look like this:

import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings, fillvalue="") for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
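One caveat worth sketching: zip_longest pads with None by default, and "".join rejects None, so interleaving strings of different lengths needs fillvalue="". The helper name interleave_all below is only illustrative.

```python
import itertools


def interleave_all(*strings):
    """Interleave any number of strings, keeping the tails of longer ones."""
    rows = itertools.zip_longest(*strings, fillvalue="")
    return "".join(c for cs in rows for c in cs)


print(interleave_all("abc", "12", "XYZW"))  # a1Xb2YcZW
```

Without fillvalue="" the same call raises a TypeError as soon as the shortest string runs out.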

Like many answers, fastest? Probably not, but simple and flexible. Also, without too much added complexity, this is slightly faster than the accepted answer (in general, string addition is a bit slow in python):

In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;

In [8]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop

In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop

Still not as fast as the fastest answer, though: which got 50.3 ms on this same data and computer — scnerd, Feb 06 '16 at 19:32

score 3 · Answer 12 · answered Feb 02 '16 at 06:40

Potentially faster and shorter than the current leading solution:

from itertools import chain

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

res = "".join(chain(*zip(u, l)))

Strategy speed-wise is to do as much at the C-level as possible. Same zip_longest() fix for uneven strings and it would be coming out of the same module as chain() so can't ding me too many points there!

Other solutions I came up with along the way:

res = "".join(u[x] + l[x] for x in range(len(u)))

res = "".join(k + l[i] for i, k in enumerate(u))

MSeifert · Answer 13 · 2017-04-04T19:37:22.577

You could use iteration_utilities.roundrobin¹

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

or the ManyIterables class from the same package:

from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

¹ This is from a third-party library I have written: iteration_utilities.
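If installing a third-party package is not an option, a round-robin can be sketched with the standard library alone. The generator below is adapted from the roundrobin recipe in the itertools documentation; it is not the implementation used by iteration_utilities, and I have not compared their speed.

```python
from itertools import cycle, islice


def roundrobin(*iterables):
    """Yield items from each iterable in turn: 'ABC', 'D', 'EF' -> A D E B F C."""
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for get_next in nexts:
                yield get_next()
        except StopIteration:
            # Drop the exhausted iterator and keep cycling over the rest.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))


print("".join(roundrobin("foobar", "baz")))  # fboaozbar
```

Unlike zip(), this keeps going after the shorter string is exhausted, so no characters are lost.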

score 2 · Answer 14 · answered Feb 03 '16 at 15:03

2

I would use zip() to get a readable and easy way:

result = ''
for cha, chb in zip(u, l):
    result += '%s%s' % (cha, chb)

print(result)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

answered Feb 03 '16 at 15:03

valeas

