
I would like to know what the difference is between these loop styles and when I should use each one. Thanks. This code just transforms a digit string into 'abcdefghij' characters, e.g. '0000000000' -> 'aaaaaaaaaa' and '9999999999' -> 'jjjjjjjjjj':

string = '123456789'
old_string = string.rjust(10, '0')  # pad with leading zeros to a width of 10

# 1) generator expression
new_s = ''.join(chr(97 + int(i)) for i in old_string)
print(new_s)

# 2) map with a lambda
new_s = ''.join(map(lambda x: chr(97 + int(x)), old_string))
print(new_s)

# 3) plain for loop with string concatenation
new_s = ''
for i in old_string:
    new_s += chr(97 + int(i))
print(new_s)

1 Answer


Parsing a digit with int is a bit slow and not needed here. You can use ord instead, and then just add ord('a') - ord('0'), which is 49 (97 - 48). The result is:

new_s = ''.join(chr(ord(i)+49) for i in old_string)

For small strings, it is faster to generate a list, so that join can be faster (because it knows the size of the final string):

new_s = ''.join([chr(ord(i)+49) for i in old_string])

This is not a good idea for large strings since it requires more memory.

Note that map is generally faster than a basic generator expression. However, when it is combined with a lambda/function, it is slower than a generator expression (certainly due to the cost of the per-item function call). Unfortunately, there is no builtin function to add 49 to the code point of each character here, so the last expression above is likely the fastest.
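As an illustration, here is a small, hypothetical micro-benchmark of the three call styles (str.upper merely stands in for a ready-made callable; the absolute numbers are machine-dependent):

import timeit

data = '0123456789' * 10

# map with a ready-made method: the per-item call stays at the C level
print(timeit.timeit(lambda: ''.join(map(str.upper, data)), number=100_000))

# generator expression: interpreted loop, usually a bit slower than the above
print(timeit.timeit(lambda: ''.join(x.upper() for x in data), number=100_000))

# map + lambda: one Python-level function call per item, usually the slowest
print(timeit.timeit(lambda: ''.join(map(lambda x: x.upper(), data)), number=100_000))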

Note that for loops are generally slower than generator expressions and list comprehensions. Iterative string concatenation makes them even slower, as pointed out by @FrankYellin in the comments.

Update 1:

An even faster solution is to encode/decode the string so as to avoid calling ord/chr for each character separately (note that the previous remark about list comprehensions still applies):

new_s = bytes([i+49 for i in old_string.encode()]).decode()
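A quick sanity check of this variant with the question's padded input (expected output shown as a comment):

old_string = '123456789'.rjust(10, '0')                        # '0123456789'
new_s = bytes([i + 49 for i in old_string.encode()]).decode()
print(new_s)                                                   # abcdefghij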

Update 2:

The solution provided by @KellyBundy seems the fastest on my machine:

prepared_table = str.maketrans('0123456789', 'abcdefghij')
new_s = old_string.translate(prepared_table)

Benchmark

Here are the results on my machine with your small input string:

for loop:                           2.0 us
join + map + lambda + int:          2.0 us
join + generator-expr + int:        1.8 us
join + generator-expr + ord:        1.2 us
join + comprehension-list + ord:    1.0 us
encode/decode:                      0.8 us
translate:                          0.3 us  (str.maketrans excluded)
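
For reference, here is a minimal sketch of how such timings can be reproduced with timeit (the labels match the table above; the absolute numbers will differ between machines and CPython versions):

import timeit

old_string = '123456789'.rjust(10, '0')
prepared_table = str.maketrans('0123456789', 'abcdefghij')

candidates = {
    'join + generator-expr + ord':     lambda: ''.join(chr(ord(i) + 49) for i in old_string),
    'join + comprehension-list + ord': lambda: ''.join([chr(ord(i) + 49) for i in old_string]),
    'encode/decode':                   lambda: bytes([i + 49 for i in old_string.encode()]).decode(),
    'translate':                       lambda: old_string.translate(prepared_table),
}

for name, func in candidates.items():
    total = timeit.timeit(func, number=100_000)
    print(f'{name:35s} {total * 10:.2f} us/call')  # total seconds over 100k runs -> us per call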
Jérôme Richard
  • How much "more memory"? – Kelly Bundy Mar 27 '22 at 02:01
  • Here, I would say the size of a python object + a reference per item. So typically `(32+8)*len(old_string)` bytes on a standard 64-bit CPython interpreter. This is significantly more than the input/output, which are expected to take 1 byte/char since they are ASCII characters stored in a unicode string (see the sketch after this comment thread). – Jérôme Richard Mar 27 '22 at 02:04
  • For fastest, I guess `new_s = old_string.translate(prepared_table)`. – Kelly Bundy Mar 27 '22 at 02:04
  • But you're comparing it to the generator one, right? The `join` builds a list from the generator anyway. – Kelly Bundy Mar 27 '22 at 02:07
  • Apparently the list comp and the list built by `join` overallocate at different sizes, so their memory usages are always within [+/- 6%](https://tio.run/##bZDBbsMgDIbveQrfIFOUttphU6Q@SRRVjJA0UzEI6CGK@uyZTaKqm@YT/vn8@wc/p6vD908f1nWy3oUEKShtrLrdnAYVIdmi6M0Ao0GpryrEsimASoj6202kweACaJgQtusN1876f/n270BHE6wgK0HhaOSxgtMx137YPTIPZxBawBtg1qyxLLVd7thouGN2p8RVzrENcyVbx6RCkuVTYnoP@tTIkywJHk265P/oL6S5MMuyPXWvXKyV9wZ7Sefy9x7n9zUjxSA/pnPvw4RJDmLBc/NRPZgw2MAyNtR8zclEfisHh0W/ageGDvohynX9AQ) and which one takes more space depends on the string length. – Kelly Bundy Mar 27 '22 at 02:20
  • I'd write it as `prepared_table = str.maketrans('0123456789', 'abcdefghij')`. Or maybe compute it with dict comp :-) (Nah I think that's less clear) – Kelly Bundy Mar 27 '22 at 02:23
  • Your solution is indeed very fast. Thank you. I did not know `translate` nor `maketrans` :) . For `join`, AFAIK, CPython builds a growing string directly without using a slow list. The growing string uses an exponential growth strategy so as to be O(n) and relatively fast. When the size of the iterable is known, CPython can directly pre-allocate the string to the right size and just fill it. – Jérôme Richard Mar 27 '22 at 02:24
  • When given a (generator) iterator, it builds a list of the strings to join. See the [HOWEVER](https://stackoverflow.com/a/9061024/12671057) part of Raymond Hettinger's answer. – Kelly Bundy Mar 27 '22 at 02:28
  • This is true indeed. I am very surprised CPython does that in two passes. This seems a big missed optimization to me... Besides this, I also found out that CPython keeps a cache of all single characters so as not to allocate one object per character. This means that for large strings, fortunately, only 8 bytes per character are needed (I confirmed this on my machine with a test). – Jérôme Richard Mar 27 '22 at 02:52
  • [Benchmark with a longer string](https://tio.run/##dZPRjpswEEXf/RXzUmHvIhoCZLOV8iVthUwYJ24BI3vysF@fjoFVd9vYkoXto3tnzIznN7q6qTrO/n433o1gCT05NwSw4@w8gccZNYkFuhm9Juffme57sS3JjmhJiB4NmFKqbwJ4eKSbn6B7Iwzyu32uX8FEOdgJ3NC3gbydLgVOZ9ejVD9V0eO63Jz2D51GPUu2KtqWM2jb/JGX@s@rSnqxy0OPfLs9B/vsN/O59ti3pLsB4QQsLEb9G8nrKchsV@6rujm8HF@zHDLdnTmBy9X@ytSaSv1PKh9iLw6DJpSfg7zfoklK4cdyvo2C1YM@Iyez5JCpFC8j79J8H/k5zavI@zSvI8c0byI3aX6I/JLmL5Ff0/wYuU3zpUaxNMIELqUpc@47nhXPmmcjPvzjE2R/awtPUO52OzHdxg49M96J2OAmNrgJa6EougZ@JNjL9ZkUW1uZHFbpaf0otQhmjsSQ23vSI8b@fpIZfGmK0sAt8AokwddNGnPAg1reFcWwFJS63/8A) and other solutions. – Kelly Bundy Mar 27 '22 at 02:54
  • The fastest solution, while being much faster than the others, is unfortunately far from being actually fast... 10 us for an ASCII string of 1000 characters seems pretty big. An optimized C/C++ code should easily be >= 20 times faster in this case. I guess this is the limit of CPython. – Jérôme Richard Mar 27 '22 at 03:12
  • You mean my benchmark? That's 9000 characters, not 1000. – Kelly Bundy Mar 27 '22 at 03:17
  • Indeed. I missed the size of the string ^^'' . But the expected speed-up still holds. I tried with Numba and I got a faster implementation on big strings, but many conversions are needed since operating on strings is slow. In the end, 95% of the time is pure overhead, although it is faster. The computing part of the Numba function takes <0.2 us to process a 9000-character string while translate takes 5 us. – Jérôme Richard Mar 27 '22 at 04:12
  • That's 45 gigacharacters per second. So it's using instructions to process multiple bytes at once? Are you using a lookup table like translate probably does or just adding 49 to each byte? – Kelly Bundy Mar 27 '22 at 04:29
  • I do just a branchless conditional add that is efficiently auto-vectorized by the JIT (it uses SIMD instructions like AVX2 to do so). The outcome is the same as with the translation table (ie. safe). – Jérôme Richard Mar 27 '22 at 04:57
  • @JérômeRichard for a large string the characters aren't allocated individually, they're held in a single buffer. The number of bytes in the buffer will depend on the size of the most complex Unicode value, for an all ASCII string it will only require one byte per char. I'm sure dealing with the dynamic nature of Unicode strings adds to the overhead of this operation in Python. – Mark Ransom Mar 27 '22 at 06:23
  • @MarkRansom You mean for `join`? This is what I thought first, but the above link of KellyBundy and tests on my machine show the opposite: CPython requires 8 bytes per char and the overhead is exactly the same as for a list of character objects created from a generator (when the final string is created, 9 bytes per char are required, and the final string takes 1 byte per char). 8 bytes is the size of an object reference in a list. Using `id` proves that the object pointer is the same and the character objects are cached. Note that only 4 bytes would be needed for a unicode character. – Jérôme Richard Mar 27 '22 at 11:58
  • @JérômeRichard a list of single character strings is not the same as a single long string. On my Python 3.8 on Windows, `sys.getsizeof(['a']*1000)` shows a list taking 8 bytes per character while a string `sys.getsizeof('a'*1000)` takes 1 byte per character. Older Python versions would take 2 or 4 bytes per character in a string, but they optimized that. – Mark Ransom Mar 27 '22 at 15:55
  • Single-char strings outside latin1 aren't cached, `s = chr(256) * 2; print(s[0] is s[1])` says `False`. It was [suggested but rejected](https://bugs.python.org/issue31484). – Kelly Bundy Mar 27 '22 at 17:15
  • Sure! One is a list of references to objects and the second is a big buffer (`char*`). I am not sure I understand your initial comment. To clarify my point: I thought the references in lists of characters were referencing independent character objects, but this is not the case (due to caching). So `list('aba')` is a buffer of 3 references (ie. pointers) of 8 bytes each, pointing to 2 unique objects (`'a'` and `'b'` of 32 bytes each). `'aba'` is a string object (32 bytes) containing a raw buffer of 3 bytes (4 if 0-terminated). `''.join(e for e in 'aba')` creates a temporary list of characters. – Jérôme Richard Mar 27 '22 at 17:23
  • @JérômeRichard when you access a character from a string, you create a brand-new single character string and it's efficient for Python to cache those because it's a common operation. I think our understanding of lists matches perfectly. I don't think any strings in Python are 0-terminated, it would be redundant. `''.join(e for e in 'aba')` doesn't create a temporary list, it creates a generator; for a temporary list you need `''.join([e for e in 'aba'])`. – Mark Ransom Mar 29 '22 at 05:02
  • @MarkRansom `''.join(e for e in 'aba')` **does** create a temporary list, as already pointed out earlier. Note in particular the reference to Raymond Hettinger's answer. And I think strings *are* null-terminated, at least a few places [here](https://github.com/python/cpython/blob/main/Include/cpython/unicodeobject.h) mention it and `print(''.__sizeof__())` tells me `49`, which I think are a few 64-bit ints and one terminating null-byte. – Kelly Bundy Mar 29 '22 at 05:24
  • @KellyBundy sorry, I missed the link you posted earlier about the temporary list and I'm still not sure I believe it. You could be right on null termination for strings, but it puzzles me why Python would bother, since it couldn't possibly use it for its own internal operations - nothing prevents you from having a null character in the middle of a string. More likely the extra byte is used to encode the character width of the string. – Mark Ransom Mar 29 '22 at 12:46
  • @MarkRansom See my comment with the "+/- 6%" link, where I tested it with up to a million characters and both the genexp and the listcomp version allocated over 9 MB. Maybe that'll convince you :-). After digging some more, I'm still pretty sure strings are null-terminated. The character width is encoded in [three *bits*](https://github.com/python/cpython/blob/v3.10.4/Include/cpython/unicodeobject.h#L172-L199) inside a struct [padded](https://github.com/python/cpython/blob/v3.10.4/Include/cpython/unicodeobject.h#L214-L216) to four bytes. – Kelly Bundy Mar 29 '22 at 19:10
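
A small sketch (assuming a 64-bit CPython build; `sys.getsizeof` values are implementation-specific) illustrating the per-character sizes and the single-character cache discussed in this thread:

import sys

n = 1000
chars = ['a'] * n   # list: one 8-byte reference per item, all pointing to one cached 'a'
text = 'a' * n      # compact ASCII string: 1 byte per character plus a small header

print(sys.getsizeof(chars) / n)   # ~8 bytes per item (reference size)
print(sys.getsizeof(text) / n)    # ~1 byte per character
print(sys.getsizeof('a'))         # one single-char string object (~50 bytes)

s = 'aa'
print(s[0] is s[1])   # True on CPython: single-char latin1 strings are cached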