This isn't a problem with chained calls in general. It happens here because you keep a reference to the original string around:
filedata = f.read()
So, for this line:
filedata = filedata.replace(',', ' ').replace('-', ' ').replace('_', ' ')
The original str read from the file has to stay in memory while every .replace in the chain runs, until the assignment at the end drops its last reference and its reference count finally reaches 0. A single replace, when the operation doesn't change the size of the string, requires twice as much memory, because the original string and the new string exist at the same time. So during the second replace, you have the original string, the once-replaced string, and the new, twice-replaced string in memory.
On the other hand,
filedata = filedata.replace(',', ' ')
filedata = filedata.replace('-', ' ')
filedata = filedata.replace('_', ' ')
Here, each step requires at most twice the memory of the original string: each assignment drops the last reference to the previous string, so it is freed before the next .replace runs. Importantly, the original doesn't stay in memory.
If what I say is true, then the following should work:
filedata = f.read().replace(',', ' ').replace('-', ' ').replace('_', ' ')
But the Pythonic way to do this is to avoid .replace altogether in this instance, because you are doing multiple single-character replacements. For that, you should use str.translate:
filedata = f.read()
table = {ord(','): ' ', ord('-'): ' ', ord('_'): ' '}
filedata = filedata.translate(table)
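As a quick sanity check, here is a minimal sketch showing that str.translate gives the same result as the three chained .replace calls. The sample string is made up for illustration, and the table is built with str.maketrans, which should be equivalent to the ord(...) dict for this purpose:

```python
# Hypothetical sample text standing in for the file contents.
filedata = "a,b-c_d"

# str.maketrans maps each character of the first argument to the
# character at the same position in the second (as code points).
table = str.maketrans(",-_", "   ")

# One pass over the string instead of three.
translated = filedata.translate(table)

# The chained version from the question, for comparison.
chained = filedata.replace(",", " ").replace("-", " ").replace("_", " ")

print(translated == chained)
```

Either form of the table should work; str.maketrans is just shorter to write when every replacement is a single character.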
Here is some empirical evidence:
import tracemalloc
tracemalloc.start()
result = "abcdefghij"*1_000_000
result = (
result.replace('a', '*')
.replace('b', '*')
.replace('c', '*')
)
size, peak = tracemalloc.get_traced_memory()
print(f"{size=}, {peak=}")
del result
tracemalloc.reset_peak()
result = "abcdefghij"*1_000_000
result = result.replace('a', '*')
result = result.replace('b', '*')
result = result.replace('c', '*')
size, peak = tracemalloc.get_traced_memory()
print(f"{size=}, {peak=}")
del result
tracemalloc.reset_peak()
result = ("abcdefghij"*1_000_000).replace('a', '*').replace('b', '*').replace('c', '*')
size, peak = tracemalloc.get_traced_memory()
print(f"{size=}, {peak=}")
The above outputs what I would expect:
size=10000625, peak=30000723
size=10000681, peak=20000730
size=10000681, peak=20000730
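If you want to run the same kind of measurement on the str.translate approach, something like the following sketch should do. The function name measure_translate and the table are my own choices for illustration, and I haven't included numbers because they will vary by Python version:

```python
import tracemalloc

# Map 'a', 'b' and 'c' to '*', mirroring the replace calls above.
TABLE = str.maketrans("abc", "***")

def measure_translate(n=1_000_000):
    """Build the test string, translate it, and report traced memory."""
    tracemalloc.start()
    data = "abcdefghij" * n
    # Rebinding `data` releases the untranslated string right away.
    data = data.translate(TABLE)
    size, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return data, size, peak

data, size, peak = measure_translate()
print(f"{size=}, {peak=}")
```

I would expect the peak here to stay in the same range as the non-chained versions, since only the input and the output string need to exist at once, but measure it on your own setup.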