14

Given a very large string, I would like to process parts of it in a loop like this:

large_string = "foobar..."
while large_string:
    process(large_string.pop(200))

What is a nice and efficient way of doing this?

Machavity
Martin Flucka

4 Answers

14

You can wrap the string in a StringIO or BytesIO and pretend it's a file. That should be pretty fast.

try:
    from cStringIO import StringIO  # Python 2 only
except ImportError:
    from io import StringIO  # Python 3; use io.BytesIO for byte strings

s = StringIO(large_string)
while True:
    chunk = s.read(200)
    if len(chunk) > 0:
        process(chunk)
    if len(chunk) < 200:
        break
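
As a possible variation on the loop above, the same read-until-empty pattern can be wrapped in a generator. This is only a sketch, assuming Python 3 and text (not bytes) input; `read_chunks` is a hypothetical helper name, not something from the standard library.

```python
from functools import partial
from io import StringIO


def read_chunks(text, size=200):
    """Yield successive pieces of `text`, each at most `size` characters long."""
    stream = StringIO(text)
    # iter(callable, sentinel) keeps calling stream.read(size)
    # until it returns the empty string, i.e. the stream is exhausted.
    for chunk in iter(partial(stream.read, size), ""):
        yield chunk


chunks = list(read_chunks("abcdefghij", size=4))
print(chunks)
```

With this, the processing loop would become `for chunk in read_chunks(large_string): process(chunk)`.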
Fred Foo
  • If he wants to consume the string from the end, this does not work. – schlamar Jun 15 '12 at 12:34
  • @ms4py fortunately the order of the chunks does not matter at all for my task – Martin Flucka Jun 15 '12 at 12:47
  • 1
    @ms4py: true. In that case, I'd slice the string up into a list and iterate over it in reverse: `[large_string[i:i+200] for i in xrange(0, len(large_string), 200)]` – Fred Foo Jun 15 '12 at 12:48
  • 2
    @larsmans: Or, you could use the buffer's seek method to read the last _n_ bytes: `s.seek(-200, 2); chunk = s.read()`... – Joel Cornett Jun 15 '12 at 12:51
  • 2
    You don't need Py3 for `io.StringIO` - it exists from 2.6. – lvc Jun 15 '12 at 13:09
  • One of the benefits of `pop` is that it discards the elements it returns, releasing the memory they occupied. `StringIO` does not discard the parts of the string that were already read. Is there a way to get this aspect of the `pop` functionality for strings? – Joe Mar 11 '13 at 08:53
  • @Joe: from what I know of CPython internals, I don't think any solution will partially deallocate the string and retain linear time complexity for taking slices off it. – Fred Foo Mar 11 '13 at 10:32
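
Regarding the memory question in the comments: `str` is immutable, so nothing can be removed from it in place. If the data can be held as bytes, one option that might get closer to `pop` is a mutable `bytearray`, which supports deleting a slice from its end. The following is a sketch under that assumption (Python 3, hypothetical helper name `pop_tail`, `size` must be positive); whether the underlying allocation actually shrinks after each deletion is an implementation detail and is not guaranteed here.

```python
def pop_tail(buf, size=200):
    """Remove and return the last `size` bytes of the bytearray `buf`."""
    chunk = bytes(buf[-size:])  # copy out the tail
    del buf[-size:]             # drop the tail from the buffer in place
    return chunk


buf = bytearray(b"abcdefghij")
popped = []
while buf:
    popped.append(pop_tail(buf, size=4))
print(popped)
```

Note that this consumes the data from the end, which is fine when the order of the chunks does not matter.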
13

You can convert the string to a list with `list(string)` and pop from it, or iterate over that list in chunks by slicing it, or slice the string as is and iterate in chunks.
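
The last option (slicing the string as is) could look roughly like this. It is a minimal sketch assuming Python 3; `iter_chunks` is a hypothetical name chosen for illustration.

```python
def iter_chunks(text, size=200):
    """Yield consecutive slices of `text`, each at most `size` characters long."""
    for start in range(0, len(text), size):
        # Slicing past the end is safe: the last chunk is simply shorter.
        yield text[start:start + size]


pieces = list(iter_chunks("foobarbaz!", size=4))
print(pieces)
```

Each slice copies only its own characters, so the total work stays proportional to the length of the string.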

dm03514
2

You can do this with slicing:

large_string = "foobar..."
while large_string:
    process(large_string[-200:])
    large_string = large_string[:-200]
schlamar
  • 5
    This is pretty wasteful. Not only because it does the slicing twice, but because it uses an O(n²) time algorithm. – Fred Foo Jun 15 '12 at 12:32
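
One way to keep the same end-first chunk order without the quadratic cost is to move an index instead of re-slicing the remaining string on every pass. This is a sketch, assuming Python 3; `iter_chunks_from_end` is a hypothetical name.

```python
def iter_chunks_from_end(text, size=200):
    """Yield slices of `text` from the end toward the start, `size` characters at a time."""
    end = len(text)
    while end > 0:
        start = max(end - size, 0)
        # Only the chunk itself is copied; the original string is never rebuilt.
        yield text[start:end]
        end = start


tail_first = list(iter_chunks_from_end("abcdefghij", size=4))
print(tail_first)
```

The chunks come out in the same order as with the `[-200:]` / `[:-200]` loop, but each character is copied only once.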
1

To follow up on dm03514's answer, you can do something like this:

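ex = "hello"
exList = list(ex)  # strings are immutable, so work on a list of characters
exList.pop(2)  # removes and returns the character at index 2, here 'l'
output = "".join(exList)  # rebuild the string in one pass

print(output)  # Prints 'helo'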
Colonel_Old