I have a Python iterator backed by a DB query (a MongoDB cursor in this case). I'm trying to write its contents as a text file on S3, using boto.

The simplest way to do this is to concatenate everything into a string and call key.set_contents_from_string. However, this won't work well for large amounts of data (possibly 1GB+).

s = "".join(entries)  # builds the entire payload in memory first
k.set_contents_from_string(s)

Ideally, I'd use key.open_write() so I can write each entry to S3 as I iterate... but that function isn't yet implemented by boto.

k.open_write()
for entry in entries:
  k.write(entry)

How can I work around this? Is there perhaps a way to wrap an iterator to behave like a file object, so that I could use key.send_file?
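
For illustration, here is a minimal sketch of what such a wrapper could look like. The class name IteratorFile is hypothetical, and the sketch assumes every entry is already a byte string. It only covers read(); whether key.send_file accepts it without knowing the total size up front is a separate problem, since the request still needs a Content-Length.

```python
class IteratorFile(object):
    """Read-only, file-like wrapper around an iterator of byte strings.

    Hypothetical helper for illustration; it implements only read().
    """

    def __init__(self, iterator):
        self._iterator = iter(iterator)
        self._leftover = b""  # bytes pulled from the iterator but not yet returned

    def read(self, size=-1):
        # read() with no size (or a negative one) drains everything that is left.
        if size is None or size < 0:
            data = self._leftover + b"".join(self._iterator)
            self._leftover = b""
            return data

        # Pull entries until at least `size` bytes are available or the
        # iterator is exhausted.
        chunks = [self._leftover]
        available = len(self._leftover)
        while available < size:
            try:
                chunk = next(self._iterator)
            except StopIteration:
                break
            chunks.append(chunk)
            available += len(chunk)

        data = b"".join(chunks)
        self._leftover = data[size:]  # keep the surplus for the next call
        return data[:size]
```

With this, IteratorFile(entries).read(8192) pulls only as many entries from the cursor as are needed to fill the requested 8192 bytes, so memory use stays bounded by the read size plus one entry.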

nickbaum
  • I think the main reason open_write isn't currently implemented is that S3 requires you to send a Content-Length header in the request and there is no way to know that value in this situation. Do you know the total size of the data you will be writing? – garnaat Apr 13 '12 at 11:50
  • Thanks for the details, garnaat. I don't know the total size, as I'm reading it as I go from the DB. One way to implement this might be for boto to buffer the writes, and to use S3's multi-part upload to send them in chunks (with a known size). – nickbaum Apr 18 '12 at 19:04
  • Possibly. The minimum size for a part is 5MB and you can have as many as 1024 parts. 5MB may not be too big to cache in memory. Depends on your hardware, really. – garnaat Apr 18 '12 at 22:44
  • Gotcha. I'm running this on Heroku for now, so the hardware spec is unknown. This is also why I don't just write it to a file and read from there. So from what you say, there's really no way to write 5MB+ of DB data to S3 using boto without loading at least 5MB in memory? – nickbaum Apr 19 '12 at 23:44
  • If you don't know the size up front, there is no way to write a Content-Length header for the PUT request. And S3 does not support chunked transfer encoding. So, it's not really a boto thing, it's an S3 thing. – garnaat Apr 20 '12 at 13:32
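
The buffering idea from the comments above can be sketched roughly as follows: collect entries until at least 5MB is buffered, send that buffer as one part of a multipart upload (its size is known by then), and repeat. This is a sketch under assumptions, not a tested recipe. The function names iter_parts and upload_iterator are hypothetical, entries are assumed to be byte strings, and bucket is assumed to be a boto 2 Bucket object, whose initiate_multipart_upload, upload_part_from_file, complete_upload and cancel_upload calls are used. Amazon's documentation currently lists a limit of 10,000 parts per upload, so very large totals need a larger part size.

```python
import io

MIN_PART_SIZE = 5 * 1024 * 1024  # S3's minimum size for every part except the last


def iter_parts(entries, part_size=MIN_PART_SIZE):
    """Group an iterable of byte strings into chunks of at least part_size bytes."""
    buf = []
    buffered = 0
    for entry in entries:
        buf.append(entry)
        buffered += len(entry)
        if buffered >= part_size:
            yield b"".join(buf)
            buf = []
            buffered = 0
    if buf:
        yield b"".join(buf)  # the final part may be smaller than part_size


def upload_iterator(bucket, key_name, entries, part_size=MIN_PART_SIZE):
    """Stream entries to S3 as a multipart upload.

    Assumes `bucket` is a boto 2 Bucket and that `entries` yields at least
    one byte string; S3 rejects a multipart upload that has no parts.
    """
    mp = bucket.initiate_multipart_upload(key_name)
    try:
        for part_num, part in enumerate(iter_parts(entries, part_size), start=1):
            # Each part is fully buffered, so its length is known when sent.
            mp.upload_part_from_file(io.BytesIO(part), part_num=part_num)
        mp.complete_upload()
    except Exception:
        # Abort so that already-uploaded parts do not linger in the bucket.
        mp.cancel_upload()
        raise
```

Only one part is held in memory at a time, so peak usage is roughly part_size plus the largest single entry, regardless of how much the cursor returns in total.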

0 Answers