
Let's say I have the following list of dicts:

data = [
    {"a": 1},
    {"a": 1, "b": 2},
    {"a": 1, "b": {"c": 3, "d": 4}},
]

And a simple model with a file field backed by S3:

from django.db import models
from storages.backends.s3boto3 import S3Boto3Storage

class S3Storage(S3Boto3Storage):
    bucket_name = "my-bucket"

class Output(models.Model):
    file = models.FileField(upload_to="uploads/", storage=S3Storage())

I want to generate a JSON Lines file and upload that to Output.file. This is what I have so far:

import json
from tempfile import TemporaryFile

from django.core.files.base import ContentFile

with TemporaryFile(mode="w+") as tmp_file:
    for record in data:
        json.dump(record, tmp_file)
        tmp_file.write("\n")

    tmp_file.seek(0)
    bytes_file = tmp_file.read().encode()
    content = ContentFile(bytes_file)

    output = Output()
    output.file.save("data.jsonl", content)

This works fine but seems inefficient, specifically the step of reading the entire temp file back into memory and encoding it. Is there a more performant way to do this, perhaps by writing bytes to the file in the first place so I can avoid the following lines:

tmp_file.seek(0)
bytes_file = tmp_file.read().encode()

Or are there other areas for speed / memory optimization?
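For reference, this is roughly the alternative I'm imagining: open the temp file in binary mode, encode each record as it's written, and hand the open file object to Django's File wrapper instead of building a ContentFile from the whole payload. This is an untested sketch, and whether File(tmp_file) actually lets the S3 storage stream the upload without loading everything into memory is my assumption:

import json
from tempfile import TemporaryFile

from django.core.files import File

with TemporaryFile(mode="w+b") as tmp_file:
    for record in data:
        # Encode each record as it is written, instead of re-reading
        # and encoding the whole file afterwards.
        tmp_file.write(json.dumps(record).encode("utf-8"))
        tmp_file.write(b"\n")

    tmp_file.seek(0)
    output = Output()
    # Wrap the open file object so save() can read from it directly,
    # rather than constructing the entire payload in memory first.
    output.file.save("data.jsonl", File(tmp_file))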

