Let's say I have the following list of dicts:
data = [
    {"a": 1},
    {"a": 1, "b": 2},
    {"a": 1, "b": {"c": 3, "d": 4}},
]
And a simple model with a file field backed by S3:
from django.db import models
from storages.backends.s3boto3 import S3Boto3Storage


class S3Storage(S3Boto3Storage):
    bucket_name = "my-bucket"


class Output(models.Model):
    file = models.FileField(upload_to="uploads/", storage=S3Storage())
I want to generate a JSON Lines file and upload it to Output.file. This is what I have so far:
import json
from tempfile import TemporaryFile

from django.core.files.base import ContentFile

with TemporaryFile(mode="w+") as tmp_file:
    for record in data:
        json.dump(record, tmp_file)
        tmp_file.write("\n")
    tmp_file.seek(0)
    bytes_file = tmp_file.read().encode()
    content = ContentFile(bytes_file)
    output = Output()
    output.file.save("data.jsonl", content)
This works fine, but it seems inefficient: the entire temp file is read back into memory and then encoded. Is there a more performant way to do this, perhaps by writing bytes to the file in the first place so I can avoid these lines:
tmp_file.seek(0)
bytes_file = tmp_file.read().encode()
Or are there other areas for speed / memory optimization?
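For reference, this is the kind of bytes-mode variant I'm imagining (an untested sketch, still using data and Output from above): open the temp file in binary mode, encode each record as it's serialized, and pass the open file object to Django's File wrapper so the storage backend can read it in chunks instead of me loading everything into one ContentFile:

import json
from tempfile import TemporaryFile

from django.core.files import File

with TemporaryFile(mode="w+b") as tmp_file:
    for record in data:
        # Encode each record as it's written, so there's no second
        # full-file read/encode pass afterwards.
        tmp_file.write(json.dumps(record).encode("utf-8"))
        tmp_file.write(b"\n")
    tmp_file.seek(0)
    output = Output()
    # File wraps the still-open file object; save() can then read it
    # in chunks rather than from one big in-memory bytestring.
    output.file.save("data.jsonl", File(tmp_file))

Would this actually avoid the extra copy, or does the S3 backend end up buffering the whole file anyway?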