The simple method would be:
- Download the file
- Modify the file
- Upload the file
Attempting to read() a 10+ GB file into memory is not a good idea. Downloading it to disk using download_file() will work much better. You can then modify it locally however you wish and upload the resulting file.
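Here is a minimal sketch of that download/modify/upload flow using boto3. It assumes the file is a CSV with a header row and that the new column can be computed one row at a time. The names add_column and process_object, and the bucket and key arguments, are placeholders of mine rather than anything from your setup.

```python
import csv
import os
import tempfile


def add_column(src_path, dst_path, header, compute):
    """Stream a CSV row by row, appending one computed column.

    Only one row is held in memory at a time, so file size is not a concern.
    Assumes the first row is a header row.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader) + [header])
        for row in reader:
            writer.writerow(row + [compute(row)])


def process_object(bucket, in_key, out_key, header, compute):
    """Download an object to local disk, add the column, upload the result."""
    import boto3  # imported here so add_column can be used without boto3

    s3 = boto3.client("s3")
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.csv")
        dst = os.path.join(tmp, "output.csv")
        s3.download_file(bucket, in_key, src)  # writes to disk, not memory
        add_column(src, dst, header, compute)
        s3.upload_file(dst, bucket, out_key)   # uses multipart upload for large files
```

You would call it with something like process_object("my-bucket", "in/data.csv", "out/data.csv", "total", lambda row: float(row[0]) + float(row[1])), where every argument is hypothetical. Make sure the machine has enough free disk space for both the input and the output file.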
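I recommend running such a script on an Amazon EC2 instance, or in an AWS Lambda function, so that the data stays within AWS. This will be much faster than transferring the data across the Internet (and lower cost, too). Note that Lambda is limited to 15 minutes of run time and 10 GB of temporary storage, so EC2 is the safer choice for a file of this size.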
If the extra column is merely a calculation based on the existing columns, then you could use Amazon Athena:
- Define a table based upon the format and location of the incoming file
- Use CREATE TABLE AS to select data from the incoming 'table' and output your desired data. You can specify a location for where to output the data
However, I suspect that Amazon Athena would produce multiple output files rather than one big file.
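The Athena approach could look roughly like the sketch below, which builds a CREATE TABLE AS statement and submits it with boto3. It assumes a source table has already been defined over the incoming file. The helper names build_ctas and run_ctas, and all table, column, database and S3 location values, are hypothetical.

```python
def build_ctas(new_table, source_table, expression, column, output_location):
    """Build a CREATE TABLE AS statement that adds one calculated column.

    output_location is the S3 prefix where Athena writes the new data files.
    """
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (format = 'TEXTFILE', field_delimiter = ',', "
        f"external_location = '{output_location}') AS\n"
        f"SELECT *, {expression} AS {column}\n"
        f"FROM {source_table}"
    )


def run_ctas(sql, database, results_location):
    """Submit the statement to Athena and return the query execution ID."""
    import boto3  # imported here so build_ctas can be used without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": results_location},
    )
    return response["QueryExecutionId"]
```

For example, build_ctas("data_with_total", "incoming_data", "price * quantity", "total", "s3://my-bucket/output/") produces a statement that copies every existing column and appends total. The query runs asynchronously, so you would need to poll its status before reading the output.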