I have made a web scraper that downloads a bunch of PDFs. The script is basically a loop that downloads one PDF (~8 MB) per iteration, and the total download size is estimated to be over 300 GB. Instead of creating an instance with that much storage, I was thinking: why not put the PDFs in an S3 bucket as soon as they are downloaded?
I will be using a t2.xlarge Ubuntu instance. The loop is supposed to run for two weeks, so I believe it will be cheaper to use an S3 bucket than to buy extra EBS storage for the t2 instance.
The thing is that the script downloads the PDFs into the /Downloads folder. I think I need to mount a bucket using s3fs? Then I would recursively copy the files from the Downloads folder into the mounted bucket, and then use rm to delete everything in the /Downloads folder. Is this the way to go, or is there a more straightforward way?
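In case it helps, this is roughly what I'm picturing for the copy-and-delete step, assuming the bucket is already mounted with s3fs (just a rough sketch: the mount point /mnt/s3bucket is a placeholder, and I'm assuming the cleanup script would be Python):

```python
import os
import shutil

DOWNLOADS = "/Downloads"        # folder the scraper writes PDFs into
MOUNT_POINT = "/mnt/s3bucket"   # hypothetical s3fs mount point for the bucket

def flush_downloads_to_bucket() -> None:
    """Copy every finished PDF into the mounted bucket, then delete it locally."""
    for name in os.listdir(DOWNLOADS):
        if not name.lower().endswith(".pdf"):
            continue
        src = os.path.join(DOWNLOADS, name)
        # writing into the s3fs mount uploads the file to S3
        shutil.copy(src, os.path.join(MOUNT_POINT, name))
        os.remove(src)  # free local disk space
```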
Any help or documentation link would be appreciated! Thanks!