
I'm trying to find a way in my Flask application to hold the CSVs that each thread continuously processes in a buffer before uploading them to a MongoDB database. I want the buffer to give me some level of persistence and proper error handling: if the network fails, I want to retry uploading the CSV to Mongo.

I thought about using a task queue such as Celery with a message broker (RabbitMQ), but wasn't sure if that was the right way to go. Sorry if this isn't a suitable question for SO; I just wanted clarification on how to go about doing this. Thank you in advance.

1 Answer


Sounds like you want something like the Linux tail command. With the -f option, tail prints each line of a file as soon as it is appended. I'm assuming this CSV file is generated by a separate program that is running at the same time. See "How can I tail a log file in Python?" for how to implement tail in Python.
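As a rough illustration of the tail idea, here is a minimal polling sketch using only the standard library. The names `read_new_lines` and `follow`, and the one-second poll interval, are made up for this example, and it assumes the CSV is UTF-8 and only ever appended to (no truncation or rotation).

```python
import time


def read_new_lines(path, offset=0):
    """Return (complete_lines, new_offset) for data appended after `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline, so a half-written row
    # is left in place and picked up on the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


def follow(path, poll_seconds=1.0):
    """Yield each newly appended line forever (hypothetical helper)."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            yield line
        if not lines:
            time.sleep(poll_seconds)
```

Each line yielded by `follow("data.csv")` could then be parsed and handed to whatever does the Mongo upload.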

Note: You might be better off dumping the CSVs in batches. It won't be real-time, but if that's not important, it'll be more efficient.

Tasty213
  • Hey, thanks for the response. If I were to go about dumping CSVs in batches, where/how should I store the CSVs before dumping them into MongoDB? – jquery2000 Apr 20 '21 at 03:48
  • I would say save them into a folder in the home directory. Use the timestamp as each file's name so each one is unique. Then use a cron job executed once an hour (or however frequently is needed) to process the files. – Tasty213 Apr 20 '21 at 13:07
  • Wouldn't saving to a folder in a directory be slow since it's writing to disk? Would using a persistent queue like Redis be a good idea in this case? – jquery2000 Apr 21 '21 at 03:47
  • It depends on how much data you're writing, what speed you need, and how much effort you're willing to put in. If it's a low data rate and the speed doesn't matter, it's not really worth implementing a queue. But if it's a high-frequency data rate, it might be worthwhile. – Tasty213 Apr 21 '21 at 19:13
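
The folder-plus-cron approach from the comments could be sketched roughly as below. This is only an illustration under some assumptions: `spool_csv`, `flush_spool`, and the `upload` callable are hypothetical names, and `upload` stands in for whatever actually writes to MongoDB (for example a function wrapping pymongo's `collection.insert_many`). A file is deleted only after its upload succeeds, so a network failure leaves it on disk for the next run.

```python
import csv
import os
import time
import uuid


def spool_csv(spool_dir, csv_text):
    """Write one CSV payload to the spool folder under a unique, time-based name."""
    os.makedirs(spool_dir, exist_ok=True)
    # Timestamp plus a random suffix, so two threads never collide.
    name = f"{time.time_ns()}-{uuid.uuid4().hex}.csv"
    final_path = os.path.join(spool_dir, name)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", newline="") as f:
        f.write(csv_text)
    # Rename into place so the batch job never sees a partial file.
    os.replace(tmp_path, final_path)
    return final_path


def flush_spool(spool_dir, upload):
    """Upload every spooled file; return how many were uploaded and removed."""
    uploaded = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".csv"):
            continue  # skip in-progress .tmp files
        path = os.path.join(spool_dir, name)
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        try:
            if rows:
                upload(rows)  # assumed to raise on failure
        except Exception:
            continue  # keep the file so the next run retries it
        os.remove(path)
        uploaded += 1
    return uploaded
```

The Flask threads would call `spool_csv`, and the cron job would run a small script that calls `flush_spool`. If the upload succeeds but the delete fails, the same rows would be sent again on the next run, so this gives at-least-once delivery rather than exactly-once.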