I’ve been working on a hobby project: a Django/React site that provides analytics and data visualization for texts. I will most likely host it on AWS. The user uploads a CSV of texts. Currently, the texts are stored in the database, and when the user calls the API, the analytics are computed on them and returned. I’m trying to decide whether to keep storing the raw text data (what I have now) or to run the analytics once when the texts are uploaded and then discard them, storing only the analytics.
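As a minimal sketch of the second option (analyze once at upload, keep only the results), assuming the CSV has `sender` and `text` columns and that the analytics are simple aggregates. The function name, column names, and the choice of metrics are all hypothetical, not taken from the actual project:

```python
import csv
import io
from collections import Counter


def summarize_texts(csv_file, text_column="text", sender_column="sender"):
    """Compute aggregate analytics from a CSV of texts.

    Only the aggregates are returned; the raw rows are read one at a
    time and never stored. Column names are assumptions.
    """
    reader = csv.DictReader(csv_file)
    per_sender = Counter()
    word_counts = Counter()
    total_messages = 0
    total_chars = 0

    for row in reader:
        text = row.get(text_column) or ""
        total_messages += 1
        total_chars += len(text)
        per_sender[row.get(sender_column) or "unknown"] += 1
        word_counts.update(text.lower().split())

    return {
        "total_messages": total_messages,
        "avg_length": total_chars / total_messages if total_messages else 0.0,
        "messages_per_sender": dict(per_sender),
        "top_words": word_counts.most_common(10),
    }


# Example with an in-memory file standing in for the upload.
sample = io.StringIO("sender,text\nalice,hello world\nbob,hello\n")
summary = summarize_texts(sample)
```

In a Django view, the uploaded file arrives in binary mode, so it would likely need wrapping first, for example `io.TextIOWrapper(request.FILES["file"], encoding="utf-8")`. Note that some aggregates (such as top words) can still leak content, so "analytics only" is less sensitive rather than non-sensitive.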

My thoughts are:

Raw data:

  • pros:

    • changes to the analytics won’t require re-uploading
    • probably simpler db schema
  • cons:

    • more sensitive data (I’m not sure how safe it is in a Django database on AWS, or what additional measures I could put in place to protect it)
    • more data to store (I’m not sure what it would cost to store a large number of rows of texts)

Analytics:

  • pros:

    • less sensitive, less space
  • cons:

    • if something goes wrong with the analytics on the first run (that doesn’t throw an error), then they could be inaccurate and will remain that way
  • Keep both. Use AWS RDS to store the data. If you have a lot of data, then store the text in S3. But I believe that AWS RDS should be enough for a hobby project. – pplonski Mar 06 '21 at 21:52
  • Would it be enough if I actually let people use this? A single chat can have hundreds of thousands of texts. I’m also trying to decide whether it’s worth the security risk of storing people’s texts. @pplonski – cavalier 27 Mar 16 '21 at 15:15
