I want to store multiple types of binary files per user: PDFs, photos, or very small videos (~2MB).
I have in mind 2 approaches:

  1. Use MySQL with a BLOB column in a table and store all of these file types in that column.
  2. Use MySQL to store metadata about the binary files but store the actual files in the filesystem.

I think (1) is simpler to implement, but (2) makes the files easier to access from anywhere, e.g. through download links.

What I am not sure about is whether the binary files can be treated as documents, in which case Cassandra or another NoSQL store might be a better choice. What are the downsides of treating binary files as "documents"?

Jim

1 Answer

The downside of this approach with Cassandra is that, depending on the table structure, your partitions could get too big. The prevailing wisdom is to keep partition sizes under 100MB. If the table is partitioned on something unique like video_id, then each video is its own partition, and that shouldn't be a problem.

But if there's a category or playlist system where multiple videos are getting stored in the same partition, that could exceed that limit and read performance would degrade.
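To make the partition-size concern concrete, here is a rough back-of-the-envelope sketch in plain Python (no Cassandra involved). The row layout, the `playlist_id` column, and the figure of 120 videos are made-up assumptions for illustration; only the ~100MB guideline and the ~2MB file size come from this thread.

```python
from collections import defaultdict

# The ~100MB partition-size guideline mentioned above.
PARTITION_LIMIT_BYTES = 100 * 1024 * 1024


def partition_sizes(rows, key):
    """Sum blob sizes per partition for a given choice of partition key."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[row[key]] += row["size_bytes"]
    return dict(sizes)


def oversized(sizes, limit=PARTITION_LIMIT_BYTES):
    """Return the partition keys whose total size exceeds the limit."""
    return sorted(k for k, total in sizes.items() if total > limit)


# Hypothetical data: 120 videos of 2MB each, all in the same playlist.
rows = [
    {"video_id": f"v{i}", "playlist_id": "p1", "size_bytes": 2 * 1024 * 1024}
    for i in range(120)
]

by_video = partition_sizes(rows, "video_id")        # one video per partition
by_playlist = partition_sizes(rows, "playlist_id")  # all videos in one partition

print(oversized(by_video))     # each partition is only 2MB
print(oversized(by_playlist))  # the single playlist partition holds 240MB
```

Partitioning by `video_id` keeps every partition at 2MB, while partitioning by `playlist_id` puts all 240MB into one partition, well past the guideline.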

tl;dr;

Regardless of database choice, option #2 is the best practice. Storing binary files in a database almost always leads to problems (corruption, slow reads, higher ops maintenance). Storing the metadata or file location data in the DB, and using that to reference the binary files is a much friendlier solution with fewer opportunities for failure.
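As a minimal sketch of option #2, something along these lines could work. It uses Python's built-in `sqlite3` purely as a stand-in for MySQL so the example runs anywhere; the `user_files` table, its columns, and the `save_file`/`load_file` helpers are hypothetical names, not an established API. Storing a SHA-256 checksum next to the path is one possible way to detect a damaged file on read.

```python
import hashlib
import sqlite3
import tempfile
import uuid
from pathlib import Path


def init_db(conn):
    """Create the (hypothetical) metadata table; no file bytes are stored here."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS user_files (
               file_id       TEXT PRIMARY KEY,
               user_id       INTEGER NOT NULL,
               original_name TEXT NOT NULL,
               content_type  TEXT NOT NULL,
               size_bytes    INTEGER NOT NULL,
               sha256        TEXT NOT NULL,
               storage_path  TEXT NOT NULL
           )"""
    )


def save_file(conn, root, user_id, original_name, content_type, data):
    """Write the bytes to disk and record where they live in the database."""
    file_id = uuid.uuid4().hex
    user_dir = Path(root) / str(user_id)
    user_dir.mkdir(parents=True, exist_ok=True)
    # Use a generated name on disk; the user's file name is kept only as metadata.
    path = user_dir / file_id
    path.write_bytes(data)
    conn.execute(
        "INSERT INTO user_files VALUES (?, ?, ?, ?, ?, ?, ?)",
        (file_id, user_id, original_name, content_type,
         len(data), hashlib.sha256(data).hexdigest(), str(path)),
    )
    conn.commit()
    return file_id


def load_file(conn, file_id):
    """Look up the path in the database, read the file, and verify its checksum."""
    row = conn.execute(
        "SELECT storage_path, sha256 FROM user_files WHERE file_id = ?",
        (file_id,),
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    data = Path(row[0]).read_bytes()
    if hashlib.sha256(data).hexdigest() != row[1]:
        raise ValueError("checksum mismatch for " + file_id)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        conn = sqlite3.connect(":memory:")
        init_db(conn)
        fid = save_file(conn, root, 42, "report.pdf", "application/pdf", b"%PDF-1.4 demo")
        print(len(load_file(conn, fid)), "bytes read back")
        conn.close()
```

With MySQL the same idea applies with a MySQL driver and its placeholder syntax in place of `sqlite3`. The filesystem could also be swapped for an object store, with `storage_path` holding the object key instead.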

Aaron
  • Thank you for the help! Out of curiosity, why do we get corruption issues when storing binary files in a DB? – Jim Feb 06 '23 at 08:45
  • Regarding Cassandra and your comment about `depending on the table structure`: the table would be created from scratch, if that helps. I mean I wouldn't be adding a new column to an existing table. – Jim Feb 06 '23 at 09:24
  • @Jim I've simply heard of that happening, that's all. Also, my comment regarding the table structure was about making sure that the PRIMARY KEY definition is designed so that not too many videos end up in the same partition. That's a normal concern with Cassandra, but as videos are large, it requires extra care. – Aaron Feb 06 '23 at 14:07