My data looks like this: I have about 1,000,000 gene sequences. Some of them are very short, only hundreds of characters, but some are so large that their BSON size already exceeds MongoDB's 16 MB per-document limit, with up to about 10,000,000 characters in a single sequence. So I am considering using GridFS to store these sequences as files, which leaves me with a dilemma:
Solution 1: store all gene sequences as files using GridFS, regardless of whether they are small or big.
Solution 2: store only the very large gene sequences as files using GridFS, and store the small ones as normal documents. But this leads to another problem: querying is no longer simple. Because the sequences are stored in two different ways, every query has to search both places.
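For solution 2, the "small vs. very big" decision could be a simple size check made at insert time. Below is a minimal sketch in plain Python, with no database connection. The helper name `storage_for` and the 64 KiB headroom reserved for the document's other fields are hypothetical choices, not anything MongoDB defines. The only fixed number is the 16 MiB BSON document limit. UTF-8 length is just an approximation of the encoded size, since BSON also adds field names and type bytes.

```python
# Hypothetical helper for "solution 2": decide per sequence whether it should
# be stored inline in a normal document or as a GridFS file.

MONGO_MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document limit (16 MiB)

# Assumed headroom for the document's other fields (_id, name, metadata, ...).
DEFAULT_OVERHEAD_BYTES = 64 * 1024


def storage_for(sequence: str, overhead: int = DEFAULT_OVERHEAD_BYTES) -> str:
    """Return "document" if the sequence should fit inline, else "gridfs".

    The size is approximated by the UTF-8 byte length of the sequence plus a
    fixed overhead; the real BSON size also includes field names and type bytes.
    """
    size = len(sequence.encode("utf-8"))
    if size + overhead < MONGO_MAX_BSON_BYTES:
        return "document"
    return "gridfs"


if __name__ == "__main__":
    short_seq = "ACGT" * 100        # a few hundred characters
    long_seq = "ACGT" * 5_000_000   # 20,000,000 characters
    for name, seq in [("short", short_seq), ("long", long_seq)]:
        print(name, len(seq), storage_for(seq))
```

The threshold check itself is cheap. The harder part, as the question says, is that reads then have to look in two places.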
I am new to MongoDB, so many of my ideas may look ridiculous, but I really need your help.