
This is not a technical question, although I need help. For a school project, I set up a MongoDB sharded cluster. Now I have to make a video that demonstrates sharding. I could run a program that fills up the first shard and then show that, when needed, data is balanced onto the second shard. However, this would take a very long time. I thought about running a command that would calculate the available disk space of a MongoDB sharded cluster (with one shard, then with two shards), but there is no such command.

Any creative help would be great!

EDIT: A good way is to set the chunk size to something really low; sharding will then show up quickly enough with a script filling up the database.
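A minimal sketch of the EDIT's approach in Python, assuming the `pymongo` driver, a connection to a `mongos` router, and a collection that has already been sharded. The URI and the `demo.items` namespace are hypothetical placeholders. It writes the cluster-wide chunk size, in megabytes, to the `chunksize` document in `config.settings`, then inserts filler documents in batches so the balancer has many small chunks to move.

```python
import os

# Hypothetical names for the demo; substitute your own.
DEMO_URI = "mongodb://localhost:27017"  # assumed address of a mongos router
DEMO_DB, DEMO_COLL = "demo", "items"


def set_chunk_size_mb(client, size_mb):
    """Write the cluster-wide chunk size (in MB) to config.settings."""
    if size_mb < 1:
        raise ValueError("chunk size must be at least 1 MB")
    client["config"]["settings"].update_one(
        {"_id": "chunksize"}, {"$set": {"value": size_mb}}, upsert=True
    )


def make_docs(start, count, payload_bytes):
    """Build `count` documents with sequential _id values and random payloads."""
    return [
        {"_id": i, "payload": os.urandom(payload_bytes)}
        for i in range(start, start + count)
    ]


def fill(collection, total_docs, batch_size=1000, payload_bytes=1024):
    """Insert `total_docs` documents in batches and return how many were inserted."""
    inserted = 0
    while inserted < total_docs:
        n = min(batch_size, total_docs - inserted)
        collection.insert_many(make_docs(inserted, n, payload_bytes))
        inserted += n
    return inserted


def run_demo(uri=DEMO_URI, total_docs=50_000):
    """Shrink chunks to 1 MB, then insert about total_docs x 1 KB of payload."""
    from pymongo import MongoClient  # imported here so the helpers above need no driver

    client = MongoClient(uri)
    set_chunk_size_mb(client, 1)
    return fill(client[DEMO_DB][DEMO_COLL], total_docs)
```

Calling `run_demo()` against a test cluster and then watching chunks migrate (for example with `sh.status()` in the mongo shell) should make balancing visible within minutes rather than days. If the collection uses a ranged shard key on the sequential `_id` shown here, every insert lands in the highest chunk, so data appears on one shard first and is then moved by the balancer; this is one concrete way the shard key shapes how data spreads.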

rmonjo
  • What are you trying to do? What is the point of getting the total space in a cluster? I don't see a real reason behind your question – Sammaye Feb 28 '13 at 13:55
  • The only reason for my question is to get advice on a good way to demonstrate the scalability provided by sharding. Have you even read the question? – rmonjo Feb 28 '13 at 14:06
  • Yes, but I didn't get that idea from your question. Measuring the total disk size of a cluster won't prove anything – Sammaye Feb 28 '13 at 14:36
  • What you want to do is run a set of tests that insert data into your database and check that MongoDB correctly balances the resulting chunks across the cluster in proportion to the inserts. You then want to write down your observations about what makes a good shard key, how a bad shard key ruins scaling, and so on. At the moment you are looking into an area that will prove nothing about what you're asking about – Sammaye Feb 28 '13 at 14:39
  • Did I mention the shard key? I just want to show storage scalability. That would be proven if I could get the total disk space available within a cluster. – rmonjo Feb 28 '13 at 16:06
  • Read up about sharding: the shard key is vital to understanding storage distribution, storage distribution is key to writes, and writes are key to sharding. Personally, it seems that you don't truly understand the point of sharding if you factor the shard key out of your understanding of scalability – Sammaye Feb 28 '13 at 16:11
  • In what sense? What does showing the total space in a cluster prove about storage scalability? I am so confused – Sammaye Feb 28 '13 at 16:15
  • OK. I start with 2 shards, 8 GB each. Show the total disk space in the cluster: look, I have 16 GB of storage! Now let's add a new 8 GB shard. Oh, I now have 24 GB! I have storage scalability! That's what I have to show – rmonjo Feb 28 '13 at 16:27
  • About the shard key, I already chose the optimal one for each of my collections; that's not the point here – rmonjo Feb 28 '13 at 16:30
  • Hmm, OK, that isn't storage scalability, but I see what you mean: you have to show that servers can be added to the cluster. Of course, it doesn't prove scalability, because there are factors that can mean chunks are not sent to the new server, so the extra 8 GB is not necessarily used by MongoDB. In fact, there are times when MongoDB will effectively ignore some servers due to patterns in the data – Sammaye Feb 28 '13 at 16:32
  • Sorry about that, the chat button is too close to the text box. Anyway, considering your last comment, provided it is a good shard key your test might work. It might still be unreliable, but it might work. – Sammaye Feb 28 '13 at 16:45
  • Curious to hear why this might be unreliable... – rmonjo Feb 28 '13 at 16:47
  • As I said, the shard key might mean that this extra space is not used when you expect or need it, so adding the new server does nothing for the general scaling of the cluster because the space is not available when needed. – Sammaye Feb 28 '13 at 16:49
  • OK. Say I have one collection, and each object within it is identified by a unique id (might be a string, might be an int). After a month, the only shard I started with is 90% full. I add a new shard. Coming back after another month, shard1 is 95% full and shard2 60%. I really don't see what could go wrong in such a scenario – rmonjo Feb 28 '13 at 16:54
  • Hmm, if your shard key allows it then that would be the case. As I said, it depends on a lot of variables and is not a dead certainty; hence the test is not too reliable, since writes going to the next shard in line is not always a certainty – Sammaye Feb 28 '13 at 17:35
  • Total storage is pretty irrelevant; it's total RAM that's important to performance, plus total I/O bandwidth. If you already have a sharded cluster, you can just run db.stats() or db.collection.stats() on one of your sharded collections, and that will show how your documents and storage are distributed. – Asya Kamsky Mar 01 '13 at 00:39
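Asya's suggestion can be turned into a small per-shard report. This is a sketch in Python, assuming `pymongo` and assuming that `collStats` for a sharded collection, run through `mongos`, returns a `shards` document keyed by shard name with each shard's `size` in bytes (newer MongoDB releases steer users toward the `$collStats` aggregation stage instead). It shows how a collection's data is spread across shards, not the free disk space the question originally asked about.

```python
def fetch_coll_stats(client, db_name, coll_name):
    """Run collStats through mongos; sharded collections report per-shard stats."""
    return client[db_name].command("collStats", coll_name)


def shard_distribution(coll_stats):
    """Map each shard name to (data size in bytes, percent of the collection total).

    Assumes the collStats shape for a sharded collection, where
    coll_stats["shards"] maps shard names to that shard's own stats.
    """
    shards = coll_stats.get("shards", {})
    sizes = {name: s.get("size", 0) for name, s in shards.items()}
    total = sum(sizes.values())
    return {
        name: (size, 100.0 * size / total if total else 0.0)
        for name, size in sorted(sizes.items())
    }


def report(coll_stats):
    """Print one line per shard, e.g. to compare before and after adding a shard."""
    for name, (size, pct) in shard_distribution(coll_stats).items():
        print(f"{name}: {size / 2**20:.1f} MiB ({pct:.1f}%)")
```

Running `report(fetch_coll_stats(client, "demo", "items"))` (hypothetical names) once with a single shard, and again after `sh.addShard()` and a balancing round, would show the new shard's share growing, which is arguably closer to what the video needs than a total-disk-space figure.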

1 Answer


You might find this open source project helpful:

MongoDB Sharding Visualizer

The source is here: https://github.com/10gen-labs/shard-viz

Asya Kamsky