
I'm interested in sharding my website's user data across multiple servers.

For example, users will log in from the same place, but the login script needs to figure out which server that user's data resides on. So the login script would query the master registry for that username, and it might return that the data is on server B. The login script would then connect to server B and verify the username/password. Does that make sense? Is it normal to have something like a master registry to resolve where data resides?

Also, I've searched but haven't had much luck finding tutorials, information, or strategies on sharding. If you are aware of any online resources on the topic, I would greatly appreciate it if you would share them so that I can educate myself. Thanks!
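To make the flow concrete, here is a minimal sketch of the lookup described above. Everything in it is hypothetical: `REGISTRY` stands in for the master registry, `SHARDS` stands in for the separate database servers (plain in-memory dicts here), and the names `locate_shard` and `login` are made up for illustration.

```python
import hashlib
import hmac
import os


def _digest(password: str, salt: bytes) -> bytes:
    # Salted password hash; the iteration count is an arbitrary choice for this sketch.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_record(password: str) -> tuple:
    # Returns the (salt, digest) pair that a shard would store for a user.
    salt = os.urandom(16)
    return salt, _digest(password, salt)


# Hypothetical shards: each dict stands in for a separate database server.
SHARDS = {
    "A": {"alice": make_record("wonderland")},
    "B": {"bob": make_record("builder")},
}

# Hypothetical master registry: username -> name of the shard holding that user.
REGISTRY = {"alice": "A", "bob": "B"}


def locate_shard(username: str):
    # Step 1: ask the registry where this user's data lives (None if unknown).
    return REGISTRY.get(username)


def login(username: str, password: str) -> bool:
    shard_name = locate_shard(username)
    if shard_name is None:
        return False
    # Step 2: "connect" to that shard and verify the credentials there.
    record = SHARDS[shard_name].get(username)
    if record is None:
        return False
    salt, expected = record
    return hmac.compare_digest(_digest(password, salt), expected)
```

With this layout, `login("bob", "builder")` consults the registry, is directed to shard B, and verifies the password against the record stored there.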

Owen Blacker
  • I wrote a [blog post](http://blog.devlex.net/post/2012/04/20/Sharding-with-RavenDB.aspx) about sharding across 3 servers. You can get the full source code and run it locally. Check it out and see what you think! – oleksii Jun 27 '12 at 18:51

2 Answers


You should check out the very informative site http://highscalability.com. Posts worth reading:

Generally you are following the right approach, but this can get nasty quite fast if you need to run queries across more than one cluster, e.g. "your friends' recent posts" type queries.

Owen Blacker
max

One option you might want to consider: use a simple hash. For example, take the MD5 hash of the username, then treat the last 8 bytes of that as a long. Take that long mod (number of servers) and make that the server to put the data on. That way you don't need any central registry/configuration other than an ordered list of servers.

The disadvantage is that changing the number of servers involves moving all the data to the new "correct" location...
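A rough sketch of the hashing scheme, and of how much data moves when the server count changes. The server names and the function names `shard_index`, `server_for`, and `fraction_moved` are made up for illustration, and the last 8 bytes are read as an unsigned integer so the modulo can never be negative (a signed long would need extra care).

```python
import hashlib

# Hypothetical ordered list of servers; the order must be the same everywhere.
SERVERS = ["server-a", "server-b", "server-c"]


def shard_index(username: str, num_servers: int) -> int:
    digest = hashlib.md5(username.encode("utf-8")).digest()
    # Treat the last 8 bytes of the MD5 digest as an unsigned 64-bit integer.
    value = int.from_bytes(digest[-8:], "big")
    return value % num_servers


def server_for(username: str, servers=SERVERS) -> str:
    return servers[shard_index(username, len(servers))]


def fraction_moved(usernames, old_count: int, new_count: int) -> float:
    # Share of users whose shard index changes when the server count changes.
    moved = sum(
        1 for u in usernames
        if shard_index(u, old_count) != shard_index(u, new_count)
    )
    return moved / len(usernames)
```

Calling `fraction_moved` on a sample of usernames with `old_count=3` and `new_count=4` should show that most users land on a different server, which is the relocation cost mentioned above.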

(There's also the matter that if one machine goes down, those users are stuffed - you'll want to consider having some sort of redundancy.)

Jon Skeet
  • "The disadvantage is that changing the number of servers involves moving all the data to the new 'correct' location..." Note that this can be dealt with using consistent hashing. – Dana the Sane Oct 21 '09 at 00:27
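For reference, a minimal sketch of a consistent hash ring along the lines that comment suggests. The class name `HashRing`, the `replicas` parameter (virtual points per server), and the server names are assumptions for illustration, not taken from any particular library.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Last 8 bytes of the MD5 digest as an unsigned 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[-8:], "big")


class HashRing:
    def __init__(self, servers, replicas: int = 100):
        self._replicas = replicas
        self._hashes = []   # sorted ring positions
        self._servers = []  # server owning the position at the same index
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Place several virtual points per server to even out the load.
        for i in range(self._replicas):
            h = _hash(f"{server}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._servers.insert(idx, server)

    def server_for(self, username: str) -> str:
        # Walk clockwise to the first point after the key, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, _hash(username)) % len(self._hashes)
        return self._servers[idx]
```

When a server is added to the ring, the only users that change location are the ones that now map to the new server; everyone else stays put, instead of nearly everything being reshuffled as with plain modulo.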