0

The concept of sharding on SQL azure is one of the top recommended options to get over the 50Gb DB size limit, it has at the moment. A key strategy in sharding is to group related records called atomic units together in a single shard , so that the application needs to only query a single SQL azure instance to retrieve the data.

However in applications such as Social networking Apps, grouping a atomic unit in a single shard is not trivial, due to the inter-connectivity of entities and records. what could be a recommended approach based on such a scenario?

Also in a sharded DB , what primary keys should be used for the tables ? Big Int or GUID. i currently use BIGINT Identity columns but if the data was to be merged for some reason this would be a problem due to conflicts between the values in different shards. i have heard some people recommend GUID's (UniqueIdentifier) but i'm wary on how this could affect performance. Indexing On-premise SQL servers with UniqueIdentifier columns is not possible, and i wonder how SQL azure implements similar strategies if i were to employ a UniqueIdentifier column.

Azwaan
  • 47
  • 1
  • 9

1 Answers1

0

For a social networking app, I'd presonally forgo using SQL and instead leverage a noSQL solution such as MongoDB or Azure Table Storage. These non-normalized but in-expensive systems allow you to create multiple entity datasets that are customized to your various indexing needs.

So instead of having something like... User1 -< relationshiptable -< User2

You'd instead have tables like Users User1's Friends User2's Friends

If Users 1 and 2 are both friends, then you'd have two entries to define that relationship, not one. But if makes retrieving a list of a specific user's friends trivial. It also now opens you up for executing tasks in parallel, by searching multiple index tables at a time.

This process scales extremely well, but does require that you invest more time in how the relationships are maintained. Admittedly, this is a simiplied example. Things get much more complex when you start discussing tasks like searching across your entire user base.

BrentDaCodeMonkey
  • 5,493
  • 20
  • 18
  • i'm aware of the NOSQL based options and even Azure table storage however this would certainly increase dev times considerably, so we are stuck with a relational db approach at the moment. – Azwaan Feb 11 '11 at 15:55
  • Then I'd explore either hosting the RBDMS elsewhere (Amazon, Rackspace, etc...). This would allow you to do larger DB's on more powerful VM's. Just make sure you put in a cache layer to help control costs and increase performance. Personally, I'd still explore the noSQL route. Its the solution that's going to be the best for you long term. Even if you only do it in a mixed way (indexes in SQL Azure, data stores in Azure Storage). – BrentDaCodeMonkey Feb 11 '11 at 16:37
  • Assuming that i'd go with NOSQL DB running on Azure, how would u rate a Graph DB (considering that graph db's have many features tailored at social netowrking type scenarios) such as Neo4J or Sones GraphDB against Windows table storage? – Azwaan Feb 23 '11 at 12:15
  • I'll try to check into it and let you know. One of the other Azure MVP's is working on something that may be similar to your project. So it might be helpful if I can connect you two. :) – BrentDaCodeMonkey Feb 23 '11 at 14:23