The concept of sharding on SQL azure is one of the top recommended options to get over the 50Gb DB size limit, it has at the moment. A key strategy in sharding is to group related records called atomic units together in a single shard , so that the application needs to only query a single SQL azure instance to retrieve the data.
However in applications such as Social networking Apps, grouping a atomic unit in a single shard is not trivial, due to the inter-connectivity of entities and records. what could be a recommended approach based on such a scenario?
Also in a sharded DB , what primary keys should be used for the tables ? Big Int or GUID. i currently use BIGINT Identity columns but if the data was to be merged for some reason this would be a problem due to conflicts between the values in different shards. i have heard some people recommend GUID's (UniqueIdentifier) but i'm wary on how this could affect performance. Indexing On-premise SQL servers with UniqueIdentifier columns is not possible, and i wonder how SQL azure implements similar strategies if i were to employ a UniqueIdentifier column.