At present we send emails to subscribers on our Mailing List. This features a tracking pixel that maps to a PHP file that then does Insert data into our SQL Server database.
The current average for Email Tracking Pixel Inserts is about 80,000 inserts a day. And about 800,000 inserts equates to about 1GB of hard disk space, so 10 days 1GB of data.
As well as this we have other Inserts and Tracking Data getting inserted into the SQL Server database which also happens to be the same database that the websites uses. So for space, performance, licensing costs and horizontal scaling reasons etc I want to move this analytic tracking data away from the SQL Server DB + the fact that this analytic tracking data is not needed by the website so I want to move these write heavy inserts away so that the website DB is just that.
Table Structure at the moment
TrackingPixelId | UserId | Code | Medium | Source | DateViewed | SessionId
9109616 | 1234 | 'BULLETIN120115' | 'email' | 'BULLETIN120115' | {datetime} | bf7e2f801...
Column Info
TrackingPixelId : PK Integer auto increment
UserId: Integer
Code, Medium and Source : Strings/varchars
DateViewed: DateTime eg 2015-01-13 06:18:24.920
SessionId : eg fa5cac87896e1c7b423051fffdb836a6
Code and Source are essentially the same thing, the Unique ID of the Email Mail Out that was opened.
So in terms of how the data will be reported we will be looking at volume of opened emails. So Daily, Weekly, Monthly and Yearly reports and we won't need the reports to be generated instantly probably loads of writes per day but just a hand full of reads but probably wanting the most recent data first.
So considering all of these factors what would be the best shard key?