One of our data sources sends a feed with an aggregate of data per day. A periodic snapshot. For example:
shop, day, sales
bobs socks, 2019-01-01, 45,
bobs socks, 2019-01-02, 50,
bobs socks, 2019-01-03, 10,
janes coats,2019-01-01, 500,
janes coats,2019-01-02, 55,
janes coats,2019-01-03, 100
I know of two ways to model this in a data vault raw vault:
Multi-Active Satellite
Here we allow each satellite to have multiple rows per hub key.
create table dbo.HubShop (
ShopName nvarchar(50) not null,
primary key pk_HubShop (ShopName)
)
create table dbo.SatDailyShopSales (
ShopName nvarchar(50) not null,
SalesDate date not null,
SalesAmount money not null,
LoadTimestamp datetime2(7) not null,
primary key pk_SatDailyShopSales (ShopName, SalesDate, LoadTimestamp)
)
This is easy to implement but we now have a bi-temporal element to the satellite.
Snapshot Hub
create table dbo.HubShop (
ShopName nvarchar(50) not null,
primary key pk_HubShop (ShopName)
)
create table dbo.HubSnapshot (
SalesDate date not null,
primary key pk_HubSnapshot (SalesDate)
)
create table dbo.LinkDailyShopSnapshot (
LinkDailyShopSnapshotHash binary(32) not null,
ShopName nvarchar(50) not null,
SalesDate date not null,
primary key pk_LinkDailyShopSnapshot (LinkDailyShopSnapshotHash)
)
create table dbo.SatDailyShopSales (
LinkDailyShopSnapshotHash binary(32) not null,
SalesAmount money not null,
LoadTimestamp datetime2(7) not null,
primary key pk_SatDailyShopSales (LinkDailyShopSnapshotHash, LoadTimestamp)
)
This second solution adds an extra hub which just stores a list of dates and a link for the intersection between date and shop.
The second solution feels cleaner but requires more joins.
Which is the correct model? Are there any better solutions?