
I have a design/performance question.

I have the following table:

CREATE TABLE [dbo].[DW_Visits_2016](
    [VisitId] [int] NOT NULL,
    [UserId] [int] NOT NULL,
    [VisitReferrer] [varchar](512) NULL,
    [VisitFirstRequest] [varchar](255) NOT NULL,
    [VisitAppName] [varchar](255) NULL,
    [VisitCountry] [varchar](50) NULL,
    [VisitDate] [smalldatetime] NOT NULL,
    [VisitMins] [int] NOT NULL,
    [VisitHits] [int] NOT NULL,
    [EntryTag] [varchar](100) NOT NULL,
    [VisitCount] [int] NOT NULL,
    [VisitInitialDate] [datetime] NOT NULL,
    [AggregateType] [varchar](50) NULL,
    [MemberId] [int] NULL,
    [ServerName] [varchar](50) NULL,
    [BrowserUserAgent] [varchar](255) NULL,
    [LastModifiedDate] [smalldatetime] NULL,
    [Guid] [uniqueidentifier] NULL,
    [SessionId] [varchar](100) NULL,
    [IPAddress] [varchar](40) NULL,
 CONSTRAINT [PK_DW_Visits_2016] PRIMARY KEY NONCLUSTERED 
(
    [VisitId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[DW_Visits_2016]  WITH CHECK ADD  CONSTRAINT [CK_DW_Visits_2016_VisitDate] CHECK  (([VisitDate]>='2016-01-01' AND [VisitDate]<'2017-01-01'))
GO

ALTER TABLE [dbo].[DW_Visits_2016] CHECK CONSTRAINT [CK_DW_Visits_2016_VisitDate]
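
Since the yearly CHECK constraints are what let the optimizer skip whole tables, it is worth making sure they stay *trusted*: a constraint that was disabled and re-enabled with NOCHECK after a bulk load is ignored for elimination. A quick sketch for verifying this (table name taken from the question):

```sql
-- Re-validate all constraints on the table so the optimizer trusts them again
-- (only needed if they were ever disabled or re-enabled WITH NOCHECK after a load).
ALTER TABLE [dbo].[DW_Visits_2016] WITH CHECK CHECK CONSTRAINT ALL;

-- is_not_trusted must be 0 for the constraint to help with elimination.
SELECT name, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID(N'dbo.DW_Visits_2016');
```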

And the same table exists for each year from 2010 through 2015.

Each table holds around 150 million rows, so combined we are talking about 1,050 million rows.

I received a requirement where the BI people want all of this combined into a single view (something crazy like `select * from all_visits`).

Luckily they gave me some `WHERE` clauses and a list of columns they don't need, so the final result would be 6 columns and about 20% of the rows (210 million rows). Nonetheless, a view is just a stored query. And even though the box has 60 GB of RAM, it's shared with many other databases.
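
For reference, the combined view would just be a UNION ALL over the yearly tables. Because each table carries a CHECK constraint on `VisitDate`, SQL Server can eliminate entire branches when a query filters on that column, so for reads it behaves like a local partitioned view. A sketch (the six-column list is an assumption, since the actual BI column list isn't shown):

```sql
CREATE VIEW dbo.all_visits
AS
SELECT VisitId, UserId, VisitCountry, VisitDate, VisitMins, VisitHits
FROM dbo.DW_Visits_2016
UNION ALL
SELECT VisitId, UserId, VisitCountry, VisitDate, VisitMins, VisitHits
FROM dbo.DW_Visits_2015
UNION ALL
-- ...one branch per year down to...
SELECT VisitId, UserId, VisitCountry, VisitDate, VisitMins, VisitHits
FROM dbo.DW_Visits_2010;
```

A query such as `SELECT ... FROM dbo.all_visits WHERE VisitDate >= '2015-01-01' AND VisitDate < '2016-01-01'` should then touch only the 2015 table, provided the CHECK constraints are trusted.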

Options I see:

  1. Instead of a view: materialize the query results as tables and move them to a dedicated box.
  2. Create one view per year?
  3. Switch all of this to MongoDB or something like Vertica?
  4. Any of the previous options combined with columnstore indexes?
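
On option 4: for the read-only 2010–2015 tables, a nonclustered columnstore index (Enterprise Edition, available since SQL Server 2012) can cut scan and aggregation cost dramatically. Note that before SQL Server 2016 a nonclustered columnstore makes its table read-only, so it fits the historical tables but not the 2016 one that still receives daily inserts. The index name and column list below are illustrative:

```sql
-- One such index per historical (read-only) yearly table.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_DW_Visits_2015
ON dbo.DW_Visits_2015 (VisitId, UserId, VisitCountry, VisitDate, VisitMins, VisitHits);
```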
  • DWH "needs" Enterprise Edition. Enterprise edition supports clustered tables. In this case I would make a partition per year or, even finer, per month. Then some denormalisation in the ETL process (referrer) and voilà, the data is manageable AND smaller. You seriously do not believe something as trivial as one billion rows is challenging? I throw around 30-billion-row tables at times, and those survive not even a month under full load. – TomTom Feb 11 '16 at 16:52
  • Still, this is way too broad for here. This is something a specialist has to look into while weighing the requirements and limitations. We cannot give architecture advice in a Q&A format and then look in the mirror and know we did a good job. – TomTom Feb 11 '16 at 16:53
  • TomTom, thanks for your time and sorry to interrupt you. I agree it is too broad for here, but since the requirements and performance targets are quite flexible, and this is just some testing data, I went ahead. Yes, I am running Enterprise Edition. "Enterprise edition supports clustered tables"?! Did you mean clustered views?! Tables 2010 to 2015 are read-only; [DW_Visits_2016] gets insertions daily. I didn't fully understand your advice: first you say I should partition the table and then do an ETL? Wouldn't the partitioning be part of the 'transformation' in the ETL? – Chicago1988 Feb 11 '16 at 17:19
  • What are the version and edition of your SQL Server? In such a case you should use a partitioned table and a columnstore index; this way you can handle huge data. You can also benefit from partition switching. I don't recommend having a separate table for each year. – FLICKER Feb 11 '16 at 17:42
  • @TomTom I forgot to mention you before. – Chicago1988 Feb 11 '16 at 18:42

0 Answers