Questions tagged [data-warehouse]

A data warehouse (DW) is a database specifically structured/designed to aid in querying, analyzing and reporting (generating reports out of) of current and historical data. DWs are central repositories of integrated data from one or more disparate sources. Basic difference between a data warehouse and a set of DB tables is how the data is organized/structured.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used for querying, analyzing and reporting for decision-support purposes.

Data Mart is the access layer of the data warehouse. It serves a particular department like Marketing, HR, etc. Dedicated to business function and unit specifications, data marts make the information more focused and faster to find.

Some differences between Data mart and Data warehouse:

Data Warehouses have multiple subject areas with more detailed information. They are integrating all sources of data. Dimensional modelling is not necessary, but it feeds dimensional models
Data Marts usually hold one subject area with not that detailed information - often summarized. Concentrate on integrating information from one subject area or source system. Built on dimensional models like star-schema.

There are many products readily available that provide data warehousing capability e.g. MSAccess, Essbase (Hyperion, now Oracle), Cognos, Business Objects, MicroStrategy, ...

Basics of Data Warehousing:

Dimensional Modelling - Consist of the identify the measurements, or facts, that are given the context by their related dimensions. The grain of the fact table describes the level of detail at which the facts are recorded.

Main steps of relational modelling:

Choose the business process
Declare the grain
Identify the dimensions
Identify the fact

Online Analytical Processing (OLAP) and it's types (ROLAP, MOLAP, HOLAP, ...): Describes basics of the DB designs and pros/cons of each way. - A variety of different design patterns are used in a data warehouse environment. Some common approaches include: Normalized (5NF); DataVault; Anchor Modelling; Dimensional (5,6); other temporal (e.g. 6NF). - SQL: Describes how a Data Warehouse can be queried. Following is a list of basic keywords that every data warehouse developer must know: - JOIN - GROUPBY

At a high level the Data Warehousing can be divided into:

Tools (IBM Cognos, Microsoft Business Intelligence, Oracle Business Intelligence Enterprise, dition(OBIEE), Business Objects Enterprise XI, Jaspersoft, Talend Open studio, Pentaho, Qlikview etc) readily available and how to use them. Used for small to medium sized data sets. This usually requires [at least] knowledge of tool's:
- data model and
- user interface
Building your own data warehouse for specific usecases. Used when dealing with really huge data sets (e.g. the data collected by Google, Yahoo, Facebook or a couters/performance-management-data from a large telecommunication network. This usually requires [at least] knowledge of:
- scalability, high availability and clustering concepts.
- data warehouse (schema, queries, data model, ...) design.
- available databases (Oracle, Clustra, Greenplum, MySQL, DB2, ...)
- problem domain (implicit).
- relevant GUI/UI (SWING, JSP, ...) and business logic (J2EE, C++, ...) technologies

2778 questions

votes

2 answers

Star schema, normalized dimensions, denormalized hierarchy level keys

Given the following star schema tables. fact, two dimensions, two measures. # geog_abb time_date amount value #1: AL 2013-03-26 55.57 9113.3898 #2: CO 2011-06-28 19.25 9846.6468 #3: MI 2012-05-15 94.87 4762.5398 #4: …

asked Feb 18 '16 at 04:05

jangorecki

16,384
4
79
160

votes

4 answers

How to calculate Azure SQL Data Warehouse DWU?

I am analyzing Azure SQL DW and I came across the term DWU (Data warehouse units). The link on Azure site only mentions a crude definition of DWU. I want to understand how DWU is calculated and how should I scale my system accordingly. I have also…

azure data-warehouse azure-synapse

asked Jan 25 '16 at 11:38

ViSu

votes

2 answers

Is star schema still necessary for a big-data-warehouse?

I am designing a new hadoop-based data warehouse using hive and I was wondering whether the classic star/snowflake schemas were still a "standard" in this context. Big Data systems embrace redundancy so that fully normalized schemas have usually…

hadoop hive data-warehouse

asked Jun 13 '15 at 22:07

Nicola Ferraro

4,051
5
28
60

votes

2 answers

SSIS - OLE DB Destination - Table or Views load vs. Fast-load

From what I've read: Table or Views data access mode commits each row at a time as a transaction. Thus, making a package transferring 5 million rows take long (30+ minutes). Fast-load data access mode allows one to specify the batch rows and the…

ssis sql-server-2008-r2 data-warehouse

asked Jan 22 '13 at 19:29

Ramon Gonzalez

votes

4 answers

Calendar table for Data Warehouse

For my data warehouse, I am creating a calendar table as follows: SET NOCOUNT ON DROP Table dbo.Calendar GO Create Table dbo.Calendar ( CalendarId Integer NOT NULL, DateValue Date NOT NULL, DayNumberOfWeek …

sql-server data-warehouse

asked Jul 29 '09 at 17:36

Raj More

47,048
33
131
198

votes

1 answer

How to create history fact table?

I have some entities in my Data Warehouse: Person - with attributes personId, dateFrom, dateTo, and others those can be changed, e.g. last name, birth date and so on - slowly changing dimension Document - documentId, number, type Address -…

sql data-warehouse fact-table datahistory

asked May 21 '11 at 07:54

Argnist

votes

1 answer

Generating star schema in hive

I am from SQL Datawarehouse world where from a flat feed I generate dimension and fact tables. In general data warehouse projects we divide feed into fact and dimension. Ex: I am completely new to Hadoop and I came to know that I can build data…

hadoop hive data-warehouse dimensional-modeling fact

asked Mar 28 '17 at 12:59

Zerotoinfinity

6,290
32
130
206

votes

2 answers

What should I have in mind when building OLAP solution from scratch?

I'm working for a company running a software product based on a MS SQL database server, and through the years I have developed 20-30 quite advanced reports in PHP, taking data directly from the database. This has been very successful, and people are…

sql-server ssis data-warehouse olap business-intelligence

asked Sep 13 '10 at 22:53

Søren Haagerup

votes

4 answers

Database architecture for millions of new rows per day

I need to implement a custom-developed web analytics service for large number of websites. The key entities here are: Website Visitor Each unique visitor will have have a single row in the database with information like landing page, time of day,…

mysql database scalability analytics data-warehouse

asked Aug 18 '10 at 07:07

Paras Chopra

4,029
4
21
19

votes

4 answers

Extrapolate daily historical values from a table that only records when a value changes (Postgresql 9.3)

I have a table that records a row for each time a score for a location has changed. score_history: id int PK (uuid auto incrementing int) happened_at timestamp (when the score changed) location_id int FK (the location that the value is for) score…

sql postgresql data-warehouse postgresql-9.3

asked Jul 03 '19 at 15:48

Nick

8,483
10
46
65

votes

3 answers

date/time dimension

I am designing a data warehouse and I have a sticky issue with time. The grain I need is hourly (to calculate aggregate counts of events per hour) and I also have to accommodate a shift pattern that does not conveniently fit inside a 24 hour period…

sql data-warehouse dimensions

asked Feb 08 '11 at 07:27

dfoster99

votes

4 answers

How Do I aggregate Data By Day and Still Respect Timezone?

We are currently using a summary table that aggregates information for our users on an hourly basis in UTC time. The problem we are having is that this table is becoming too large and slowing our system down immensely. We have done all the tuning…

sql database database-design data-warehouse

asked Aug 06 '10 at 22:30

Russ Bradberry

10,705
17
69
85

votes

4 answers

Database warehouse design: fact tables and dimension tables

I am building a poor man's data warehouse using a RDBMS. I have identified the key 'attributes' to be recorded as: sex (true/false) demographic classification (A, B, C etc) place of birth date of birth weight (recorded daily): The fact that is…

sql database-design data-warehouse olap

asked May 29 '10 at 07:24

morpheous

16,270
32
89
120

votes

4 answers

Is it possible to partially refresh a materialized view in Oracle?

I have a very complex Oracle view based on other materialized views, regular views as well as some tables (I can't "fast refresh" it). Most of the time, existing records in this view are based on a date and are "stable", with new record sets having…

oracle data-warehouse partitioning materialized-views

asked Nov 23 '09 at 14:07

Galghamon

2,012
18
27

votes

2 answers

OLAP - Calculate run-off triangles, sample data and cube included (PostgreSQL/Mondrian)

Reality description: We do have a list of projects. In every project there is a lot of accounts. You can do a lot of actions on every account. I do have the following dimensions and fact table defined (simplified): Dimensions and attributes: …

ssas mdx data-warehouse olap mondrian

asked Nov 13 '12 at 10:01

Tomas Greif

21,685
23
106
155

Prev 1 2 3

…

99 100 Next