Questions tagged [data-warehouse]

A data warehouse (DW) is a database specifically structured/designed to aid in querying, analyzing and reporting (generating reports out of) of current and historical data. DWs are central repositories of integrated data from one or more disparate sources. Basic difference between a data warehouse and a set of DB tables is how the data is organized/structured.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used for querying, analyzing and reporting for decision-support purposes.

Data Mart is the access layer of the data warehouse. It serves a particular department like Marketing, HR, etc. Dedicated to business function and unit specifications, data marts make the information more focused and faster to find.

Some differences between Data mart and Data warehouse:

  • Data Warehouses have multiple subject areas with more detailed information. They are integrating all sources of data. Dimensional modelling is not necessary, but it feeds dimensional models
  • Data Marts usually hold one subject area with not that detailed information - often summarized. Concentrate on integrating information from one subject area or source system. Built on dimensional models like star-schema.

There are many products readily available that provide data warehousing capability e.g. MSAccess, Essbase (Hyperion, now Oracle), Cognos, Business Objects, MicroStrategy, ...

Basics of Data Warehousing:

  • Dimensional Modelling - Consist of the identify the measurements, or facts, that are given the context by their related dimensions. The grain of the fact table describes the level of detail at which the facts are recorded.

Main steps of relational modelling:

  • Choose the business process
  • Declare the grain
  • Identify the dimensions
  • Identify the fact

Online Analytical Processing (OLAP) and it's types (ROLAP, MOLAP, HOLAP, ...): Describes basics of the DB designs and pros/cons of each way. - A variety of different design patterns are used in a data warehouse environment. Some common approaches include: Normalized (5NF); DataVault; Anchor Modelling; Dimensional (5,6); other temporal (e.g. 6NF). - SQL: Describes how a Data Warehouse can be queried. Following is a list of basic keywords that every data warehouse developer must know: - JOIN - GROUPBY

At a high level the Data Warehousing can be divided into:

  • Tools (IBM Cognos, Microsoft Business Intelligence, Oracle Business Intelligence Enterprise, dition(OBIEE), Business Objects Enterprise XI, Jaspersoft, Talend Open studio, Pentaho, Qlikview etc) readily available and how to use them. Used for small to medium sized data sets. This usually requires [at least] knowledge of tool's:
    • data model and
    • user interface
  • Building your own data warehouse for specific usecases. Used when dealing with really huge data sets (e.g. the data collected by Google, Yahoo, Facebook or a couters/performance-management-data from a large telecommunication network. This usually requires [at least] knowledge of:
    • scalability, high availability and clustering concepts.
    • data warehouse (schema, queries, data model, ...) design.
    • available databases (Oracle, Clustra, Greenplum, MySQL, DB2, ...)
    • problem domain (implicit).
    • relevant GUI/UI (SWING, JSP, ...) and business logic (J2EE, C++, ...) technologies
2778 questions
9
votes
0 answers

What is the actual difference between Data Warehouse & Big Data?

I know what is Data Warehouse & what is Big Data. But I am confused with Data Warehouse Vs Big Data. Both are same with different names or both are different(Conceptually & Physically).
Aditya
  • 2,299
  • 5
  • 32
  • 54
9
votes
2 answers

Good place to start learning data warehousing?

I am interested in learning more about data warehousing. I see terms like "dimension", "snowflake schema" and "star schema" thrown about. Where would one start in learning about this stuff? Are there good books or Internet resources? ETL is in this…
Jordie
  • 899
  • 2
  • 11
  • 20
8
votes
2 answers

Implementing roleplaying dimension with friendly attribute names in SSAS?

I have a fact table that references my date dimension as both a Form Date and Finances Date. Hence the date dimension is roleplaying two different roles. This works fine, except that I can't roleplay the names of the Year or Month column. I would…
AaronLS
  • 37,329
  • 20
  • 143
  • 202
8
votes
3 answers

What is best practice for representing time intervals in a data warehouse?

In particular I am dealing with a Type 2 Slowly Changing Dimension and need to represent the time interval a particular record was active for, i.e. for each record I have a StartDate and an EndDate. My question is around whether to use a closed…
snth
  • 5,194
  • 4
  • 39
  • 48
8
votes
2 answers

Anything similar to MySQL Proxy for PostgreSQL?

I am looking for something similar to MySQL Proxy. The purpose is to modify incoming queries on the server. I am not looking for alternative ways to achieve the same. My best guess at the moment is to modify GridSQL, but this adds complexity and it…
David
  • 4,786
  • 11
  • 52
  • 80
8
votes
3 answers

What's the Grain in the context of DW

According to The Data Warehouse Toolkit by Kimball "The grain must be declared before choosing dimensions or facts because every candidate dimension or fact must be consistent with the grain." I'm so confused about this concept .Could some one…
Anyname Donotcare
  • 11,113
  • 66
  • 219
  • 392
8
votes
2 answers

Data Warehousing arbitrary fields

In our application, we support user-written plugins. Those plugins generate data of various types (int, float, str, or datetime), and those data are labeled with bunches of meta-data (user, current directory, etc.) as well as three free-text fields…
bukzor
  • 37,539
  • 11
  • 77
  • 111
8
votes
3 answers

Oracle aggregation function to allocate amount

Suppose I have 2 tables T1 and T2 as follows T1: bag_id bag_type capacity ------|--------|-------- 1 A 500 2 A 300 3 A 100 4 B 200 5 B 100 T2: item_type…
asinkxcoswt
  • 2,252
  • 5
  • 29
  • 57
8
votes
5 answers

True or False: Good design calls for every table to have a primary key, if nothing else, a running integer

Consider a grocery store scenario (I'm making this up) where you have FACT records that represent a sale transaction, where the columns of the Fact table include SaleItemFact Table ------------------ CustomerID ProductID Price DistributorID …
Chad
  • 23,658
  • 51
  • 191
  • 321
8
votes
1 answer

SQL/SSIS DataWareHouse Fact table loading, best practices?

I am building my first datawarehouse in SQL 2008/SSIS and I am looking for some best practices around loading the fact tables. Currently in my DW I have about 20 Dimensions (Offices, Employees, Products, Customer, etc.) that are of Type 1 SCD. In my…
user1709091
8
votes
7 answers

In a Data Warehouse scenario is there any disadvantage to using WITH(NOLOCK)

I have a Kimball-style DW (facts and dimensions in star models - no late-arriving facts rows or columns, no columns changing in dimensions except expiry as part of Type 2 slowly changing dimensions) with heavy daily processing to insert and update…
Cade Roux
  • 88,164
  • 40
  • 182
  • 265
7
votes
4 answers

Handling nulls in Datawarehouse

I'd like to ask your input on what the best practice is for handling null or empty data values when it pertains to data warehousing and SSIS/SSAS. I have several fact and dimension tables that contain null values in different rows. Specifics: 1)…
rrydman
  • 501
  • 3
  • 9
  • 15
7
votes
2 answers

One or multiple fact tables?

I am trying to build a data mart. I have lot of dimensions, and couple of measures - facts. Every measure is connected to all dimensions in term of business. There is the standard approach that there will be one big fact table with all measures.…
7
votes
4 answers

Why NULL values are mapped as 0 in Fact tables?

What is the reason that in measure fields in fact tables (dimensionally modeled data warehouses) NULL values are usually mapped as 0?
jrara
  • 16,239
  • 33
  • 89
  • 120
7
votes
1 answer

What advantages does in-memory OLAP have over traditional systems with significant memory?

Do in-memory OLAP engines have advantages over the traditional OLAP engines backed by enough RAM to contain the entire cube(s)? For example, if I use a MOLAP engine (SSAS) and GB / TB of RAM where the entire cube (or even star-schema) is RAM…
kermatt
  • 1,585
  • 2
  • 16
  • 36