Questions tagged [data-warehouse]

A data warehouse (DW) is a database specifically structured/designed to aid in querying, analyzing and reporting (generating reports out of) of current and historical data. DWs are central repositories of integrated data from one or more disparate sources. Basic difference between a data warehouse and a set of DB tables is how the data is organized/structured.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used for querying, analyzing and reporting for decision-support purposes.

Data Mart is the access layer of the data warehouse. It serves a particular department like Marketing, HR, etc. Dedicated to business function and unit specifications, data marts make the information more focused and faster to find.

Some differences between Data mart and Data warehouse:

Data Warehouses have multiple subject areas with more detailed information. They are integrating all sources of data. Dimensional modelling is not necessary, but it feeds dimensional models
Data Marts usually hold one subject area with not that detailed information - often summarized. Concentrate on integrating information from one subject area or source system. Built on dimensional models like star-schema.

There are many products readily available that provide data warehousing capability e.g. MSAccess, Essbase (Hyperion, now Oracle), Cognos, Business Objects, MicroStrategy, ...

Basics of Data Warehousing:

Dimensional Modelling - Consist of the identify the measurements, or facts, that are given the context by their related dimensions. The grain of the fact table describes the level of detail at which the facts are recorded.

Main steps of relational modelling:

Choose the business process
Declare the grain
Identify the dimensions
Identify the fact

Online Analytical Processing (OLAP) and it's types (ROLAP, MOLAP, HOLAP, ...): Describes basics of the DB designs and pros/cons of each way. - A variety of different design patterns are used in a data warehouse environment. Some common approaches include: Normalized (5NF); DataVault; Anchor Modelling; Dimensional (5,6); other temporal (e.g. 6NF). - SQL: Describes how a Data Warehouse can be queried. Following is a list of basic keywords that every data warehouse developer must know: - JOIN - GROUPBY

At a high level the Data Warehousing can be divided into:

Tools (IBM Cognos, Microsoft Business Intelligence, Oracle Business Intelligence Enterprise, dition(OBIEE), Business Objects Enterprise XI, Jaspersoft, Talend Open studio, Pentaho, Qlikview etc) readily available and how to use them. Used for small to medium sized data sets. This usually requires [at least] knowledge of tool's:
- data model and
- user interface
Building your own data warehouse for specific usecases. Used when dealing with really huge data sets (e.g. the data collected by Google, Yahoo, Facebook or a couters/performance-management-data from a large telecommunication network. This usually requires [at least] knowledge of:
- scalability, high availability and clustering concepts.
- data warehouse (schema, queries, data model, ...) design.
- available databases (Oracle, Clustra, Greenplum, MySQL, DB2, ...)
- problem domain (implicit).
- relevant GUI/UI (SWING, JSP, ...) and business logic (J2EE, C++, ...) technologies

2778 questions

votes

8 answers

Is it good practice to have foreign keys in a datawarehouse (relationships)?

I think the question is clear enough. Some of the columns in my datawarehouse table could have a relationship to a primary key. But is it good practice? It is denormalized, so it should never be deleted again (data in datawarehouse). Hope question…

database data-warehouse

asked Apr 22 '10 at 12:44

Lieven Cardoen

25,140
52
153
244

votes

7 answers

Structure within staging area of data warehouse

We are working on a datawarehouse for a bank and have pretty much followed the standard Kimball model of staging tables, a star schema and an ETL to pull the data through the process. Kimball talks about using the staging area for import, cleaning,…

sql database-design data-warehouse

asked May 14 '09 at 13:43

NeedHack

2,943
3
30
44

votes

2 answers

Using a DATE field as primary key of a date dimension with MySQL

I want to handle a date dimension in a MySQL datawarehouse. (I m a newbie in the DW world) I made some searches with google and saw a lot of table structures (most of) date dimension where the Primary Key is a simple UNSIGNED INTEGER. Why don't use…

mysql date schema data-warehouse dimension

asked Dec 07 '11 at 13:01

nemenems

1,064
2
9
27

votes

3 answers

Calendar tables in PostgreSQL 9

I am building an analytics database (I have a firm understanding of the data and the business objectives and only basic-to-moderate database skills). I have come across some references to building similar warehouses which implement the concept of…

database-design postgresql data-warehouse rails-postgresql

asked Apr 28 '11 at 16:28

NJ.

2,155
6
26
35

votes

3 answers

ETL using Python

I am working on a data warehouse and looking for an ETL solution that uses Python. I have played with SnapLogic as an ETL, but I was wondering if there were any other solutions out there. This data warehouse is just getting started. Ihave not…

python data-warehouse etl

asked Sep 21 '10 at 16:04

emilam

votes

1 answer

Is there a concept of slowly changing FACT in data warehouse

In data warehousing, we have the concept of slowly changing dimensions. I am just wondering why there is no jargon for 'slowly/rapidly changing FACTs' because the same Type1, Type 2 measures can be used to track changes in the FACT table.

data-warehouse dimension fact

asked Feb 06 '14 at 15:16

Victor

16,609
71
229
409

votes

4 answers

Why primary key is (not) required on fact table in dimensional modelling?

I have heard a few references that pk is not required on fact table. I believe every single table should have a pk. How could a person understand a row in a fact table if there is no pk and 10+ foreign keys.

database-design primary-key data-warehouse fact-table

asked Jan 22 '14 at 16:30

BI Dude

1,842
5
37
67

votes

7 answers

Are there any data warehouse frameworks?

I've got a lot of mysql data that I need to generate reports from. It's mostly historic data so it won't be changing much, but it weighs in at 20-30 gigabytes easily and is expected to grow. I currently have a collection of php scripts that will do…

frameworks reporting data-warehouse anchor-modeling

asked Oct 01 '08 at 17:34

reconbot

5,138
6
45
63

votes

2 answers

Database - fact table and dimension table

When reading a book for business objects, I came across the term- fact table and dimension table. Is this the standard thing for all the database that they all have fact table and dimension table or is it just for business object design? I am…

database data-warehouse business-intelligence

asked Feb 20 '12 at 14:37

Jack_of_All_Trades

10,942
18
58
88

votes

6 answers

Advantages of databases like Greenplum or Vertica compared to MongoDB or Cassandra

I am currently working in a few projects with MongoDB and Apache Cassandra respectively. I am also using Solr a lot and I am handling "lots" of data with them (approx. 1-2TB). I've heard of Greenplum and Vertica the first time in the last week and I…

mongodb cassandra data-warehouse vertica greenplum

asked Jan 24 '12 at 13:36

disco crazy

31,313
12
80
83

votes

2 answers

What does "incremental load" mean?

I regularly see the expression 'incremental loading' when reading articles What does is really (technically) mean? What does it implies ? Explanations using use-cases are welcome.

database terminology data-warehouse

asked Dec 17 '10 at 14:03

Spredzy

4,982
13
53
69

votes

5 answers

How to test (unit test) on ETL process?

I know several small companies do not do testing on ETL process, but that seems to be suboptimal from the perspective of software engineering. How do people usually do testing/unit test/functional test on ETL process?

testing etl data-warehouse

asked Jun 14 '16 at 10:15

Hello lad

17,344
46
127
200

votes

4 answers

Which is better, ETL or ELT?

Having spent some time working on data warehousing, I have created both ETL (extract transform load) and ELT (extract load transform) processes. It seems that ELT is a newer approach to populating data warehouses that can more easily take advantage…

data-warehouse etl

asked Jun 19 '10 at 18:53

Chris J

9,164
7
40
39

votes

6 answers

concurrent statistics gathering on Oracle 11g partiitioned table

I am developing a DWH on Oracle 11g. We have some big tables (250+ million rows), partitioned by value. Each partition is a assigned to a different feeding source, and every partition is independent from others, so they can be loaded and processed…

oracle oracle11g etl data-warehouse table-statistics

asked Mar 04 '15 at 15:54

Francesco

votes

4 answers

PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi real-time basis (some-one is bound to ask what semi real-time means and the answer is as frequently as I reasonably can but…

postgresql data-warehouse etl near-real-time data-extraction

asked Mar 25 '10 at 22:45

belvoir

Prev 1 2

…

99 100 Next