Questions tagged [data-warehouse]

A data warehouse (DW) is a database specifically structured/designed to aid in querying, analyzing and reporting (generating reports out of) of current and historical data. DWs are central repositories of integrated data from one or more disparate sources. Basic difference between a data warehouse and a set of DB tables is how the data is organized/structured.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used for querying, analyzing and reporting for decision-support purposes.

Data Mart is the access layer of the data warehouse. It serves a particular department like Marketing, HR, etc. Dedicated to business function and unit specifications, data marts make the information more focused and faster to find.

Some differences between Data mart and Data warehouse:

  • Data Warehouses have multiple subject areas with more detailed information. They are integrating all sources of data. Dimensional modelling is not necessary, but it feeds dimensional models
  • Data Marts usually hold one subject area with not that detailed information - often summarized. Concentrate on integrating information from one subject area or source system. Built on dimensional models like star-schema.

There are many products readily available that provide data warehousing capability e.g. MSAccess, Essbase (Hyperion, now Oracle), Cognos, Business Objects, MicroStrategy, ...

Basics of Data Warehousing:

  • Dimensional Modelling - Consist of the identify the measurements, or facts, that are given the context by their related dimensions. The grain of the fact table describes the level of detail at which the facts are recorded.

Main steps of relational modelling:

  • Choose the business process
  • Declare the grain
  • Identify the dimensions
  • Identify the fact

Online Analytical Processing (OLAP) and it's types (ROLAP, MOLAP, HOLAP, ...): Describes basics of the DB designs and pros/cons of each way. - A variety of different design patterns are used in a data warehouse environment. Some common approaches include: Normalized (5NF); DataVault; Anchor Modelling; Dimensional (5,6); other temporal (e.g. 6NF). - SQL: Describes how a Data Warehouse can be queried. Following is a list of basic keywords that every data warehouse developer must know: - JOIN - GROUPBY

At a high level the Data Warehousing can be divided into:

  • Tools (IBM Cognos, Microsoft Business Intelligence, Oracle Business Intelligence Enterprise, dition(OBIEE), Business Objects Enterprise XI, Jaspersoft, Talend Open studio, Pentaho, Qlikview etc) readily available and how to use them. Used for small to medium sized data sets. This usually requires [at least] knowledge of tool's:
    • data model and
    • user interface
  • Building your own data warehouse for specific usecases. Used when dealing with really huge data sets (e.g. the data collected by Google, Yahoo, Facebook or a couters/performance-management-data from a large telecommunication network. This usually requires [at least] knowledge of:
    • scalability, high availability and clustering concepts.
    • data warehouse (schema, queries, data model, ...) design.
    • available databases (Oracle, Clustra, Greenplum, MySQL, DB2, ...)
    • problem domain (implicit).
    • relevant GUI/UI (SWING, JSP, ...) and business logic (J2EE, C++, ...) technologies
2778 questions
6
votes
4 answers

data warehouse and database difference in implementation

Can anyone tell me the difference between a simple database and a data warehouse in terms of implementation? I know that data warehouse is used for analysis rather than keeping record but I don't understand how are they structurally different In…
Waheed Khan
  • 1,323
  • 6
  • 18
  • 28
6
votes
1 answer

Database vs DataMart vs Data Warehouse vs Data Lake

Looking for the high-level differences/comparison among Database Data Mart (Top-down approach) Data Warehouse Data Lake Data Lakehouse Please use relative comparison when specifics are not available.
Ashok Goli
  • 5,043
  • 8
  • 38
  • 68
6
votes
1 answer

SSIS : Using a variable as the name of a flat file destination

I have an SSIS solution where depending on a parameter, it launches the extraction of different databases, each in a different file, and the name must contain the date of the extraction so we can't hard code the name in the Destination wizard, I…
Random it guy
  • 97
  • 2
  • 7
6
votes
5 answers

Differences between Data Vault and Dimensional modeling?

When modeling a data warehouse, is there any reason we should favor Data Vault over Dimensional modelling? What are the major differences between these two?
jrara
  • 16,239
  • 33
  • 89
  • 120
6
votes
3 answers

SQL Replace All Tables with Clustered Columnstore Index

We are conducting a migration project, and looking to replace most Rowstore indexes with Clustered Columnstore indexes for large Data Warehouse. We are adding a unique index on the identity column. Does anyone have script to alter run through all…
user11156893
6
votes
2 answers

What are the pros and cons of loading data directly into Google BigQuery vs going through Cloud Storage first?

Also, is there anything wrong with doing transforms/joins directly within BigQuery? I'd like to minimize the number of components and steps involved for a data warehouse I'm setting up (simple transaction and inventory data for a chain of retail…
6
votes
3 answers

Storing and reloading large multidimensional data sets in Python

I'm going to be running a large number of simulations producing a large amount of data that needs to be stored and accessed again later. Output data from my simulation program is written to text files (one per simulation). I plan on writing a Python…
dbb
  • 61
  • 1
  • 2
6
votes
2 answers

What would be the difference between WITH clause & temporary table?

In layman terms, what would be the key differences between WITH clause & temporary table? In which scenario it is better to use one over the other?
6
votes
1 answer

Dimension for geozones or Lat & Long in data warehouse

I have a DimPlace dimension that has the name of the place (manually entered by the user) and the latitude and longitude of the place (automatically captured). Since the Places are entered manually the same place could be in there multiple time…
tember
  • 1,418
  • 13
  • 32
6
votes
5 answers

Why is datekey in fact tables always INT?

I'm looking at the datekey column from the fact tables in AdventureWorksDW and they're all of type int. Is there a reason for this and not of type date? I understand that creating a clustered index composed of an INT would optimize query speed. But…
fdkgfosfskjdlsjdlkfsf
  • 3,165
  • 2
  • 43
  • 110
6
votes
4 answers

How to avoid joins between fact tables in a star schema?

I'm trying to model my data warehouse using a star schema but I have a problem to avoid joins between fact tables. To give a trivial idea of my problem, I want to collect all the events who occur on my operating system. So, I can create a fact table…
Pierre
  • 1,942
  • 3
  • 23
  • 43
6
votes
1 answer

Export Google Analytics data (log of events)

I have some applications and a website that trigger Google Analytics events. I would like to export google analytics events to my datawarehouse in order to have better insights and analytics. For this question, you can use the language you want…
6
votes
1 answer

How to design a shrunken dimension table for dates in dimensional warehouse and use in SSAS?

I'm working in a situation where we're moving from having a bunch of transactional fact tables to a more complicated picture with aggregates, snapshots, etc. In the past, there were a few cases where data needed to be aggregated by month, but…
6
votes
2 answers

Why do we need Data Warehouse?

I was wondering about why and when we need a data warehouse, I mean the main goal of data warehouse is to provide a reporting from multidimentional view, but in some case there is a way to build a report using DBMS, it can produce a report from…
user2978983
  • 254
  • 5
  • 14
6
votes
1 answer

Downloadable example of a normalize Inmon Enterprise Data Warehouse

I am familiar with Kimball data warehouses, but I have never seen an Inmon data warehouse. With all my research, I'm looking for a real example. I understand that Inmon data warehouses are normalized and work with a top down approach. Is there a…