Questions tagged [data-warehouse]

A data warehouse (DW) is a database specifically structured/designed to aid in querying, analyzing and reporting (generating reports out of) of current and historical data. DWs are central repositories of integrated data from one or more disparate sources. Basic difference between a data warehouse and a set of DB tables is how the data is organized/structured.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used for querying, analyzing and reporting for decision-support purposes.

Data Mart is the access layer of the data warehouse. It serves a particular department like Marketing, HR, etc. Dedicated to business function and unit specifications, data marts make the information more focused and faster to find.

Some differences between Data mart and Data warehouse:

  • Data Warehouses have multiple subject areas with more detailed information. They are integrating all sources of data. Dimensional modelling is not necessary, but it feeds dimensional models
  • Data Marts usually hold one subject area with not that detailed information - often summarized. Concentrate on integrating information from one subject area or source system. Built on dimensional models like star-schema.

There are many products readily available that provide data warehousing capability e.g. MSAccess, Essbase (Hyperion, now Oracle), Cognos, Business Objects, MicroStrategy, ...

Basics of Data Warehousing:

  • Dimensional Modelling - Consist of the identify the measurements, or facts, that are given the context by their related dimensions. The grain of the fact table describes the level of detail at which the facts are recorded.

Main steps of relational modelling:

  • Choose the business process
  • Declare the grain
  • Identify the dimensions
  • Identify the fact

Online Analytical Processing (OLAP) and it's types (ROLAP, MOLAP, HOLAP, ...): Describes basics of the DB designs and pros/cons of each way. - A variety of different design patterns are used in a data warehouse environment. Some common approaches include: Normalized (5NF); DataVault; Anchor Modelling; Dimensional (5,6); other temporal (e.g. 6NF). - SQL: Describes how a Data Warehouse can be queried. Following is a list of basic keywords that every data warehouse developer must know: - JOIN - GROUPBY

At a high level the Data Warehousing can be divided into:

  • Tools (IBM Cognos, Microsoft Business Intelligence, Oracle Business Intelligence Enterprise, dition(OBIEE), Business Objects Enterprise XI, Jaspersoft, Talend Open studio, Pentaho, Qlikview etc) readily available and how to use them. Used for small to medium sized data sets. This usually requires [at least] knowledge of tool's:
    • data model and
    • user interface
  • Building your own data warehouse for specific usecases. Used when dealing with really huge data sets (e.g. the data collected by Google, Yahoo, Facebook or a couters/performance-management-data from a large telecommunication network. This usually requires [at least] knowledge of:
    • scalability, high availability and clustering concepts.
    • data warehouse (schema, queries, data model, ...) design.
    • available databases (Oracle, Clustra, Greenplum, MySQL, DB2, ...)
    • problem domain (implicit).
    • relevant GUI/UI (SWING, JSP, ...) and business logic (J2EE, C++, ...) technologies
2778 questions
7
votes
2 answers

Data Warehouse modelling: Data Vault vs Persistent Staging Area

Consider the following two DWH architectures: DWH with Raw Data Vault, layers: Source systems Staging area (truncated on every load, exact schema of source tables) Raw Data Vault (modelled as Data Vault, contains record history, hubs/sats/links…
user3596100
  • 147
  • 9
7
votes
1 answer

SSAS Internal error: An Unexpected error occurred (file 'pfcre.cpp' function PFCREngine::SelectCartridge)

Has anyone seen this error before? If so, how did you fix it? I can't find anything on Google. Here is what I have done: I tried doing a Google search but practically nothing came up. I checked all of my permissions everywhere, and from what I can…
7
votes
2 answers

Redshift Performance of Flat Tables Vs Dimension and Facts

I am trying to create dimensional model on a flat OLTP tables (not in 3NF). There are people who are thinking dimensional model table is not required because most of the data for the report present single table. But that table contains more than…
prakash
  • 165
  • 1
  • 2
  • 10
7
votes
3 answers

EDW Kimball vs Inmon

I've been tasked with coming up with a recommendation of how to proceed with a EDW and am looking for clarification on what I'm seeing. Everything that I am learning about states that Kimball's approach will bring value quicker to business vs…
user3776554
  • 119
  • 4
7
votes
1 answer

Azure SQL Data Warehouse and Entity Framework

Does anybody know if/when we will be able to use Entity Framework with the Azure SQL Data Warehouse? I have searched high and low and cannot seem to find anything other than a question posted on MSDN in Auguts 2015 saying: "We have not yet…
How 'bout a Fresca
  • 2,267
  • 1
  • 15
  • 26
7
votes
10 answers

How can I speed up queries against huge data warehouse tables with effective-dated data?

So I am querying some extremely large tables. The reason they are so large is because PeopleSoft inserts new records every time a change is made to some data, rather than updating existing records. In effect, its transactional tables are also a…
aw crud
  • 8,791
  • 19
  • 71
  • 115
7
votes
2 answers

Impute via fill-forward/LOCF a column over a range of sequential rows in SQL?

When formatting data for time-series analysis, a common need is to impute missing values by filling-forward values over time (also called Last-Observation-Carried-Forward / LOCF ). While data-analysis environments typically provide that…
DavidJ
  • 4,369
  • 4
  • 26
  • 42
7
votes
6 answers

Avoid writing SQL queries altogether in SSIS

Working on a Data Warehouse project, the guy that gave us the tutorial advised that we stick to using SQL queries over defining a lot of data flow transformations, citing points like it'll consume a lot of memory on the ETL box so we'd rather leave…
Jonn
  • 4,599
  • 9
  • 48
  • 68
7
votes
7 answers

In a star schema, are foreign key constraints between facts and dimensions neccessary?

I'm getting my first exposure to data warehousing, and I’m wondering is it necessary to have foreign key constraints between facts and dimensions. Are there any major downsides for not having them? I’m currently working with a relational star…
Garett
  • 16,632
  • 5
  • 55
  • 63
7
votes
2 answers

Insert into a star-schema

I've read a lot about star-schema's, about fact/deminsion tables, select statements to quickly report data, however the matter of data entry into a star-schema seems aloof to me. How does one "theoretically" enter data into a star-schema db? while…
shaun
  • 73
  • 1
  • 3
7
votes
4 answers

Handling multiple fact tables in Qlikview

I have a PostgreSQL database containing various education data such school-level test scores and enrollment figures. I need to separate enrollment from test scores because the data is on different grains. Even though enrollment is on a different…
7
votes
3 answers

MDX date range query with a missing boundry date

I need an MDX query for Mondrian filtered by date, where one or both of the boundry dates may not exist. I'm using the query below that works as long as both 2013-01-01 and 2013-01-08 dimensions exist. If one of the two dates does not exist then…
Dynamite
  • 93
  • 4
6
votes
3 answers

Data Aggregation - Daily SQL Script vs Data Warehouse

Pardon me if this has already been asked (I know very little about Data Warehouse/BI and have yet to master the keywords). I have a table that grow by more then 100 000 rows per day, each row having a timestamp and multiple information about an item…
Benoittr
  • 4,091
  • 4
  • 27
  • 38
6
votes
1 answer

Database Design Patterns/Considerations when leveraging Master Data Services

I'm trying to find a good source of information on the database and/or table design considerations/patterns to be aware of when designing a data warehouse and leveraging MDS/DQS components...things like adding "current flag", "created date", "end…
6
votes
6 answers

Datawarehouse - How to Link dimensions

Just got into datawarehousing and need your help to clarify a confusion. Lets say I have Employee dimension and Department Dimension. If I have a report that requires me to list fields from dimEmployee (Name, Salary, Position) and fields from…
Raj