Java/.NET Developer moving towards Data Warehouse

Question

I understand the concept of a Data Warehouse after reading questions like this: What is a data warehouse?. I am familiar with OLAP and MDX (MDX to a limited extent).

I have a .NET application that connects to about fifteen different databases to search for information and also to manage information i.e. it is a Java application that connects to fifteen databases that are Oracle/SQL based. I believe a Data Warehouse would meet my needs. I have two questions about Data Warehouses:

Do you copy all the data needed to make a decision into the Data Warehouse (using SSIS) or do you just leave it in the OLTP systems and query it or a bit of both?
What do you use as the user interface? a Java app/VB.NET perhaps

score 0 · Answer 1 · answered Aug 07 '13 at 20:36

A data warehouse is usually a reformatted version of a transactional database intended for fast access, providing business insight to end users (technical and non-technical). For example, a datawarehouse in the Kimball model is a denormalized (think tall and skinny) version of the transaction database in a star schema. The data is taken from the OLTP database, goes through an ETL process (Extract, Transform, Load), and then is made available to business users (for end-user UI querying a la Business Objects, for example).

There are many, many variations on this. You would be well served to look at some articles on basic Kimball-style datawarehouses (and I'm not trying to be biased, it's just what I've used in the past). There are early edition used Kimball Warehousing books on Amazon for pennies, if you feel a need for a physical book.

score 0 · Answer 2 · answered Aug 28 '13 at 06:22

At one end of the scale you have a data warehouse which is 'batch loaded' meaning jobs run to load the data. You take advantage of this specialised processing time to model and transform the data into a schema that is easy to analyse. Data warehouses are much more than just copying data around.

At the other end of the scale is what you are doing right now - performing 'federated' queries.

You need to research the advantages and disadvantages of each method and compare them to your business requirements. You also need to consider the future state of systems, i.e. maybe your program works now but what happens if data volumes increase or you suddenly need to access another five systems?

For example, is it OK if your users have data which is one day out of date (Latency)? Will this still be OK in 5 years?

In answer to your questions:

1) Do you copy all the data needed to make a decision into the Data Warehouse?

If your users can put up with latency, and you have enough disk space, and smart enough replication (i.e. CDC), and adhoc querying of the source system is an issue, then yes you would copy all the data over. (into something that is usually called an ODS)

2) What do you use as the user interface?

Since you have already mentioned SSIS (a MS tool), note that MS also comes with SSAS (a cube building tool), SSRS (a web report tool) and Excel (basically your cube client). You can wrap this up in Sharepoint but it's not trivial.

Note that all these functions (ETL, Cubing, User Interface) are provided by many other vendors as well, but in a MS shop you have basically already bought your BI tools so you might want to start with those.

You can use .Net to build yourself a more sophisiticated tool but you are better off getting something off the shelf.

Thanks. Would .NET be involved at all? e.g. to provide a user interface to the Data Warehouse? — w0051977, Aug 28 '13 at 07:44
If you wish you could build an improved user interface using .Net but it isn't necessary. .Net can be used at all stages (ETL, Building Cubes, User Interface) but doesn't have to be. To ascertain whether your .Net should be replaced with a data warehouse you need to work out business reasons for doing so. — Nick.Mc, Aug 29 '13 at 03:14

Java/.NET Developer moving towards Data Warehouse

2 Answers2

Linked