Data warehousing Design

In our previous posts we have got to learn about Data Warehousing Objects, different kinds of Data Warehouse schemas and Data Warehouse Basics. Now it time we learn about how to build or design a Data Warehouse. 

Designing or Building of a Data Warehouse can be done following either one of the approaches. These approaches are notably known as:

* The Top-Down Approach
* The Bottom-Up Approach

These approaches are defined by the two of the bearers of Data Warehousing namely Ralph Kimball and Bill Inmon.

The Top-Down Approach

This approach was proposed by Bill Inmon, as he stated "Data warehouse is one part of the overall business intelligence ststem. An enterprise has one data warehouse and date marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form". 

In short Bill Inmon advocated a "dependent data mart structure".


The above image shows how does the Top-Down model Works.

Here are the steps:

* The data is extracted from different/same data sources. This data is loaded into the staging areas and validated and consolidated for ensuring the level of accuracy and then pushed to the Enterprise Data Warehouse (EDW).

* Detailed data is regularly extracted from EDW and is temporarily hosted in staging area for aggregation, summarization and then extracted and loaded into Data Warehouse.

* Once the aggregation and summarization of data is completed the Data marts extract the data into data marts and apply fresh transformations on them. This is done so that the data which comes is in sync with the strutures defined for the data mart.

The Bottom-Up Approach



This approached was proposed by Ralp Kimball, stated as " Data Warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model."

* Ralp kimball designed the data warehouse with the data marts connected to it with a bus structure.

* The bus structure as you can see above, contained all the common elements that are used by data marts such as conformed dimension, measures etc. 

Basically, Kimball model reverses the Inmon model i.e. Data marts are directly loaded with the data from the source systems and then ETL process is used to load in to Data Warehouse.

Here are the steps:

* The data flow in the bottom up approach starts from extraction of data from operational databases into the staging area where it is processed and loaded into the EDW.

* The data in EDW is refreshed or replaced by the fresh data being loaded. After EDW is refreshed the current data is once again extracted in staging area and transformations are applied to fit into the data mart structure. The data is the extracted from Data Mart to the staging area aggregated, summarized and so on loaded into EDW and then made available for the end user for analysis.