Data Warehouse Design Methodologies
There are two different methodologies normally followed when designing a Data Warehouse solution and based on the requirements of your project you can choose which one suits your particular scenario. These methodologies are a result of research from Bill Inmon and Ralph Kimball.
Bill Inmon - Top-down Data Warehouse Design Approach
Bill Inmon is sometimes also referred to as the "father of data warehousing"; his design methodology is based on a top-down approach and defines data warehouse in these terms
- Subject oriented - The data in a data warehouse is categorized on the basis of the subject area and hence it is "subject oriented".
- Integrated - Data gets integrated from different disparate data sources and hence universal naming conventions, measurements, classifications and so on used in the data warehouse. The data warehouse provides an enterprise consolidated view of data and therefore it is designated as an integrated solution.
- Non-volatile - Once the data is integrated\loaded into the data warehouse it can only be read. Users cannot make changes to the data and this practice makes the data non-volatile.
- Time Variant - Finally data is stored for long periods of time quantified in years and has a date and timestamp and therefore it is described as "time variant".
Bill Inmon saw a need to integrate data from different OLTP systems into a centralized repository (called a data warehouse) with a so called top-down approach. Bill Inmon envisions a data warehouse at center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI), business analytics and business management capabilities.
This top-down design provides a highly consistent dimensional view of data across data marts as all data marts are loaded from the centralized repository (Data Warehouse). The top-down design has also proven to be flexible to support business changes as it looks at the organization as whole, not at each function or business process of the organization. Generating a new dimensional data marts against the data stored in the data warehouse is a relatively simple task. Though there are some challenges for the top-down approach, for example it represents a very large project with a very broad scope and hence the up-front cost for implementing a data warehouse using the top-down methodology is significant. Further, the duration of time from the start of project to the point that end users start experience initial benefits of the solution can be substantial. Also, the top-down methodology can be inflexible and unresponsive to changing departmental or business process needs (a concern for today's dynamically changing environment) during the implementation phase.
Ralph Kimball - Bottom-up Data Warehouse Design Approach
Ralph Kimball is a renowned author on the subject of data warehousing. His design methodology is called dimensional modeling or the Kimball methodology. This methodology focuses on a bottom-up approach, emphasizing the value of the data warehouse to the users as quickly as possible. In his vision, a data warehouse is the copy of the transactional data specifically structured for analytical querying and reporting in order to support the decision support system. As per his methodology, data marts are first created to provide reporting and analytical capabilities for specific business\functional processes and later on these data marts can eventually be unioned together to create a comprehensive enterprise data warehouse. The bottom-up approach focuses on each business process at one point of time so the return on investment could be as quick as first data mart gets created. Though if not carefully planned, you might lack the big picture of the enterprise data warehouse by missing some dimensions or by creating redundant dimensions, etc. when you are too focused on an individual business process.
Ralph Kimball's bottom-up approach proposes to create a business matrix which should contain all the common elements (that are used by data marts such as conformed\shared dimension, measures, etc.) defined for the enterprise as whole. With this, the user can design and develop solutions which supports doing analysis across the business processes for cross selling. You can learn more about the matrix here.
For a person who wants to make a career in Data Warehouse and Business Intelligence domain, I would recommended studying Bill Inmon's books (Building the Data Warehouse and DW 2.0: The Architecture for the Next Generation of Data Warehousing) and Ralph Kimball's book (The Microsoft Data Warehouse Toolkit).