Types of Data Warehouses :-
- Host Based Datawarehouses
- Host Based (MVS) Data Warehouses
- The data warehouses that reside on high-volume databases on MVS are the host based type of data warehouses.
- usually have very high volumes of data storage
- require support for both MVS and client-based report and query facilities.
- have very complex source systems
- require continuous maintenance since these must be used for mission-critical purposes.
- Unload Phase involves selecting and scrubbing the operational data.
- Transform Phase for translating it into an appropriate form and defining the rules for accessing and storing it.
- Load phase for moving the data directly into DB2 tables or a special file for moving it to another database or non-MVS warehouse.
Such data warehouses - Host Based (Unix) Data Warehouses
- Oracle and Informix RDBMSs provide the facilities for such data warehouses. Both of these databases can extract data from MVS-based databases as well as a larger number of other UNIX-based databases.
- Host Based (MVS) Data Warehouses
- Host Based single-stage (LAN) Datawarehouses
- With a LAN-based warehouse, data delivery can be managed either centrally or from the workgroup environment so that business groups can meet and manage their own information needs without burdening centralized IT resources.
- LAN-based warehousing solutions are normally limited by both DBMS and hardware scalability factors.
- Many LAN based enterprises have not implemented adequate job scheduling, recovery management, organized maintenance, and performance monitoring procedures to support robust warehousing solutions.
- Often these warehouses are dependent on other platforms for source data. Building an environment that has data integrity, recoverability, and security needs careful design, planning and implementation. Otherwise, synchronisation of changes and loads from sources to server could cause innumerable problems.
Limitations/challenges: - LAN Based workgroup Datawarehouses
- In this warehouse, you extract data from a variety of sources (like Oracle, IMS, DB2) and provide multiple LAN-based warehouses.
- Lack of understanding how to distribute data and supporting intentional data redundancy for performance reasons.
- Many organizations may not have adequate job scheduling, recovery management, and performance monitoring to support robust warehousing solutions.
- Although providing +ve cost benefits, LAN-based warehousing solutions can be limited by both hardware and DBMS limitations.
- For many large enterprises, similar skills in database design, maintenance, and recovery are not present in every workgroup environment.
Designed for workgroup environment, it is ideal for any business organization that wishes to build a data warehouse, often called a data mart. Usually requires minimal initial investment and technical training. Its low startup cost and ease of use allow a workgroup to quickly build and easily manage its own custom data mart. - Multistage Datawarehouses
- This configuration is well suited to environments where endusers in different capacities require access to both summarized data for up-to-the-minute tactical decisions as well as summarized, cumulative data for long-term strategic decisions. Both ODS (Operation Data Store) and the data warehouse may reside on host-based on LAN-based databases, depending on volume and usage requirements. Typically the ODS stores only the most recent records. The data warehouse stores the historical evolution of the records.
- Stationary Datawarehouses
- In this type of a data warehouse, user are given direct access to the data, instead of moving from the sources. For many organizations, infrequent access, volume issues or corporate necessities dictate such an approach.
This is likely to impact performace since users will be competing with the production data stores.
Such a warehouse will require sophisticated middleware, possible with a single interface to the user. An integrated metadata repository becomes an absolute necessity under this environment. - Distributed Datawarehouses
- There are at least two types of distributed data warehouses and their variations for the enterprise: local warehouses distributed throughout the enterprises and a global warehouse.
- Activity occurs at local level
- Majority of the operational processing is done at the local site.
- Local site is autonomous
- Each local data warehouse has its own unique structure and content of data.
- The data is unique and of prime importance to that locality only.
- Majority of the data is local and not replicated.
- Any intersection of data between local data warehouses is conincidental.
- Local site serves different geographic regions.
- Local site serves different technical communities.
Useful when there are diverse businesses under the same enterprise umbrella. This approach may be necessary if a local warehouse already existed, prior to joining the enterprise.
Local data warehouses have the following common characteristics: - Virtual Datawarehouse
- The data warehouse is a great idea, but it is complex to build and requires investment. Why not use a cheap and fast approach byeliminating the transformation steps of repositories for metadata and another database. This approach is termed the 'virtual data warehouse'.
- A data dictionary containing the definitions of the various databases.
- A description of the relationship among the data elements.
- The description of the way user will interface with the system.
- The algorithms and business rules that define what to do and how to do it.
- Since queries compete with production data transactions, performance can be degraded.
- There is no metadata, no summary data or no individual DSS (Decision Support System) integration or history. All queries must be repeated, causing addditional burden on the system.
- There is no refreshing process, causing the queries to be very complex.
To accomplish this there is need to define 4 kinds of information:
Data Warehouse Design Approaches
Data warehouse design is one of the key technique in building the data warehouse. Choosing a right data warehouse design can save the project time and cost. Basically there are two data warehouse design approaches are popular.
Bottom-Up Design:
In the bottom-up design approach, the data marts are created first to provide reporting capability. A data mart addresses a single business area such as sales, Finance etc. These data marts are then integrated to build a complete data warehouse. The integration of data marts is implemented using data warehouse bus architecture. In the bus architecture, a dimension is shared between facts in two or more data marts. These dimensions are called conformed dimensions. These conformed dimensions are integrated from data marts and then data warehouse is built.
Advantages of bottom-up design are:
Disadvantages of bottom-up design are:
Top-Down Design:
In the top-down design approach the, data warehouse is built first. The data marts are then created from the data warehouse.
Advantages of top-down design are:
Disadvantages of top-down design are:
Bottom-Up Design:
In the bottom-up design approach, the data marts are created first to provide reporting capability. A data mart addresses a single business area such as sales, Finance etc. These data marts are then integrated to build a complete data warehouse. The integration of data marts is implemented using data warehouse bus architecture. In the bus architecture, a dimension is shared between facts in two or more data marts. These dimensions are called conformed dimensions. These conformed dimensions are integrated from data marts and then data warehouse is built.
Advantages of bottom-up design are:
- This model contains consistent data marts and these data marts can be delivered quickly.
- As the data marts are created first, reports can be generated quickly.
- The data warehouse can be extended easily to accommodate new business units. It is just creating new data marts and then integrating with other data marts.
Disadvantages of bottom-up design are:
- The positions of the data warehouse and the data marts are reversed in the bottom-up approach design.
Top-Down Design:
In the top-down design approach the, data warehouse is built first. The data marts are then created from the data warehouse.
Advantages of top-down design are:
- Provides consistent dimensional views of data across data marts, as all data marts are loaded from the data warehouse.
- This approach is robust against business changes. Creating a new data mart from the data warehouse is very easy.
Disadvantages of top-down design are:
- This methodology is inflexible to changing departmental needs during implementation phase.
- It represents a very large project and the cost of implementing the project is significant.