Data Warehousing design Using Oracle

Creating an Oracle Data Warehouse

Oracle Warehouse Builder is a flexible tool that enables you to design and deploy various types of data management strategies, including traditional data warehouses.This chapter provides a brief introduction to the basic, minimal steps for creating an Oracle data warehouse. It provides a starting point for using Warehouse Builder for the first time and serves as a road map to the documentation.
This chapter includes the following topics:
  • Understanding the Basic Concepts
  • General Steps for Creating an Oracle Data Warehouse
  • About Locations
  • About Modules

Understanding the Basic Concepts

Oracle Warehouse Builder is comprised of a set of graphical user interfaces to assist you in implementing complex data system designs. Your designs are saved as metadata in a centralized repository.
The centralized repository, known as the Warehouse Builder repository, is hosted on an Oracle Database. The Design Center is the interface that provides a visual representation of the Warehouse Builder repository. Use the Design Center to import source objects such as tables and views, design ETL processes such as mappings, and ultimately design the target warehouse.
mapping is an object in which you define the flow of data from sources to targets. Based on a mapping design, Warehouse Builder generates the code required to implement the ETL logic.Warehouse Builder can generate PL/SQL, SQL*Loader, or ABAP code for mappings.
After you complete the design of a mapping, for example, and prompt Warehouse Builder to generate the code, the next step is to deploy the mapping. Deployment is the process of copying the relevant metadata and code you generated in the Design Center to a target schema. The target schema is generically defined as the database which will execute the ETL logic you designed in the Design Center. Specifically, in a traditional data warehousing implementation, the data warehouse is the target schema and the two terms are interchangeable.
Figure 2-1 illustrates the deployment process. For the purposes of this illustration, the target schema and the repository exist on the same Oracle Database; however, in practice, target schemas often exist on separate databases. To deploy design objects and subsequently execute the generated code, use the Control Center Manager, which is the client interface that interacts with the target schema through the control center service.
As previously noted, the Design Center is the primary user interface. It is also a centralized interface in that you can launch from it all the client based tools, including the Control Center Manager. A secondary user interface is the web-based Repository Browser. In addition to browsing design metadata and auditing execution data, you can view and create reports. You can also perform limited lineage impact analysis in the Repository Browser. However, the Design Center has more advanced, interactive capabilities for lineage and impact analysis.
Figure 2-1 Warehouse Builder Components
Description of Figure 2-1 follows
Description of "Figure 2-1 Warehouse Builder Components"

General Steps for Creating an Oracle Data Warehouse

Use Warehouse Builder to create a data warehouse in the following recommended stages:
  • Before You Begin
  • Stage 1: Preparing the Warehouse Builder Design Center
  • Stage 2: Importing the Source Metadata
  • Stage 3: Designing the Oracle Data Warehouse
  • Stage 4: Deploying the Design and Implementing the Data Warehouse
Before You Begin
Before you can use any of the Warehouse Builder client components, first ensure access to a Warehouse Builder repository, which is an Oracle Database schema that stores the metadata for the system you design.
To begin using Warehouse Builder, take the following steps:
  1. Install the Warehouse Builder software and create a repository as described in the Oracle Warehouse Builder Installation and Administration Guide.
    If an administrator has previously completed the server and client installation, contact that person for the connection information required to log on to the repository.
  2. Launch the Design Center, shown in Figure 2-2.
    On a Windows platform, from the Start menu, select Programs. Select the Oracle home in which Warehouse Builder is installed, then Warehouse Builder, and then Design Center.
    On a Linux platform, launch owbclient.sh located in the owb/bin/unix directory in the Oracle home for Warehouse Builder.
Figure 2-2 shows the Design Center with the top level folders in each of its 3 explorers expanded.
Figure 2-2 The Design Center
Screen capture of the Design Center
Description of "Figure 2-2 The Design Center"

Stage 1: Preparing the Warehouse Builder Design Center
To integrate data and design a data warehouse, you primarily utilize the Project Explorer and the Connection Explorer shown in Figure 2-2.
In the Project Explorer, Warehouse Builder creates a single default project, MY_PROJECT. As a project, it contains nodes for each type of design object that you can create or import.
The Connection Explorer is the window you use to establish connections between the Warehouse Builder repository to databases, data files, and applications.
To prepare the Design Center, complete the following steps:
  1. Adjust the client preference settings as desired or accept the default preference settings and proceed to the next step.
    From the main menu in the Design Center, select Tools and then Preferences.
    As a new user, you may be interested in setting the Environment Preferences, the locale under Appearance Preferences, and the naming mode under Naming Preferences. For information on all the preferences, see Setting Preferences.
  2. Connect to the source and target data objects.
    You establish these connections by defining locations in the Connection Explorer. Expand the Location node and the nodes within it to gain a general understanding of the types of source and targets you can access from Warehouse Builder.
    To create a location, right-click the appropriate node and select New. Fill in the requested connection information and select Test Connection. In this step, you merely establish connections to sources and targets. You do not move data or metadata until subsequent steps.
    For more information about locations, see "About Locations".
  3. Organize the design environment.
    If you are satisfied with the default project, MY_PROJECT, continue with the next step.
    Alternatively, you can define more projects at any time. Each project you define is organized in the same fashion as shown in MY_PROJECT with nodes for databases, files, applications, and so on. For a different organization, consider creating optional collections as described in "Organizing Design Objects into Projects and Collections".
  4. Identify the Oracle Database target.
    Although you can use a flat file as a target, the most common and recommended scenario is to use an Oracle schema as the data warehouse target as described in these steps.
    To define the Oracle target, begin by creating a module. Modules are grouping mechanisms in the Project Explorer that correspond to locations in the Connection Explorer. The Oracle target module is the first of several modules you create in Warehouse Builder.
    In the Project Explorer, expand the Databases node. Right-click Oracle and select New. The Create Module wizard displays. Set the module type to target and specify whether the module will be used in development, quality assurance, or production. This module status is purely descriptive and has no bearing on subsequent steps you take.
    When you complete the wizard, the target module displays with nodes for mappings, transformations, tables, cubes and the various other types of objects you utilize to design the target warehouse.
  5. Create a separate Oracle module for the data sources. (Optional)
    At your discretion, you can either create another Oracle module to contain Oracle source data or simply proceed to the next step.
  6. Identify the execution environment.
    Under the Connection Explorer, notice the Control Centers node. A control center is an Oracle Database schema that manages the execution of the ETL jobs you design in the Design Center in subsequent steps.
    During installation, Warehouse Builder creates the DEFAULT_CONTROL_CENTER schema on the same database as the repository.
    If you choose to utilize the default execution environment, continue to the next step. Alternatively, you can define new control centers at any time. For more information and instructions, see "Deploying to Target Schemas and Executing ETL Logic" .
  7. Prepare development, test, and production environments. (Optional)
    Thus far, these instructions describe the creation of a single project corresponding to a single execution environment. You can, however, reuse the logical design of this project in different physical environments such as testing or production environments.
    Deploy a single data system to several different host systems or to various environments, by "Creating Additional Configurations".
Stage 2: Importing the Source Metadata
  1. Import metadata from the various data sources.
    In the Project Explorer, select a node and determine the locations from which you intend to ultimately extract data. Now create a module for each relevant location. After you create a module, right-click the module and select Import to extract metadata from the associated location. Warehouse Builder launches a wizard to guide you through the process of importing data.
    For an example and additional information on importing data objects, see "Identifying Data Sources and Importing Metadata".
  2. For the metadata you imported, profile its corresponding data. (Optional)
    Before continuing to the next step, consider using the data profiling option to ensure data quality as described in "Understanding Data Quality Management".
Stage 3: Designing the Oracle Data Warehouse
  1. Create and design the data objects for the Oracle target module.
    In previous steps, you may have already imported existing target objects. For new target objects, design any of the dimensional or relational objects listed in Table 4-1.
    To create data objects, you can either launch the appropriate wizard or use the Data Object Editor. To use a wizard, right-click the node for the desired object and select New. After using a wizard, you may want to modify the object in the editor. In that case, right-click the object and select Open Editor.
    For additional information, see Chapter 4, "Designing Target Schemas".
  2. As you design your data warehouse, be sure to frequently validate the design objects.
    You can validate objects as you create them, or validate a group of objects together. In the Project Explorer, select one or more objects or modules, then click the Validate icon.
    Examine the messages in the Validation Results window. Correct any errors and try validating again.
    To redisplay the most recent validation results at a later time, choose Validation Messages from the View menu.
  3. When satisfied with the design of the target objects, generate the code.
    Generation produces a DDL or PL/SQL script to be used in subsequent steps to create the data objects in the target schema.
    In the Data Editor, you can generate code for a single object by clicking the Generate icon.
    In the Project Explorer, select one or more objects or modules, then click the Generate icon. Examine the messages in the Generation Results window. To redisplay the most recent generation results at a later time, choose Generated Scripts from the View menu.
    You can save the generated script as a file and optionally deploy it outside Warehouse Builder.
  4. Design mappings that define the flow of data from a source to target objects.
    In the Project Explorer, expand the Oracle target module, right-click the Mappings node and select New.
    The Mapping Editor enables you to define the flow of data visually. You can drag-and-drop operators onto the canvas, and draw lines that connect the operators.
    Follow the Instructions for Defining Mappings, concluding with generating the code for the mapping.
  5. To manage dependencies between mappings, refer to "Designing Process Flows".
Stage 4: Deploying the Design and Implementing the Data Warehouse
Recall that deployment is the process of copying the relevant metadata and code you generated in the Design Center to a target schema. This step is necessary to enable the target schema to execute ETL logic such as mappings.
To deploy and execute the data warehouse design, complete the following steps:
  1. Deploy objects from either the Design Center or Control Center Manager.
    In this step, you define the objects in the target schema. You need do this only once.
    The simplest approach is to deploy directly from the Design Center by selecting an object and clicking the Deploy icon. In this case, Warehouse Builder deploys the objects with the default deployment settings.
    Alternatively, if you want more control and feedback on how Warehouse Builder deploys objects, select Tools from the Design Center menu and select Control Center Manager.
    Whether you deploy objects from the Design Center or the Control Center Manager, be sure to deploy all associated objects. For example, when deploying a mapping, also deploy the target data objects such as tables that you defined and any associated process flows or other mappings.
  2. Execute the ETL logic to populate the target warehouse.
    In this step, you move data for the first time. Repeat this step each time you want to refresh the target with new data.
    You have two options for executing the ETL logic in mappings and process flows. You can create and deploy a schedule as described in "Process for Defining and Using Schedules". Or you can execute jobs manually as described in "Starting ETL Jobs".

About Locations

Locations enable you to store the connection information to the various files, databases, and applications that Warehouse Builder accesses for extracting and loading data. Similarly, locations also store connection information to ETL management tools and Business Intelligence tools. For a detailed listing, see "Supported Sources and Targets".
Oracle Database locations and file locations can be sources, targets, or both. For example, you can use a location as a target for storing temporary or staging tables. Later, you can reuse that location as a source to populate the final target schema.
In some cases, such as with flat file data, the data and metadata for a given source are stored separately. In such a case, create a location for the data and another for the metadata.
About Locations, Passwords, and Security
Considering that all Warehouse Builder users can view connection information in a location, note that the passwords are always encrypted. Furthermore, Warehouse Builder administrators can determine whether or not to allow locations to be shared across users and persisted across design sessions. By default, locations are not shared or persisted.
To implement either of these scenarios, see the topic "Managing Passwords in Warehouse Builder" in the Oracle Warehouse Builder Installation and Administration Guide.
Automatically Created Locations
Warehouse Builder automatically creates an Oracle location for each user you register in the repository. For example, if you register user SCOTT, then the repository includes an Oracle location named SCOTT_LOCATION that stores the connection details for to the SCOTT schema.
During installation, Warehouse Builder creates an Oracle location named OWB_REPOSITORY_LOCATION. This location provides the connection details to the Warehouse Builder repository. You cannot rename or delete the repository location. Only a database administrator can change the password. To prevent unauthorized access to the database administrator password, all users are restricted from deploying to the repository location.
Deleting Locations
To delete a location, right-click the location in the Connection Explorer and select Delete. If the delete option is not available here, this indicates that the location has been registered in a control center and is likely being utilized. Verify that the location is not in use, unregister the location in the control center, and then you can delete the location from the Connection Explorer.

About Modules

Modules are grouping mechanisms in the Project Explorer that correspond to locations in the Connection Explorer. A single location can correspond to one or more modules. However, a given module can correspond to only a single location at a time.
The association of a module to a location enables you to perform certain actions more easily in Warehouse Builder. For example, you can re-import metadata by reusing an existing module. Furthermore, when you deploy ETL processes in subsequent steps, modules enable you to deploy related objects together such as process flows.
To create a module:
  1. Expand the Project Explorer until you find the node for the appropriate metadata type.
    For example, if the source data is stored in an Oracle Database, then expand the Databases node to view the Oracle node. Or if the source data is in an SAP R/3 system, expand the applications node to view the SAP node.
  2. Right-click the desired node and choose New.
    The Create Module wizard opens.The wizard determines the correct integrator to use to enable access to the data store you selected.
  3. Follow the prompts in the wizard.
    The wizard prompts you to name and describe the module and associate it with a location.
During the course of using Warehouse Builder, you may need to associate a module with a new location. For example, assuming your production environment utilizes different locations than your development environment, you need to reassociate the modules.
To change the location associated with a module:
  1. In the Project Explorer, select the module.
  2. Click the Configure icon.
    The Configuration Properties dialog box appears.
  3. In the Identification folder, select a new value for the Locations property.