Use Rational Data Architect to integrate data sources

Reposted; published 2008-07-11 09:42:00

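This article introduces a five-step process for designing data source federation with IBM Rational Data Architect: annotating existing infrastructure, mapping data sources to each other, creating a federated model, mapping federation data sources, and generating federation code.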

No doubt about it -- information integration is challenging. Many business decisions must be documented and many transformations must be performed. IBM Rational® Data Architect can document your decisions and automate part of this process. Read this article to explore a tool-supported process for federation design in just five steps.

Introduction

Integrating data sources involves many activities. Rational Data Architect can help you document decisions and automate parts of the work. This article introduces a process that you can use and adapt to your specific data integration needs. The five steps to a successful design, covered in this article, are:

1. Annotating existing infrastructure
2. Mapping data sources to each other
3. Creating a federated model
4. Mapping federation data sources
5. Generating federation code

Rational Data Architect product overview

Rational Data Architect is a data modeling and integration design tool that helps data architects understand information assets, their relationships, and their dependencies, map assets to each other, and create integration schemas. Architected for teams of any size, Rational Data Architect combines data modeling with mapping discovery and with model and database analysis, all in a single, integrated tool. In addition, Rational Data Architect supports enterprise standards enforcement. Because it works across heterogeneous data sources, it facilitates federation design and is an essential tool for information integration projects.

Rational Data Architect provides tools that can dramatically reduce design and development hours. This new software, built on the open source Eclipse platform, helps data architects model, discover, map, and analyze data across multiple information sources, automating information integration in complex environments.

Annotating existing infrastructure

The first step of the process helps users assess their current situation. Although this phase involves some automated steps, such as reverse engineering, most of this process is done manually because every annotation is basically a high-probability guess. It is essential to have the participation of the original designers of the data source, and users of the data source.

To annotate your existing infrastructure:

1. Connect to the existing data source.
To access the data structure, you need to follow standard connectivity protocols. You need to know the type of the data source, the driver used to connect to it, and the login information (in most cases, a user name and password). Rational Data Architect uses standard JDBC connectivity to connect to the data source. All further communication with the data source is performed using native queries to the system tables of the data source.

2. Select the subset of available data structures from the data source.
Many data sources include data that is irrelevant for understanding the stored information, such as counters, temporary helper tables used to sort data, and multilingual text for the user interface. It is much easier to eliminate such data structures at the beginning of the process.

Rational Data Architect allows filtering at any level of the data structure in the Database Explorer, shown in Figure 1. Define a filter that leaves only the relevant information.
Figure 1. Connecting to the data source

3. Create a model from the selected subset.
There are two main reasons to create a model from the data source:
- Most databases are not able to capture business-relevant annotations and documentation at the level of detail needed for a successful integration process.
- Change management. Integrations have to be designed on a stable structure of data. If the structure of the data source changes, you need to implement an update for the integration, resulting in a new version of the model.
A physical data model created from the data source is basically an abstracted copy of the data structure in the data source. See Figure 2.
Figure 2. Creating physical data model

4. Document data structures in the model.
While a model captures most of the detail specified in the data source, this is not enough to understand the data. For example, CLNR specified as CHAR(16) is not something that every developer would interpret in exactly the same way. In this activity, you add documentation to every element in the model, including every column, every table, every constraint, and every trigger. You should also specify business-relevant names to make the model faster to read.

It's also strongly recommended that you create context-relevant diagrams. However, this does not mean you should create a huge diagram gathered from the walls of many meeting rooms. Instead, create small diagrams with approximately seven essential elements. (You can have fewer, but avoid more, if possible.)

5. Create or verify a glossary related to the model.
Together with activity 4, you can start creating a glossary that defines the meaning of names in the data source. Designers and developers have always sought to use names that make their jobs easier. Even under severe constraints on the length of names, naming standards were used for simplification. Consistency depended on the discipline and life cycle of each data source.

You can refer to a glossary in Rational Data Architect, which includes a list of valid business names with possible abbreviations, shown in Figure 3. For example, the abbreviation CL could stand for client and the abbreviation NR could mean number. Some data sources could have even more extreme, non-intuitive abbreviations, such as J9 to mean client or O1 to indicate identifier. Rational Data Architect does not limit the number of glossaries that can be used at the same time, although I personally recommend that you use only one glossary per model. (This is, by the way, not a technical recommendation, but one based on user experience.)
Figure 3. Defining the glossary
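
To make the glossary idea concrete, here is a minimal Python sketch of expanding abbreviated physical names into business names. It is not how Rational Data Architect stores or applies glossaries. The `GLOSSARY` dictionary, the `expand_name` function, and the assumption that every abbreviation is exactly two characters long are all hypothetical; only the four abbreviations come from the examples above.

```python
# Hypothetical glossary: abbreviation -> business name.
# The four entries are the examples used in the text.
GLOSSARY = {"CL": "client", "NR": "number", "J9": "client", "O1": "identifier"}


def expand_name(physical_name, glossary, token_length=2):
    """Split a physical name into fixed-length tokens and expand each one.

    Assumes every abbreviation has the same length (two characters here).
    Tokens missing from the glossary are kept unchanged so that gaps
    in the glossary stay visible.
    """
    name = physical_name.upper()
    tokens = [name[i:i + token_length] for i in range(0, len(name), token_length)]
    return " ".join(glossary.get(token, token) for token in tokens)


print(expand_name("CLNR", GLOSSARY))  # client number
print(expand_name("J9O1", GLOSSARY))  # client identifier
print(expand_name("CLXX", GLOSSARY))  # client XX
```

Keeping unknown tokens in the output is a deliberate choice in this sketch: it shows at a glance which names still need a glossary entry.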

These five activities for annotating your current situation may seem short, but most are very time-intensive and involve a lot of manual work.
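
The first three activities (connect, filter, create a model) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the SQLite module from the Python standard library as a stand-in data source, whereas Rational Data Architect connects through JDBC and queries the system tables of the actual database. The `read_catalog` function and the `TMP_` and `CNT_` prefixes used for filtering are hypothetical.

```python
import sqlite3


def read_catalog(connection, exclude_prefixes=("TMP_", "CNT_")):
    """Return {table_name: [(column_name, declared_type), ...]}.

    Tables whose names start with one of the excluded prefixes are
    skipped, which mimics filtering out helper structures before a
    model is created.
    """
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    model = {}
    for (table_name,) in tables:
        if table_name.upper().startswith(exclude_prefixes):
            continue  # irrelevant for understanding the stored information
        columns = connection.execute(
            f'PRAGMA table_info("{table_name}")'
        ).fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        model[table_name] = [(column[1], column[2]) for column in columns]
    return model


# Stand-in data source with one relevant table and one helper table.
connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE CLIENT (CLNR CHAR(16) NOT NULL, CLNAME VARCHAR(40));
    CREATE TABLE TMP_SORT (K INTEGER);
""")

physical_model = read_catalog(connection)
print(physical_model)
```

The resulting dictionary plays the role of the physical data model: a copy of the structure that can be annotated and versioned independently of the data source.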

Mapping data sources to each other

The integration process typically involves more than one data source, and each data source needs to be annotated before you can proceed. After annotating the existing infrastructure, you understand each data source separately, but you still have no clear picture of the overlapping and related information across all data sources.

Mapping existing data sources is optional, because it does not produce results that are required to further automate the process. However, it's highly recommended that you do the mapping, to increase your understanding of the completeness of data for integration, and to foresee possible collisions of data between different data sources.

To map data sources to each other:

1. Create a new mapping model between each pair of data source models.
A mapping is a dependency between two data structures that is not implemented in the realization of the data source. A mapping model is a summary of mappings between two independent data sources or data models. The number of mapping models increases rapidly with the number of data sources: one mapping model for two sources, three for three sources, six for four sources, and in general n(n-1)/2 for n sources, counting each pair in just one direction. If you are working with many data sources, you don't typically have to create all of the models. Instead, you can use some of the data source models as references and create mapping models only to those models, as shown in Figure 4.
Figure 4. Map data source models

2. Discover (automatically or manually) mappings between the data source structures.
Remember the glossary created in the previous section? At this point, the glossary can help you automate an activity. Mapping discovery can use glossaries to create better suggestions for possible mappings. Each mapping expresses the rule for creating the target structure from the source structure. For example, suppose you have a mapping between a driver's license as a target and a birth certificate as a source. A mapping to the "name" on the driver's license would be a concatenation of the "first name," "middle name," and "last name" from the birth certificate. This is an example of a mapping that includes a transformation. Models typically have hundreds of such elements. It is possible to define all of the mappings manually, but it would take weeks of work.

Rational Data Architect can help you identify the simplest of all mappings, which realistically represent the vast majority: the one-to-one mappings. Those are mappings from "family name" to "surname," for example. In the first version of Rational Data Architect, mapping discovery can use a combination of up to five discovery algorithms.

The simplest algorithm compares the names of model elements, and optionally uses glossary models to increase the precision of results by expanding abbreviations into business names before comparison. More complex mapping discovery uses externally purchased thesauruses to find synonyms, or even data samples from the data source to validate possible mappings. Mapping discovery has to be done for each mapping model and should be accompanied by documentation of individual mappings to make the model easier to read.

3. Complete annotations of data source models.
You can gain additional understanding of data source models from mapping models. For example, you might discover that some data structure in the first data source is related to a data structure in another data source. You might also add an invalidation notice specifying that part of the data should not be considered in the integration process because it is inaccurate. It is extremely valuable to complete the mapping between existing data sources, even if you do not intend to integrate information.
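
The name-based discovery described in activity 2 can be illustrated with a short Python sketch. It is a simplified stand-in and not one of the discovery algorithms shipped with Rational Data Architect: the functions `normalize` and `discover_mappings`, the underscore-separated naming style, and the sample glossary and synonym entries are all assumptions made for the example.

```python
def normalize(name, glossary, synonyms):
    """Expand abbreviations token by token, then apply a synonym lookup."""
    words = [glossary.get(token, token) for token in name.lower().split("_")]
    phrase = " ".join(words)
    return synonyms.get(phrase, phrase)


def discover_mappings(source_columns, target_columns, glossary, synonyms):
    """Suggest one-to-one mappings for columns whose normalized names match."""
    targets_by_name = {}
    for target in target_columns:
        key = normalize(target, glossary, synonyms)
        targets_by_name.setdefault(key, []).append(target)

    suggestions = []
    for source in source_columns:
        key = normalize(source, glossary, synonyms)
        for target in targets_by_name.get(key, []):
            suggestions.append((source, target))
    return suggestions


# Hypothetical glossary (abbreviation -> business word) and thesaurus.
GLOSSARY = {"cl": "client", "nr": "number", "fam": "family"}
SYNONYMS = {"family name": "surname"}

suggested = discover_mappings(
    ["CL_NR", "FAM_NAME", "BIRTH_DATE"],
    ["CLIENT_NUMBER", "SURNAME", "CITY"],
    GLOSSARY,
    SYNONYMS,
)
print(suggested)  # [('CL_NR', 'CLIENT_NUMBER'), ('FAM_NAME', 'SURNAME')]
```

The output is a list of suggestions only. As in the tool, someone still has to confirm or reject each one and document the mappings that are kept.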

The results of the mappings should be explored from two perspectives:

- Competing data from different models. Competing data could result in a more complex integration specification that either prioritizes data from one data source over the other or uses the most recent data.
- Exclusivity of data structures. Structures that exist in only one data source should be examined to determine whether it's necessary to include them in the federated model.

Both examinations result in business decisions and are dependent on your reasons for information integration.
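
The two ways of handling competing data mentioned above, preferring one source or preferring the most recent record, can be sketched as follows. The `resolve` function, the record layout, and the source names `CRM` and `BILLING` are hypothetical and serve only to make the business decision tangible.

```python
from datetime import date


def resolve(candidates, strategy, priority=()):
    """Pick one value from competing records.

    candidates: list of dicts with the keys 'source', 'value', 'updated'.
    strategy:   'priority' prefers sources in the order given by `priority`;
                'most_recent' prefers the record with the latest date.
    """
    if strategy == "priority":
        rank = {source: position for position, source in enumerate(priority)}
        # Sources that are not listed rank after all listed sources.
        best = min(candidates, key=lambda c: rank.get(c["source"], len(rank)))
        return best["value"]
    if strategy == "most_recent":
        return max(candidates, key=lambda c: c["updated"])["value"]
    raise ValueError(f"unknown strategy: {strategy}")


competing_addresses = [
    {"source": "CRM", "value": "12 Main St", "updated": date(2005, 3, 1)},
    {"source": "BILLING", "value": "7 Oak Ave", "updated": date(2006, 1, 15)},
]

print(resolve(competing_addresses, "priority", priority=["CRM", "BILLING"]))  # 12 Main St
print(resolve(competing_addresses, "most_recent"))  # 7 Oak Ave
```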

Creating a federated model

Gaining a good understanding of data sources is essential to validate whether you can complete the process of information integration. A main component of this process is specifying the target, or the schema, that will be visible after the integration. This step should unify the business demand that requires integration with the possibilities of your existing information.

1. Create a business (logical) model aimed at the solution.
A business model defines entities and relationships between entities, without consideration of the implementation platform. The model has to solve the business problem. If the business problem requires just a summary of all account standings, for example, then you don't need to include order details in the model.

Rational Data Architect implements this view as a logical data model, as shown in Figure 5.
Figure 5. Logical data model
A logical data model is not constrained regarding possible relationships between different entities. It can contain any kind of relationship, including subtyping and many-to-many relationships. During the design of the logical model, ongoing validation with the business stewards, the owners of the business process, is extremely important. Only they can recognize if something is missing or if the model is not correct regarding relationships and rules.

To make the model even more understandable, you should create as many diagrams as required to express different business views. Documentation and annotation are the most important parts of models. Imagine how it would feel if someone gave you a model to read without a single line of additional documentation -- the model would lose some of its meaning and you could end up considering it nothing more than a nice drawing.

2. Turn the logical model into a physical implementation model.
The logical model expresses the business view of information. The next activity is to turn this model into a physical model that is constrained by the technology you'll use to realize it. This process is relatively straightforward for the first transformation, but requires care during version upgrades of models.
Rational Data Architect allows you to transform a logical model to a physical model. During the transformation, Rational Data Architect automatically resolves all constraints of the target model, such as the lack of many-to-many relationships or subtyping, and implements them correctly for the selected target. Rational Data Architect also lets you compare a logical model to the physical model, and update a physical model from this comparison, using the Compare & Synchronize function.

The resulting physical model is not the model that will actually be implemented as a schema in WebSphere Information Integrator; it is a prototype of the integration model, which will be created during the code generation and will replace tables with corresponding nicknames and views.
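
One of the constraints resolved during the logical-to-physical transformation is the many-to-many relationship, which a relational target cannot hold directly. The sketch below shows the usual resolution, an association table with a foreign key to each side. It illustrates the general technique and is not the transformation that Rational Data Architect performs internally; the function `resolve_relationships` and the sample entities are assumptions.

```python
def resolve_relationships(entities, relationships):
    """Derive physical tables (name -> column list) from a logical model.

    entities:      {entity_name: primary_key_column}
    relationships: list of (left, right, cardinality) with cardinality
                   '1:N' (one left row, many right rows) or 'M:N'.
    """
    tables = {name: [key] for name, key in entities.items()}
    for left, right, cardinality in relationships:
        if cardinality == "M:N":
            # No many-to-many in a relational target: add an association
            # table that holds a foreign key to each side.
            tables[f"{left}_{right}"] = [entities[left], entities[right]]
        elif cardinality == "1:N":
            # The "many" side carries a foreign key to the "one" side.
            tables[right].append(entities[left])
        else:
            raise ValueError(f"unsupported cardinality: {cardinality}")
    return tables


logical_entities = {
    "CUSTOMER": "CUSTOMER_ID",
    "ACCOUNT": "ACCOUNT_ID",
    "BRANCH": "BRANCH_ID",
}
logical_relationships = [
    ("CUSTOMER", "ACCOUNT", "M:N"),  # joint accounts
    ("BRANCH", "ACCOUNT", "1:N"),
]

physical_tables = resolve_relationships(logical_entities, logical_relationships)
for table, columns in physical_tables.items():
    print(table, columns)
```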

Mapping federation data sources

The fourth major step in this information integration design is to create the mapping between original data sources represented by physical models and the target federation model, also represented by a physical model. This mapping has to be complete and executable to be able to generate code.

The activities in this mapping are very similar to Mapping data sources to each other, with only a few alterations.

1. Create a new mapping model between each data source model and the federated model.
This step results in exactly as many mapping models as there are data sources. Taken together, these models define how to create the complete federated schema from the existing data sources. There will very likely be competing specifications for an element in different data models. You don't address them in this activity, but will eliminate them later on.

2. Discover (automatically or manually) mappings between the data source structures.
As in the previous case, you need to discover mappings between source and federated schemas, as shown in Figure 6. This activity is almost identical to the activity between different source schemas discussed earlier. You need to handle more complicated cases that span more than one table structure on the source by using mapping groups. A mapping group is comparable to the result set that you get from one selection of data from the source (one "select" statement) to produce federated data.

You can use the alternative view of mapping groups in Rational Data Architect to evaluate and define joins of any complexity. If joins already exist in the source model, they will be automatically suggested in the mapping editor.
Figure 6. Mapping discovery

3. Complete transformations for mappings of data source models.
To use mappings to create federation code, you need to define executable transformations. Whenever there is a need for a change of format, content, or structure of data, you need to specify how this will be performed. This requires transformation code that is known to the server -- in this case, WebSphere Information Integrator.

Use the expression builder or enter the transformation directly in the expression property of a mapping in Rational Data Architect. The expression builder already offers a selection of predefined WebSphere Information Integrator functions that can be used.

Next you need to define all necessary transformations from the source to the federation schema. There is just one problem: there might be too many transformations. Because independent mapping editors were used, you don't have any control over the number of mappings that are defined for each element (column) on the target. This is something that you should resolve if you want to generate code.
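
To show what a mapping group with an executable transformation amounts to, the following sketch expresses one group as a single SELECT that joins two source tables and concatenates name parts, echoing the driver's license example from earlier. It runs against SQLite from the Python standard library purely as a stand-in; the table names, column names, and sample row are invented, and real transformations would have to use functions known to WebSphere Information Integrator.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    -- Two hypothetical source tables.
    CREATE TABLE PERSON (ID INTEGER PRIMARY KEY,
                         FIRST_NAME TEXT, MIDDLE_NAME TEXT, LAST_NAME TEXT);
    CREATE TABLE LICENSE (PERSON_ID INTEGER, LICENSE_NO TEXT);
    INSERT INTO PERSON VALUES (1, 'Ada', 'M', 'Stone');
    INSERT INTO LICENSE VALUES (1, 'D-1001');

    -- One mapping group: one SELECT that joins the sources and
    -- applies the transformation (concatenation) to build NAME.
    CREATE VIEW DRIVER AS
        SELECT L.LICENSE_NO AS LICENSE_NO,
               P.FIRST_NAME || ' ' || P.MIDDLE_NAME || ' ' || P.LAST_NAME AS NAME
        FROM PERSON P
        JOIN LICENSE L ON L.PERSON_ID = P.ID;
""")

driver_rows = connection.execute("SELECT LICENSE_NO, NAME FROM DRIVER").fetchall()
print(driver_rows)  # [('D-1001', 'Ada M Stone')]
```

Each target column in the view has exactly one expression, which is the property the federated mapping must have before code can be generated.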

Generating federation code

The final step is the transformation from models back to executable code. You'll do this from the mapping model. But how can you make sure that you generate the right code?

To receive valid code for information integration from all data sources:

1. Combine all mapping models into one.
First, you need to get an overview of everything defined as a mapping from any of the data sources to the federated model. You can do this if you overlay all of the source models on one side, and leave the single federated model as the target on the other side. This step results in a very busy model with a lot of mappings, which should not be a big concern, because you'll eliminate many of them in the next step.

Rational Data Architect lets you combine two mapping models into one in several ways. The one used here combines two models with identical targets. Repeat this until all of the models are joined into one.

Another possible way to combine two mapping models applies when the target of one model is identical to the source of the other.

2. Eliminate competing mappings.
This activity is essential if you want to end up with a single executable model. The result needs to be a single executable mapping for each of the target elements (columns). Combining all mapping models created many elements that are targets of more than one mapping. Look at each such element and select one single mapping. All other mappings need to be removed.

Alternatively, you could also delete a mapping group if you decide that the whole mapping group (the join) should not be used.

You also need to delete all mapping groups that are empty. You can easily do this by selecting the mapping group details view in the resulting mapping model.

3. Generate target schema from mapping model.
From the model, you can generate the DDL, though you have to be careful. Remember that every physical model knows about the target capabilities. You need to select a model generated from WebSphere Information Integrator to receive federation code with nicknames and generated views.

In the code generation wizard started from the mapping model, Rational Data Architect lets you change the name of any generated element, as shown in Figure 7. The result of the code generation is a schema with all elements in the target integration model, as well as a script for code generation.
Figure 7. Generate integrated schema

4. Execute schema DDL with WebSphere Information Integrator.
It's rewarding to see the generated script and to know it's available for changes. I recommend generating code from the model itself because you can compare it with the target and generate code selectively.

When generating code, you'll use a connection to WebSphere Information Integrator -- the same as used to reverse engineer initial models.
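
The first two activities in this list, combining mapping models with identical targets and eliminating competing mappings, can be sketched in Python as follows. The data layout (a mapping model as a dictionary from target column to source expression), the function names, and the `CRM` and `BILLING` sample models are hypothetical; the sketch only mirrors the logic of the steps, not the tool's implementation.

```python
from collections import defaultdict


def combine(*mapping_models):
    """Overlay mapping models that share one target model.

    Each model is a pair (model_name, {target_column: source_expression}).
    Returns {target_column: [(model_name, source_expression), ...]}.
    """
    combined = defaultdict(list)
    for model_name, mappings in mapping_models:
        for target_column, expression in mappings.items():
            combined[target_column].append((model_name, expression))
    return dict(combined)


def find_competing(combined):
    """Return the target columns that are reached by more than one mapping."""
    return sorted(col for col, candidates in combined.items() if len(candidates) > 1)


def eliminate(combined, decisions):
    """Keep exactly one mapping per target column.

    decisions: {target_column: winning model_name} for competing columns.
    """
    result = {}
    for column, candidates in combined.items():
        if len(candidates) == 1:
            result[column] = candidates[0]
            continue
        winners = [c for c in candidates if c[0] == decisions.get(column)]
        if len(winners) != 1:
            raise ValueError(f"no unique decision for {column}")
        result[column] = winners[0]
    return result


crm = ("CRM", {"CLIENT.NAME": "CRM.CUST.NAME", "CLIENT.CITY": "CRM.CUST.CITY"})
billing = ("BILLING", {"CLIENT.NAME": "BILLING.PAYER.FULLNAME",
                       "CLIENT.BALANCE": "BILLING.PAYER.BAL"})

combined = combine(crm, billing)
print(find_competing(combined))  # ['CLIENT.NAME']

final_mappings = eliminate(combined, {"CLIENT.NAME": "CRM"})
print(final_mappings["CLIENT.NAME"])  # ('CRM', 'CRM.CUST.NAME')
```

Raising an error when a competing column has no decision keeps the sketch honest about the point made above: code generation needs exactly one mapping per target column.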

And now you've finished the design process. At this point, it's time to think about test and deployment.

Summary

This article described a five-step process for federation design that will produce a federated schema. You also end up with a set of intermediate models that are completely reusable, and will shorten the process next time. This process also helps increase your understanding of the overall information infrastructure.

Rational Data Architect was created to help you with your information integration. I invite you to explore more about it using the download in Resources.

Resources

Learn

Learn more about Rational Data Architect.
Get Rational Data Architect product support.
WebSphere Information Integrator: Read an overview and learn more about the different editions of this product.

Get products and technologies

Download a free trial version of Rational Data Architect V6.1.
Build your next development project with IBM trial software, available for download directly from developerWorks.
