Data Modeling Concepts

From ER/Studio Data Architect

A data model represents the things of significance to your enterprise and the relationships between them. At its core, a data model depicts the underlying structure of an enterprise's data and the business rules governing it. A data model consists of two parts: a logical design and a physical design.

The following describes the main components of an ER/Studio diagram:

The Logical Model

A logical model is developed before the physical model. It addresses the business and functional requirements of systems development. The logical design allows you to determine the organization of the data that you need to store in a database before you create the database; it serves as a blueprint.

The Physical Design

The physical design addresses the technical implementation of a data model, and shows how the data is stored in the database. For example, you can specify the datatype to be used by each column in the table, and determine how tables will be stored in the database.
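As a rough illustration of the physical design step, the sketch below uses Python's built-in sqlite3 module to assign a data type to each column of one table. The employee table and its columns are hypothetical, and SQLite is only a stand-in for whichever database platform you actually target.

```python
import sqlite3

# Hypothetical "Employee" entity from a logical model. The physical design
# picks a concrete data type for every column. SQLite has no dedicated DATE
# type, so hire_date is declared as TEXT (an ISO-8601 string) in this sketch.
DDL = """
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    full_name   TEXT    NOT NULL,
    hire_date   TEXT    NOT NULL,
    salary      NUMERIC NOT NULL
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]
for name, declared_type in columns:
    print(f"{name}: {declared_type}")
```

The same logical entity could map to different data types on another platform, which is one reason to keep the logical and physical designs separate.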

Data Model Design Principles

To design the most effective data model possible, you should focus on the logical design before developing the physical design. Both the logical and physical design processes are complex, so it is best to separate rather than to mix the two. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them.

The Purpose of a Data Model

Creating a database is generally the primary purpose of a data model, but a data model is useful for other things as well. In systems development, the goal is to create an effective database application that can support some or all of your enterprise. However, a data model of your business can also help you define operational aspects of your business that you might otherwise overlook. A well-defined data model that accurately represents your business can help orient employees to its goals and operations. The data model can also serve as an invaluable communications tool for both internal and external constituents.

The Relational Model

Most early data models were developed to help systems analysts make business data conform to a physical database or machine architecture. Hierarchical and network models often ran most efficiently on particular systems. In the early 1970s, E. F. Codd developed the relational data model, based on relational algebra. The relational model stressed data independence, defined as the independence of data from the underlying physical structure in which it is stored. Thus, systems that supported relational data models let users migrate data to larger or newer systems with little regard to the physical differences between storage devices.

The power of the relational model lies in its simplicity. In the relational model, data is organized in tables of rows and columns. Each table represents one type of data. Each row, or tuple, represents one item of data of that type. Each column, or attribute, represents one type of information about the type of data stored in the table.

Figure: Sample relational table (Sample RTable.gif)
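The table, row, and column structure described above can be sketched with Python's built-in sqlite3 module. The department table and its contents are invented for illustration and are not taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table represents one type of data (here, a hypothetical "department").
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")

# Each row (tuple) is one item of that type.
conn.executemany(
    "INSERT INTO department (dept_id, dept_name) VALUES (?, ?)",
    [(10, "Accounting"), (20, "Engineering")],
)

cursor = conn.execute("SELECT dept_id, dept_name FROM department ORDER BY dept_id")

# Each column holds one kind of information about every row.
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
```

Nothing in the query refers to how or where the rows are stored, which is the data independence the relational model emphasizes.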

The Entity-Relationship Model

Peter Chen introduced entity-relationship modeling in the mid-1970s. Along with a number of other theorists, such as Hammer and McLeod with their Semantic Data Model, Chen introduced a new way of thinking about data. Chen's ideas stressed that a data model should represent the reality of a business, without regard for how that model might be implemented in a manual or automated system. Though this may seem obvious today, these ideas were revolutionary at the time, and were instrumental in freeing individuals from the constraints of a hierarchical business model. The ability to model a business 'on paper' let business planners test and debug ideas before implementing them, thus saving money, other resources, and aggravation.

The basic idea behind entity-relationship modeling is this: everything in a business can be generalized into an abstract or archetypal ideal, which we call an entity. These entities have certain characteristics, or attributes. Entities are also related to one another through actions that each entity performs on one or more of the other entities. We call these actions relationships.

Figure: Chen entity-relationship diagram (Chen.gif)

In determining the relationship between each of the entities in a data model, you will define a number of business rules. Business rules define your business and how you run it. For instance, you know that to be useful, each employee ID, project ID, and department ID must be unique. While you could use the social security number as the employee ID, you might want to keep the ID short and more easily sorted. More complex issues to consider are questions such as, "How are we to connect employees, projects, and departments so that we minimize data redundancy and maximize our reporting capabilities?" The answers to this and other questions form the rules by which you run your business.
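The employee, project, and department example can be sketched in Python as a minimal illustration. The class names, attributes, and cardinalities (one department per employee, many projects per employee) are assumptions made for this sketch, not rules stated by ER/Studio Data Architect.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Department:
    dept_id: int
    name: str


@dataclass
class Project:
    project_id: int
    title: str


@dataclass
class Employee:
    employee_id: int
    name: str
    # Relationship (assumed): each employee works in exactly one department.
    department: Department
    # Relationship (assumed): an employee may work on many projects.
    projects: List[Project] = field(default_factory=list)


def find_duplicate_ids(ids: Iterable[int]) -> List[int]:
    """Return, in sorted order, the IDs that break the uniqueness rule."""
    seen, duplicates = set(), set()
    for value in ids:
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


engineering = Department(20, "Engineering")
migration = Project(1, "Data Migration")
employees = [
    Employee(100, "Ada", engineering, [migration]),
    Employee(101, "Grace", engineering),
]

duplicate_ids = find_duplicate_ids(e.employee_id for e in employees)
print(duplicate_ids)
```

Both employees refer to the same Department object rather than each carrying a copy of the department name, which is the kind of redundancy a well-chosen relationship avoids.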

Attribute definitions can be used along with relationships to determine or enforce business rules. You can define a particular set of valid values for any attribute of any entity. For example, you can define a valid range of salaries for a subset of employees for an attribute called Salary. This set of values is known as a domain.
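A domain can be thought of as a named, reusable definition of valid values. The sketch below models one as a numeric range in Python. The Domain class, the validate helper, and the salary limits are hypothetical and do not reflect how ER/Studio Data Architect stores domains internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A named set of valid values, modeled here as a closed numeric range."""
    name: str
    minimum: float
    maximum: float

    def contains(self, value: float) -> bool:
        return self.minimum <= value <= self.maximum


# Hypothetical business rule: salaries for one group of employees
# must fall within this range.
SALARY = Domain("Salary", minimum=30_000, maximum=200_000)


def validate(domain: Domain, value: float) -> float:
    """Return the value unchanged, or raise ValueError if it is outside the domain."""
    if not domain.contains(value):
        raise ValueError(f"{value} is not a valid {domain.name}")
    return value


print(SALARY.contains(55_000))
print(SALARY.contains(5))
```

Defining the rule once and attaching it to every Salary attribute keeps the business rule in one place instead of repeating it per table.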

The Dimensional Model

Dimensional modeling is generally agreed to be the most useful technique for representing end-user access to data. Where a single entity-relationship diagram for an enterprise represents every possible business process, a dimensional diagram represents a single business process, centered on a fact table. ER/Studio Data Architect can identify individual business process components and create the dimensional model (DM) diagram for you.

You can create a dimensional diagram by starting with a logical model, by reverse-engineering, by importing an SQL or ERX file, or by adding a new physical model.
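To make the fact-table idea concrete, here is a small sketch of a single business process (sales) modeled as one fact table with two dimension tables, using Python's built-in sqlite3 module. All table names, column names, and figures are invented for illustration. This is not output generated by ER/Studio Data Architect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables describe the context of each measurement.
# The fact table records the measurements for one business process.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, calendar_year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product (product_id),
    date_id    INTEGER REFERENCES dim_date (date_id),
    amount     NUMERIC
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 150), (2, 2, 70);
""")

# A typical end-user question: total sales per product per year.
totals = conn.execute("""
    SELECT p.product_name, d.calendar_year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.product_name, d.calendar_year
    ORDER BY p.product_name, d.calendar_year
""").fetchall()

for row in totals:
    print(row)
```

Every query against this layout follows the same pattern: join the fact table to the dimensions of interest, then aggregate.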