Normalization

Plateforme Level Extreme

Abonnement

Profil corporatif

Produits & Services

Support

Légal

English

Articles

Recherche:

Recherche

Normalization

Hilmar Zonneveld, August 1, 2002

Overview The purpose of this article is to give an overview of normalization. Basically, normalization refers to having an efficient table structure. I will not discuss the famous "first to fifth normal forms" - if you want that information, enough texts exist about it in other places (search sit...

Overview

The purpose of this article is to give an overview of normalization. Basically, normalization refers to having an efficient table structure. I will not discuss the famous "first to fifth normal forms" - if you want that information, enough texts exist about it in other places (search sites like Google, or PcWebopedia, for terms like "normalization", or "third normal form"). Instead, I will outline a few simple rules to help you plan your table structures efficiently.

The information in this article is relevant to FoxPro, Visual FoxPro, or any other database management system. For instance, the same considerations apply when you decide to use Visual FoxPro as a front-end for a client/server system, with Oracle, SQL Server, etc.

Introduction

The basic idea of normalization is to distribute data among tables efficiently. An efficient table structure helps you save space, but there are also other, usually more important, criteria. For instance, in a sales system, if you save all the data related to the client in the main invoice table, you have to repeat this information for each invoice, and this wastes space. However, there are more serious problems, too. For instance, if some information, say, the client address, changes, you have to make the change for each invoice, which can be both complicated to program, and slow. Or, you do the update for one invoice, and forget to update another one - now, which is the valid data? ("update anomaly"). Moreover, if you delete the last invoice for a client, you lose the information about the client ("delete anomaly"). Clearly, it is convenient to have a separate table for clients.

In general, it is convenient to have a separate table for each separate "entity" you can identify - in a sales system, this might be clients, products and invoices. But this general definition does not suffice to solve all your design problems - for instance, in practice, it results convenient to have the invoice header and the invoice details (products sold) in separate tables.

Typical problems

Let's start by showing some typical problems that can occur when information is not stored efficiently. A university decides to record student grades. It decides that no student will register in more than 6 subjects in a semester, and therefore creates a "grades" table, where the following information is stored: student code, semester, subject, grade. The last two fields are repeated 5 more times.

Now, just try to imagine the nightmare of obtaining a list of all students registered in a certain subject. Also, the limit of 6 subjects per student is arbitrary - the rules can easily change next semester, or exceptions can be granted, to allow for a 7th. or 8th. subject for a student. Finally, reserving fields for a large amount of subjects wastes disk space for the cases of students who register in only a few subjects.

Let's hope that the programmers are clever enough not to start with any serious programming before they come up with a more flexible data structure.

In this case, the solution is to put every subject for every student, with its corresponding grade, in a separate record. Now, commands like LIST FOR SUBJECT = varSubject are possible (or BROWSE, or SCAN, with a similar condition) are possible. Clearly, such simple constructs are impossible with the previous structure.

Also, now an arbitrary number of grades can be stored for each student, and no space gets wasted for the non-used subjects.

Primary and Foreign Keys

Usually, each table needs a primary key (a field, or fields, that uniquely identify each record). This key is used in other tables as a reference. For instance, the client table might have a field that has a unique value for each client. This value is used in an invoice, to reference the corresponding client.

This primary key might be a client code assigned by the company, or, for instance, a sequential number which the end-user never sees. (See my article on primary keys, in the Universal Thread Magazine, for more details about primary and foreign keys.)

Rules based on relations

The simple rules I suggest you consider are based on the kind of relation between to pieces of data: whether the relation is one-to-one, one-to-many, or many-to-many.

One-to-one: A one-to-one relation (a piece of data is used, at most, once) is an indication that both pieces of data should simply be different fields in the same table. Please note that this includes the case that you need to store some simple data (like a date, age, name, etc.), which doesn't have to be unique, but for which you don't need to store additional information.

One-to-many: A one-to-many relation is an indication that the information has to go into two separate tables. The many-side is a separate table, that has a pointer (a foreign key) - usually a single field - that points to the one-side.

Many-to-many: A many-to-many relation requires a third table, to resolve into two one-to-many relations.

Note that for a database to be considered "normalized", information that can repeat (like the different subjects and grades for each student, mentioned above) is always stored in separate records, not in separate fields. This may require additional tables, which can be determined with the above rules.

Examples

Let's now see a few examples, from a sales system:

Each client has one e-mail address, one phone number, etc.: store the e-mail address and phone in the client table.
Depending on the needs of your system, you may need the capability to save several e-mail addresses or phone numbers for some clients. In this case, create a separate table for addresses. The fields might include the foreign key of the client, a code to indicate the type of address (whether it is a telephone number, a fax number, an e-mail, house address, etc.), and a field for the actual address information.
Each invoice has only one date: the date can be stored in the invoice table. The date, here, is a simple piece of data. If we don't need to store additional information about the date, we don't need a separate table for dates, even if the date can appear in more than one invoice. (That is, the one-to-many relation, one date for several invoices, can be considered as if it were a one-to-one relation).
Each product has a product code. Each product has a product description. So, both the product code and the product description go into the product table.
Each invoice has only once client, but there can be many invoices for one specific client. Since this is a one-to-many relation (and we need additional information for each client), clients and invoices must needs be stored in separate tables.
Each invoice can have several products, and each product can appear in several invoices. To resolve this (many-to-many) relation, we need a third table. The first two tables, of course, are invoices and products. The third table can be considered a table of invoice details.
Each product has a current price. The price goes directly into the product table, as a field. But:
When you sell a product, you later need to know at what price it was actually sold - the current price is not always enough information. So: every invoice detail has a price. Keep a copy of the price (the price at which the product was actually sold) in the invoice detail table.

Self-relations

When a table is related to itself, the number of tables described in the rules above decreases by one. Examples:

"Each employee has a boss". Let's assume that you only need to save one immediate superior for each employee. Since several employees can share the same boss, this is a one-to-many relation. But the boss is also an employee, therefore, we can create the relation from employee to employee, and the two tables become one. You only need one additional field for each employee: the employee-id for the corresponding boss. (Special note for referential integrity: in this kind of relation, you will always have at least one record where the "foreign key" is empty, for instance, one employee is the supreme boss, and doesn't, himself, have another boss. The corresponding RI-rule must therefore be set to "ignore" or to "allow blanks".)

"Each article contains materials". For a production control system, it might be convenient to have information about intermediate articles, that are made of other materials, but that, in turn, appear as materials in other articles, later in the manufacturing process. This can go through several levels. To simplify relations, it is convenient to store everything, from raw materials to the finished product, passing through several levels of intermediate products, in a single product table. Let's call this table "articles". Now, the relation becomes: "ArticleA contains ArticleB" (ArticleA can be a finished article or an intermediate article; ArticleB can be raw material or an intermediate article). Since this is a many-to-many relation (an article contains several materials, each material is contained in many articles), we need an additional table, which we might call "ArticleComponents". This table needs two foreign keys for articles (ArticleA contains ArticleB). Note that, because of the self-relation, the "third table" mentioned previously here becomes a second table.

Case study: registration system

To see how this works in practice, let's do a more detailed analysis of the student registration and grades system, for a university, mentioned previously.

At the beginning of each semester, a student registers for several subjects. At the end of the semester, the student receives a grade for the subject.

In the above paragraph, we can quickly identify several "entities", for each of which we need to save additional information, and for each of which we therefore need a separate table. While we think about what additional information we need to store, the need for some additional tables soon becomes clear.

For the student, we need a student table. Fields will include the student code, name, address, telephone, degree program.
For each subject, we need - in a subject table - the subject code, the description, and the number of credits.
To describe each teacher, we need a teacher table. The information might include a teacher code, names, and what kind of degree the teacher has (doctor, master, etc.).
We want to record what teacher teaches each subject. Each subject can be taught by different teachers, and each teacher can teach different subjects. This is a many-to-many relation. Also, what subjects are actually taught by what teacher will vary from semester to semester. We can therefore have a "groups" table, with different groups for each subject. The fields might include the semester, foreign keys for subject and teacher, and the main classroom.
Since we might need additional information to describe each semester, we need data in a semester table: a semester code, perhaps a more detailed description, and the date when the semester started.
Perhaps we want to store information about prerequisites. Each subject can be prerequisite for several subject, and each subject, in turn, can have several other subjects as prerequisites. Clearly, this is a self-relation, of the many-to-many kind. Thus, we require a second table, which we might call "prerequisites". Mainly, this table will contain two foreign keys to the subjects table. Since prerequisites may vary from one career to another, we will also need a foreign key to the career table.
Oh, right, we need a career table, too. Fields might include, at the very least, a code and a description of the career.
As mentioned previously, each student can register for several subjects. Since several students can also take the same subject, this is a many-to-many relation. We therefore need a third table, which we might call "grades" or "registration", since it will store both kinds of information. The fields we need include foreign keys to subject, student, and semester, and the final grade.

In actual practice additional information is required (for instance: payments; course requirements by career, to compare whether a student has fulfilled the requirements for the degree; and other information), I will leave it at these brief examples.

Denormalization

When following the rules outlined above, your tables will usually follow most formal definitions of "normalization". However, I suggest to use the rules above as general guidelines. Full normalization is not always the most efficient way to do things. For instance, you might consider the creation of a separate address table, as outlined above, as too much trouble, and prefer to leave two fields for emails (email1, email2). While this is not completely normalized, you may consider that you don't need a third email anyway, that you don't need to access all emails systematically (for automatic processing, you will only use the default email, email1, anyway), and that maintaining an additional table requires additional overhead.

Summary

Whether you use a more formal definition, which you can find in other texts, or the rules outlined above, proper normalization of your database structures can often help you save space, avoid certain anomalies, and add flexibility to your database system.

Hilmar Zonneveld, Independent Consultant
Hilmar Zonneveld works in programming since 1986, using dBASE, FoxPro and Visual FoxPro. He is available as an independent consultant. He currently works as a programmer at Bata Shoe Organization; also as an instructor at Cisco Networking Academy. You can contact him through the Universal Thread, or, via e-mail, at hilmarz@yahoo.com. Personal Web page (mainly in Spanish): www.geocities.com/hilmarz.

More articles from this author

A Basic Audit-Trail

Hilmar Zonneveld, May 1, 2003

An audit-trail is a record of who did what changes, and when. In Visual FoxPro, this can easily be accomplished through triggers. I hinted at the possibility of doing an audit-trail, in my article on triggers - now, as a reaction to questions in the Universal Thread, I want to present a sample...

Breaking Inheritance

Hilmar Zonneveld, December 6, 2001

(The latest update contains minor edits only.) Five easy and fun ways to get yourself into trouble with inheritance. A frequent source of problems in OOP is called "breaking inheritance". This document briefly describes what inheritance is, how it applies to properties and methods, and how it ...

Buffering

Hilmar Zonneveld, July 1, 2002

Introduction Buffering is a feature in Visual FoxPro that allows us to give the user "undo" and "save" capabilities. In the old FoxPro 2.x days, programmers either didn't provide this capability, or edited memory variables, and copied information between these variables and the table fiel...

CHM help file shows error messages instead of help pages

Hilmar Zonneveld, October 6, 2005

Due to a recent Windows security fix, users can no longer access a CHM file on a server. The table of contents appears, but the individual pages are replaced by error messages. Access to CHM files in specific folders can be explicitly allowed through special registry settings.

Control Arrays

Hilmar Zonneveld, July 20, 2001

(The last update contains minor edits only.) The idea is to have several controls on a form controlled with an array. Thus, you can quickly go through all the controls on the form, managing the array. The sample code included will help you get started quickly. You can easily adapt it to manage...

Excel Automation

Hilmar Zonneveld, September 1, 2002

With Automation, you can control all aspects of Excel, Word, or other programs that provide this feature, from Visual FoxPro. In this article, I will concentrate on Excel. Its purpose is to provide a starting point, especially for people new to automation. Introduction With automation, you bas...

Hierarchical To-Do List

Hilmar Zonneveld, March 1, 2003

Introduction One common task in programming is to keep track of what problems are pending. For this purpose, I use a "hierarchical to-do list": a list of items, each of which can have sub-items. All you need is Microsoft Word. Alternatives are available as freeware or shareware, but in t...

Inheritance tutorial

Hilmar Zonneveld, October 7, 2005

This is a step-by-step tutorial to show inheritance, specifically in Visual FoxPro forms, as a guidance for people who are not familiar with inheritance in general, or who don’t know how to implement it in Visual FoxPro. The basic idea of inheritance is that all your forms, or several of your for...

Least common multiple and greatest common factor

Hilmar Zonneveld, May 30, 2004

The code shows how to quickly obtain the greatest common factor, and the least common multiple. Both functions are used when manipulating fractions, among others. Several methods are possible; the method usually taught in school involves prime numbers, but this code will execute much faster (and it ...

Multi-level cost calculation

Hilmar Zonneveld, November 1, 2006

A standard requirement in a production system, or in systems for cost calculation, is to add up all the raw materials for a number of finished articles, to get the total cost, or simply to purchase the materials. In this article, Hilmar outlines how to do this with multiple levels of intermediate ar...

Open any document

Hilmar Zonneveld, November 8, 2001

The following function will open any document, with its default association (the same application that will be called when you double-click on the file, in Windows Explorer). Use it to open a text-file, a Word or Excel document, an image, etc., with an external application.

Primary keys

Hilmar Zonneveld, May 1, 2002

Introduction This document explains the meaning of primary key, foreign key and candidate index in Visual FoxPro. A discussion of natural and surrogate keys (keys visible, or not visible, to the end-user) is included, including the advantages of each approach, as well as different methods for o...

Recursion

Hilmar Zonneveld, January 1, 2003

Continuing my series of introductory articles, this article presents an introduction of a simple yet powerful programming concept: recursion. Introduction "To understand recursion, you must first understand recursion." "To make yogurt, you need milk and yogurt." If you are not accustomed...

Reducing your code

Hilmar Zonneveld, December 1, 2002

Introduction This article presents an introduction to coding shortcuts in Visual FoxPro - when to use them, and when not to. Notes on coding in general This article is about coding shortcuts; however, I should first emphasize that making the code as small as possible is usually not the number...

Rushmore Optimization - not always optimal

Hilmar Zonneveld, July 20, 2001

Rushmore Optimization can help make queries much faster. However, "Full Rushmore Optimization" is not always a desirable goal. "Partial Optimization" is sometimes much faster. It is often believed that to speed things up, you need to have as many indices as possible. This article explains that so...

Show seconds in a readable format

Hilmar Zonneveld, June 7, 2002

If you need to check elapsed time with seconds() or a datetime value, this function allows you to display the elapsed time in a human-readable format, that is, hours:minutes:seconds, instead of the total number of seconds. Just pass a number of seconds as a parameter.

SQL Commands

Hilmar Zonneveld, April 1, 2002

SQL is a standard language used to manipulate databases. Several of the SQL commands are integrated into the Visual FoxPro language. Select This is a very flexible command, used to select data from a table, or from several tables. This command has options to get totals from several record...

Text Manipulation

Hilmar Zonneveld, August 1, 2003

In this article, I will show several ways to manipulate text-files. Knowledge of these methods is often important to import and export specific formats. Some of the techniques can also be used to work with files of any content; however, this article will concentrate on text-files. When ...

Triggers and referencial integrity

Hilmar Zonneveld, June 1, 2002

The purpose of this article is to show how to use some aspects provided by the Visual FoxPro database engine, to control our data. Indices Perhaps most readers already know indices; anyway, I find it convenient to include a brief summary of the topic, since this is a requisite to understan...

Using the HTML Help Workshop

Hilmar Zonneveld, November 1, 2002

A help file can be used either for interactive help, or as an online manual. In this article, I will give an overview over creating help files in the new help format (CHM), for Visual FoxPro 6 and later. This article is introductory and assumes no prior knowledge of the Help Compiler, or of HTML cod...

Using the Visual FoxPro Debugger

Hilmar Zonneveld, February 1, 2003

Introduction Any real-world application will sooner or later misbehave. It is important to be able to find those problems. Visual FoxPro's built-in debugger can help a lot to find out why your program doesn't work as you thought it would. Most of the material in this article applies to Visual...

VisioModeler - a free CASE tool

Hilmar Zonneveld, May 1, 2006

This article is an introduction to VisioModeler. This is a free CASE tool, that can help you design your database, in the process sharing the information with the client in a visual, easy-to-understand, format.