Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Articles
Recherche: 

Primary keys
Hilmar Zonneveld, May 1, 2002
Introduction This document explains the meaning of primary key, foreign key and candidate index in Visual FoxPro. A discussion of natural and surrogate keys (keys visible, or not visible, to the end-user) is included, including the advantages of each approach, as well as different methods for o...

Introduction

This document explains the meaning of primary key, foreign key and candidate index in Visual FoxPro. A discussion of natural and surrogate keys (keys visible, or not visible, to the end-user) is included, including the advantages of each approach, as well as different methods for obtaining unique values.

This information was written in reaction to frequent discussions, on these topics, in the Universal Thread (www.universalthread.com), and althoughit is oriented towards Visual FoxPro, most of the concepts can be applied to other programming languages or database servers as well.

Primary and Foreign Keys

In database theory, a primary key is the main way to identify a record. It is a field, or a combination of fields (actually, in Visual FoxPro, any expression), that must be unique for each record in the table. While Visual FoxPro doesn't actually require it, it is usually convenient for the developer to define a primary key for most tables.

As an example, each client can be assigned a number (for instance, in the United States, the Social Security Number (SSN) can be used).

The same information placed in other tables, as a link to the main table, is called a foreign key. For instance, in an invoice table, the same SSN can be used as a link to the client table, that is, to indicate for which client the invoice was extended.

Natural and Surrogate Keys

The primary key can be a field, or a combination of fields, with information in the table that is also used for purposes other than the primary key. This is known as a natural key, a key with business value, or an intelligent key.

Another alternative is to use, as a primary key, a number (or other value) generated automatically. The end-user doesn't get to see this key. This kind of key is known as a surrogate key, or a key without business value. Since many programmers use this kind of key, some database systems, like Access or SQL Server, have a built-in mechanism for creating "auto-incremented" values in a field (however, their use is optional - the developer has to decide whether he wants natural, or surrogate keys).

Visual FoxPro has no such built-in mechanism for "auto-incremented" fields, but it is not too difficult to achieve this. A sample function is provided in the appendix.

There are developers that favor the natural keys approach, and developers that favor the surrogate keys approach. It is also possible to use a mixed approach (natural keys for some tables, and surrogate keys for others).

Arguments in favor of Surrogate Keys

  • With natural keys, it is sometimes difficult to find a field, or a combination of fields, that is unique. For instance, using the SSN in the example above gives rise to some questions: what about foreign citizens that have no SSN? What about duplicate SSNs? With surrogate keys, it is easy to assign an "autoincremented" field to each table. This one, automatically generated and never seen by the end-user, will naturally be unique. With natural keys, sometimes more than one field has to be used for the primary key. These fields have to be added as foreign keys in other tables; with surrogate keys, a single field is used in the other tables. (However, depending on business logic, the developer may still want to ensure uniqueness in the key that the user sees.)

  • When using surrogate keys, the end-user can change the key field he sees - like the SSN, or a product code - at any moment; there is no need to propagate the change through other tables, since the primary key doesn't change. There is usually no need to actually change the primary key, since the end-user never gets to see it. In Visual FoxPro (and in most database systems), natural keys can be propagated with Cascading Update Triggers; however, this can be slow if a value appears in many records.
  • The fact that the value of the primary key will usually not change has some advantages when comparing old data (for instance, from a backup) to recent data. The information can be compared on a record-by-record basis, based on the primary key value. With natural keys, for instance changes in product codes willbe propagated to every table that uses the product. This makes it difficult to compare, for instance, product # "ABC-123" (old code) with product # "00123-ABC" (new code). (Workarounds are possible, of course. For instance, you can assign a unique number for each product, in an additional field (using function SerialNumber(), in the appendix), but not use it as a primary key. Or you may maintain a separate table of product code equivalents.)

  • Size: At first sight, it might seem that you need an additional field for the surrogate key. This is true, of course, but this is often more than offset by savings in the foreign keys in the other tables. Since the primary key uses only 4 bytes (for integer fields, in Visual FoxPro), the foreign key in other tables requires only 4 bytes, too. For instance, using a 4-byte integer instead of a 10-byte character code, for the primary key in the product table, requires 4 additional bytes for each product in the products table. However, in a table of product reception details, 6 bytes are saved in each record - and this table will probably have much more records than the products table, since each product may appear severl times in this second table. (The size argument doesn't apply in all cases: (a) the developer may decide to use larger keys, for example, GUIDs - see below; (b) the primary key of some tables may not actually be used in other tables as a foreign key; (c) some natural primary keys may actually use 4 bytes or less, for instance, a 1 or 2-byte character field for a lookup table.)

    Arguments in favor of Natural Keys

    • The main difficulty I see with surrogate keys is in the complex programming, that requires lookups both ways, especially for data forms. Let's take the case of the client as an example. In a table that shows invoices, the user types the SSN, the form has to do a lookup in the client table to return the internal client ID, and store it in the invoices table. On the other hand, for looking at an existing invoice, the form has to take the internal client ID and do a lookup, in the client table, for the SSN. Creating form objects to edit data like this can be fairly complex (fortunately, through inheritance, if you can solve the problem once, you can easily expand it to have a generic solution).

    • When joining data for reports, sometimes additional tables have to be joined when using surrogate keys. Take the SSN as an example. If this is saved as foreign key in the invoice, it is immediately available. When using a surrogate key, this field has to be fetched from the client table.

    • The information in the table is more readable with natural keys (with surrogate keys, a table that has a foreign key has to be joined to the main table, in order to get readable information).

    • For the same reason, creating a quick report on a single table is often easier with natural keys.

    • Debugging is simplified, since the developer sees "natural" information in the tables.

    • With surrogate keys, at least when using the "auto-incremented integer" approach (see below), it sometimes happen that the table that stores the sequences gets out of synch, causing "trigger failed" errors due to duplicate values in the primary key.

    The speed argument

    At first sight, it would seem that Visual FoxPro would join tables (in a SELECT - SQL statement) faster with integer keys, than with, say, a 10-byte character field (typical for a natural keys, for instance, a product code). This is what I had been told, and I actually believed this for many years. Actual testing I have done recently, however, revealed that two-table joins with 10-byte character fields are actually slightly faster than the equivalent join with integer keys! I assume that the reason is related with built-in data compression in character-type indices. (For composite keys, the developer should use composite indices and composite join expressions, for fastest data access.) I leave it up to the reader to do additional testing for additional speed tests on longer character fields, on composite keys, on other field types (e.g., date, or date-time). Also, the situation can be different in the case of a Database Server, like SQL Server or Oracle (I assume that a smaller key will usually be faster, but this assumption would better be tested - see above!). Since the situation is not so clear-cut here (in favor of one approach, or the other), I decided not to put the speed argument in favor of either approach. Options for generating Unique Numbers

    This section applies mainly to those developers who are considering using surrogate keys for some or all of their tables. However, the function SerialNumber() (below) can also be used to generate sequential numbers, visible to the end-user, for document numbers (for instance, invoice numbers).

    • The first option that comes to mind, to many people, is to simply use the record number (recno()) as primary key. After all, we can save a field this way! It turns out that this solution is completely unacceptable. The main problem is that, when we PACK a table, the position of each record changes. Then, foreign keys in other tables would become invalid.

    • We can use an integer field, automatically incremented for each record. This approach is fairly straightforward. Create a table with fields (sequence C(20), nextnum I). Create an index on sequence. Copy the function SerialNumber() (in the appendix) into your database stored procedures. Call the function from the default value of the primary key field, for instance, default value of clientid (in client table) = SerialNumber("client"). This allows for up to 2e9 different values. Some numbers may be skipped (if the user creates a record, the number is automatically assigned; if the user then clicks on "Undo", the changes are reverted, but field NextNum, in table SerialNumber, is already incremented)). This is not a problem with numbers which the user never sees - the only requirement being that the primary key be unique. This function can also be used to generate sequential numbers which the user does see, like invoice numbers. In this case, to avoid skipping numbers, the function should be called only when saving the record, and the record should be saved within a transaction. Thus, if the save fails, the changes can be undone. In you use the function this way, table SerialNumber must needs be part of the database (that is, not a free table), otherwise it can't participate in transactions.

    • Some developers prefer to use GUIDs for primary keys. As far as I could gather, GUID stands for "globally unique identifier". It is a unique value generated by the Windows API. It is extremely unlikely for the same value to be generated again - either on the same computer, or on another computer, anywhere on the world. Because of their larger size (128 bits, or 16 bytes), the size advantage of using surrogate keys doesn't apply when using GUIDs. The reason to use GUIDs is because when several users have to input data, and can't connect to the central network all the time (because of geographical distance), a way has to be found to ensure that primary keys are unique. To assign the value automatically to a field, the function GetGUID(), in the appendix, can be called from the default value of a field.

    • The same objective, of getting unique keys for geographically remote computers, can also be obtained assigning a unique number to each user. Let's say this unique user ID is stored in variable gnUserId. Then, either the primary key can be made up of two fields (in the client example: field client, and field UserId), or the information can be combined in a single field. This latter choice makes it easier to join tables, for referential integrity, or for a query or view. The following expression might be placed in the default value of field ClientId: bintoc(gnUserId, 2) + bintoc(SerialNumber("Client")). Note that for this example, the primary key - and the corresponding foreign keys - should be defined as C(6). The way I wrote the expression above allows for up to 32,000 different users (or offices). Function bintoc() requires VFP 5 or later.

    Candidate indices and unique values

    Theory: a primary key is the main form of identifying a record. Naturally, the field or fields that make up a primary key must be unique. However, we may also require other fields, or combinations of fields, to be unique. It is for this purpose that Visual FoxPro lets us create candidate indices. Uniqueness (in primary indices and candidate values) is enforced through the index. In other database systems, candidate indices would be called "unique indices". In the case of Visual FoxPro, the keyword "unique" was used previously (in FoxPro) for another purpose. Unique indices are kept for backwards compatibility, but apart from that, they no longer have any practical application in Visual FoxPro (to obtain all the different values for a field, use a SELECT - SQL statement instead, either with the DISTINCT keyword, or with GROUP BY).

    In each table, there can be only one primary index, but there can be several candidate indices. Other than what I said already, Visual FoxPro doesn't make much difference between a primary and a candidate index.

    A technical problem arises, for instance, when using candidate or primary indices for a SSN. After deleting a record, it is still in the table and participates in uniqueness validation. In other words, when erasing a record and then re-using the key value, Visual FoxPro will protest about the duplicate value (as far as I know, this problem applies only to Visual FoxPro - not to other database systems). One solution for this is to use a filter on an index, with the expression "NOT DELETED()". Remember that a filtered index can't participate in Rushmore Optimization. This may be good, or bad, depending on the circumstances: see my FAQ on Rushmore Optimization (in the Universal Thread) for details. If Rushmore Optimization is desired for a specific field, sometimes it may be necessary to define two indices: a filtered one, for the primary or candidate index, to ensure uniqueness; and a non-filtered one, of "regular" type, for Rushmore Optimization.

    Appendix: Functions

    FunctionCode
    SerialNumber()

    Fetches the next number in a sequence (you may want to check for other, similar, functions, in the Universal Thread download section).

    * Get serial number. Used mainly to generate primary keys.
    * The easiest way to achieve this is to call this function from a fields
    * default value.
    FUNCTION SERIALNUMBER(tnSequence)
    tnSequence = lower(tnSequence)
    local lnSelect
    lnSelect = select()
    if used("serialnumber")
       select serialnumber
       else
       select 0
       use serialnumber
    endif
    set order to "sequence"
    seek padr(tnSequence, len(sequence))
    if not found()
       append blank
       replace sequence with tnSequence, nextnum with 1
    endif
    local lnReturnValue
    if lock()
       lnReturnValue = nextnum
       replace nextnum with nextnum + 1
       else
       lnReturnValue = -1
    endif
    unlock
    flush
    select (lnSelect)
    return lnReturnValue
    ENDFUNC
    
    GetGUID()

    Returns a 16-byte (128-bit) GUID, through the Windows API. For efficiency, the DECLARE statement should be moved to the main program.

    PROCEDURE GetGUID
    DECLARE INTEGER CoCreateGuid ;
      IN Ole32.dll ;
      STRING @lcGUIDStruc
    
    cStrucGUID=SPACE(16)
    cGUID=SPACE(80)
    nSize=40
    IF CoCreateGuid(@cStrucGUID) # 0
      RETURN ""
    ENDIF
    return cStrucGUID
    ENDPROC
    
    Hilmar Zonneveld, Independent Consultant
    Hilmar Zonneveld works in programming since 1986, using dBASE, FoxPro and Visual FoxPro. He is available as an independent consultant. He currently works as a programmer at Bata Shoe Organization; also as an instructor at Cisco Networking Academy. You can contact him through the Universal Thread, or, via e-mail, at hilmarz@yahoo.com. Personal Web page (mainly in Spanish): www.geocities.com/hilmarz.
    More articles from this author
    Hilmar Zonneveld, May 1, 2003
    An audit-trail is a record of who did what changes, and when. In Visual FoxPro, this can easily be accomplished through triggers. I hinted at the possibility of doing an audit-trail, in my article on triggers - now, as a reaction to questions in the Universal Thread, I want to present a sample...
    Hilmar Zonneveld, December 6, 2001
    (The latest update contains minor edits only.) Five easy and fun ways to get yourself into trouble with inheritance. A frequent source of problems in OOP is called "breaking inheritance". This document briefly describes what inheritance is, how it applies to properties and methods, and how it ...
    Hilmar Zonneveld, July 1, 2002
    Introduction Buffering is a feature in Visual FoxPro that allows us to give the user "undo" and "save" capabilities. In the old FoxPro 2.x days, programmers either didn't provide this capability, or edited memory variables, and copied information between these variables and the table fiel...
    Hilmar Zonneveld, October 6, 2005
    Due to a recent Windows security fix, users can no longer access a CHM file on a server. The table of contents appears, but the individual pages are replaced by error messages. Access to CHM files in specific folders can be explicitly allowed through special registry settings.
    Hilmar Zonneveld, July 20, 2001
    (The last update contains minor edits only.) The idea is to have several controls on a form controlled with an array. Thus, you can quickly go through all the controls on the form, managing the array. The sample code included will help you get started quickly. You can easily adapt it to manage...
    Hilmar Zonneveld, September 1, 2002
    With Automation, you can control all aspects of Excel, Word, or other programs that provide this feature, from Visual FoxPro. In this article, I will concentrate on Excel. Its purpose is to provide a starting point, especially for people new to automation. Introduction With automation, you bas...
    Hilmar Zonneveld, March 1, 2003
    Introduction One common task in programming is to keep track of what problems are pending. For this purpose, I use a "hierarchical to-do list": a list of items, each of which can have sub-items. All you need is Microsoft Word. Alternatives are available as freeware or shareware, but in t...
    Hilmar Zonneveld, October 7, 2005
    This is a step-by-step tutorial to show inheritance, specifically in Visual FoxPro forms, as a guidance for people who are not familiar with inheritance in general, or who don’t know how to implement it in Visual FoxPro. The basic idea of inheritance is that all your forms, or several of your for...
    Hilmar Zonneveld, May 30, 2004
    The code shows how to quickly obtain the greatest common factor, and the least common multiple. Both functions are used when manipulating fractions, among others. Several methods are possible; the method usually taught in school involves prime numbers, but this code will execute much faster (and it ...
    Hilmar Zonneveld, November 1, 2006
    A standard requirement in a production system, or in systems for cost calculation, is to add up all the raw materials for a number of finished articles, to get the total cost, or simply to purchase the materials. In this article, Hilmar outlines how to do this with multiple levels of intermediate ar...
    Hilmar Zonneveld, August 1, 2002
    Overview The purpose of this article is to give an overview of normalization. Basically, normalization refers to having an efficient table structure. I will not discuss the famous "first to fifth normal forms" - if you want that information, enough texts exist about it in other places (search sit...
    Hilmar Zonneveld, November 8, 2001
    The following function will open any document, with its default association (the same application that will be called when you double-click on the file, in Windows Explorer). Use it to open a text-file, a Word or Excel document, an image, etc., with an external application.
    Hilmar Zonneveld, January 1, 2003
    Continuing my series of introductory articles, this article presents an introduction of a simple yet powerful programming concept: recursion. Introduction "To understand recursion, you must first understand recursion." "To make yogurt, you need milk and yogurt." If you are not accustomed...
    Hilmar Zonneveld, December 1, 2002
    Introduction This article presents an introduction to coding shortcuts in Visual FoxPro - when to use them, and when not to. Notes on coding in general This article is about coding shortcuts; however, I should first emphasize that making the code as small as possible is usually not the number...
    Hilmar Zonneveld, July 20, 2001
    Rushmore Optimization can help make queries much faster. However, "Full Rushmore Optimization" is not always a desirable goal. "Partial Optimization" is sometimes much faster. It is often believed that to speed things up, you need to have as many indices as possible. This article explains that so...
    Hilmar Zonneveld, June 7, 2002
    If you need to check elapsed time with seconds() or a datetime value, this function allows you to display the elapsed time in a human-readable format, that is, hours:minutes:seconds, instead of the total number of seconds. Just pass a number of seconds as a parameter.
    Hilmar Zonneveld, April 1, 2002
    SQL is a standard language used to manipulate databases. Several of the SQL commands are integrated into the Visual FoxPro language. Select This is a very flexible command, used to select data from a table, or from several tables. This command has options to get totals from several record...
    Hilmar Zonneveld, August 1, 2003
    In this article, I will show several ways to manipulate text-files. Knowledge of these methods is often important to import and export specific formats. Some of the techniques can also be used to work with files of any content; however, this article will concentrate on text-files. When ...
    Hilmar Zonneveld, June 1, 2002
    The purpose of this article is to show how to use some aspects provided by the Visual FoxPro database engine, to control our data. Indices Perhaps most readers already know indices; anyway, I find it convenient to include a brief summary of the topic, since this is a requisite to understan...
    Hilmar Zonneveld, November 1, 2002
    A help file can be used either for interactive help, or as an online manual. In this article, I will give an overview over creating help files in the new help format (CHM), for Visual FoxPro 6 and later. This article is introductory and assumes no prior knowledge of the Help Compiler, or of HTML cod...
    Hilmar Zonneveld, February 1, 2003
    Introduction Any real-world application will sooner or later misbehave. It is important to be able to find those problems. Visual FoxPro's built-in debugger can help a lot to find out why your program doesn't work as you thought it would. Most of the material in this article applies to Visual...
    Hilmar Zonneveld, May 1, 2006
    This article is an introduction to VisioModeler. This is a free CASE tool, that can help you design your database, in the process sharing the information with the client in a visual, easy-to-understand, format.