><h2>On Primary Keys, Foreign Keys, and Candidate Indices</h2> > ><p><strong>Summary</strong></p> > ><p>This document explains the meaning of primary key, foreign key >and candidate index in Visual FoxPro. A discussion of natural and >surrogate keys is included, including the advantages of each, as >well as methods for obtaining unique numbers.</p> > ><p> </p> > ><p><strong>The actual document:</strong></p> > ><p>This information was written in reaction to a discussion in >the Universal Thread (<a href="http://www.levelextreme.com">www.levelextreme.com</a>).</p> > ><p>The information is oriented towards Visual FoxPro, although >most of the concepts can be applied to other programming >languages or database servers as well.</p> > ><p>I invite the participants of the Universal Thread to point out >key points missing in my arguments (especially key advantages and >disadvantages of surrogate keys), and to provide a sample >function to get a GUID in Visual FoxPro. The function will be >included with appropriate credits.</p> > ><h2>Primary and Foreign Keys</h2> > ><p>In database theory, a <strong>primary key</strong> is the main >way to identify a table. It is a field, or a combination of >fields, that must be unique for each table. It is usually >convenient for each table - or for most tables, at least - to >have a primary key. In some database management systems this is >an actual requirement.</p> > ><p>As an example, each client can be assigned a number (for >instance, in the United States, the Social Security Number (SSN) >can be used).</p> > ><p>The same information placed in other tables, as a pointer to >the main table, is called a <strong>foreign key</strong>. For >instance, in an invoice table, the same SSN can be used as a >pointer to the client table, that is, to indicate for which >client the invoice was extended.</p> > ><h2>Natural and Surrogate Keys</h2> > ><p>The primary key can be a field, or fields, with information in >the table that is also used for purposes other than the primary >key. This is called a <strong>natural key</strong>, or a <strong>key >with business value</strong>.</p> > ><p>Another alternative - preferred by many programmers, including >myself - is to use, as a primary key, a number generated >automatically. The end-user never gets to see this number. This >number would be called a <strong>surrogate key</strong>, or a <strong>key >without business value</strong>. Some database systems have built-in >mechanisms to generate these numbers; for instance, in Microsoft >Access you only have to specify the field type as "auto-incremented". >Visual FoxPro has no built-in mechanism for this, but it is >relatively easy to achieve the same result. See below, for >details.</p> > ><p>Arguments about this topic can easily be as heated as >arguments about your favorite word processor, or about your >favorite religion. Therefore, although personally I (usually) >favor the surrogate key approach, I will try to be as neutral as >possible, collect information in favor of both approaches, and >let the reader decide what method best suits his needs. (When I >started programming, I used natural keys; eventually, my former >boss, Andrés Frank, convinced me of the advantages of surrogate >keys.)</p> > ><h2>Arguments in favor of Surrogate Keys</h2> > > <li>Simplicity: With natural keys, it is often difficult to > find a field, or a combination of fields, that is unique. > For instance, using the SSN in the example above gives > rise to some questions: what about foreign citizens that > have no SSN? What about duplicate SSNs? Yes, these have > been known to exist. With surrogate keys, it is easy to > assign an "autoincremented" field to each table. > This one, automatically generated and never seen by the > end-user, will naturally be unique. With natural keys, it > is sometimes difficult to find a combination of fields > that is always unique - and more than one field has to be > used in some cases. These fields have to be added as > foreign keys in other tables.</li> > <li>When using surrogate keys, the end-user can change the > key field he sees - like the SSN, or a product code - at > any moment; there is no need to propagate the change > through other tables, since the primary key doesn't > change. There is usually no need to actually change the > primary key, since the end-user never gets to see it.</li> > <li>Size: At first sight, it seems that you need an > additional field. However, this is usually - but not > always - more than offset by savings in the foreign keys > in the other tables. Since the primary key uses only 4 > bytes (for integer fields, in Visual FoxPro), the foreign > key in other tables requires only 4 bytes, too.</li> > <li>Speed: it is faster for Visual FoxPro to join tables (in > a SQL - SELECT statement, for instance) with integer keys, > than with longer fields - or with a combination of fields.</li> > ><h2>Arguments against Surrogate Keys</h2> > > <li>Not all of the advantages mentioned above apply in all > cases. The programmer may find it convenient, for > different reasons, to use longer keys instead of integers > (see below); this invalidates some of the advantages > mentioned. A lookup table may use, for a natural key, a > character code with only one or two bytes - less than the > four an integer field uses. And with Cascading Update > Triggers, it is easy (though not always fast) to cascade > changes made in the primary key, to the foreign keys of > other tables.</li> > <li>The main problem I see with surrogate keys is the > somewhat complex programming, that requires lookups both > ways, especially for data forms. Let's take the case of > the client as an example. In a table that shows invoices, > the user types the SSN, the form has to do a lookup in > the client table to return the internal client ID, and > store it in the invoices table. On the other hand, for > looking at an existing invoice, the form has to take the > internal client ID and do a lookup, in the client table, > for the SSN. Creating form objects to edit data like this > can be fairly complex (fortunately, through inheritance, > a generic solution is possible).</li> > <li>When joining data for reports, sometimes additional > fields have to be retrieved. Example: the SSN. If this is > saved as foreign key in the invoice, it is immediately > available. When using a surrogate key, this field has to > be fetched from the other table.</li> > <li>Even though we now have a unique primary key, we may > still find it convenient to find a field, or a > combination of fields, that uniquely identifies each > record to the end user! See the section on Candidate > Indices, below.</li> > ><h2>Options for generating Unique Numbers</h2> > > <li>The first option that comes to mind, to many people, is > to simply use the record number (recno()) as primary key. > After all, that way we don't need an additional field! > This is not an appropriate solution. The main problem is > that, when we PACK a table, the position of each record > changes.</li> > <li>We can use an integer field, automatically incremented > for each record. This approach is fairly straightforward. > Create a table with fields sequence C(20), nextnum I. > Copy the following function (SerialNumber) into your > database stored procedures. Call the function from the > default value of the primary key field, for instance, > default value of clientid (in client table) = > SerialNumber("client"). This allows for up to 2e9 > different values. Some numbers may be skipped (if the > user creates a record, the number is automatically > assigned; if the user then clicks on "Undo", > the changes are reverted, but the number is already used). > This is not a problem with numbers which the user never > sees - our main worry is that the primary key be unique.</li> > ><pre>FUNCTION SERIALNUMBER(tnSequence) > * Get serial number. Used mainly to generate primary keys. > * The easiest way to achieve this is to call this function from a field's default value. > tnSequence = lower(tnSequence) > local lnSelect > lnSelect = select() > if used("serialnumber") > select serialnumber > else > select 0 > use serialnumber > endif > set order to "sequence" > seek padr(tnSequence, len(sequence)) > if not found() > append blank > replace sequence with tnSequence, nextnum with 1 > endif > local lnReturnValue > if lock() > lnReturnValue = nextnum > replace nextnum with nextnum + 1 > else > lnReturnValue = -1 > endif > unlock > select (lnSelect) > return lnReturnValue >ENDFUNC >>
This function can also be used to generate sequential numbers
>which the user does see, like invoice numbers. In this case, to
>avoid skipping numbers, the function should be called only when
>saving the record, and the record should be saved within a
>transaction. Thus, if the save fails, the changes can be undone.
>In this case, table SerialNumber must be part of the database (that
>is, not a free table), otherwise it can't participate in
>transactions.
Theory: a primary key is the main form of identifying a record.
>Naturally, the field or fields that make up a primary key must be
>unique. However, we may also require other fields, or
>combinations of fields, to be unique. It is for this purpose that
>Visual FoxPro lets us create candidate indices.
>Uniqueness (in primary indices and candidate indices) is enforced
>through the index. In other languages, candidate indices would be
>called "unique indices". In Visual FoxPro, the keyword
>"unique" was used previously (in FoxPro) for another
>purpose. Unique indices are kept for backwards compatibility, but
>apart from that, they have no practical application in Visual
>FoxPro (to obtain all the different values for a field, use a
>SELECT - SQL statement instead, either with the DISTINCT keyword,
>or with GROUP BY).
In each table, there can be only one primary index, but there
>can be several candidate indices. Other than what I said already,
>Visual FoxPro doesn't make much difference between a primary and
>a candidate index.
A technical problem arises, for instance, when using candidate
>indices for a SSN. After deleting a record, it is still in the
>table and participates in uniqueness validation. In other words,
>when erasing a record and then re-using the key value, Visual
>FoxPro will protest about the duplicate value. One solution for
>this is to use a filter on an index. Remember that a filtered
>index can't participate in Rushmore Optimization. This may be
>good, or bad, depending on the circumstances: see my document on
>Rushmore Optimization for details. If Rushmore Optimization is
>desired for a specific field, sometimes it may be necessary to
>define two indices: a filtered one, for the primary or candidate
>index, to ensure uniqueness; and a non-filtered one, of "regular"
>type, for Rushmore Optimization.
Copyright (C) 2001, Hilmar Zonneveld. This text may be copied
>freely.