Introduction
This document explains the meaning of primary key, foreign key and candidate index in Visual FoxPro. A discussion of natural and surrogate keys (keys visible, or not visible, to the end-user) is included, including the advantages of each approach, as well as different methods for obtaining unique values.
This information was written in reaction to frequent discussions, on these topics, in the Universal Thread (www.universalthread.com), and althoughit is oriented towards Visual FoxPro, most of the concepts can be applied to other programming languages or database servers as well.
Primary and Foreign Keys
In database theory, a primary key is the main way to identify a record. It is a field, or a combination of fields (actually, in Visual FoxPro, any expression), that must be unique for each record in the table. While Visual FoxPro doesn't actually require it, it is usually convenient for the developer to define a primary key for most tables.
As an example, each client can be assigned a number (for instance, in the United States, the Social Security Number (SSN) can be used).
The same information placed in other tables, as a link to the main table, is called a foreign key. For instance, in an invoice table, the same SSN can be used as a link to the client table, that is, to indicate for which client the invoice was extended.
Natural and Surrogate Keys
The primary key can be a field, or a combination of fields, with information in the table that is also used for purposes other than the primary key. This is known as a natural key, a key with business value, or an intelligent key.
Another alternative is to use, as a primary key, a number (or other value) generated automatically. The end-user doesn't get to see this key. This kind of key is known as a surrogate key, or a key without business value. Since many programmers use this kind of key, some database systems, like Access or SQL Server, have a built-in mechanism for creating "auto-incremented" values in a field (however, their use is optional - the developer has to decide whether he wants natural, or surrogate keys).
Visual FoxPro has no such built-in mechanism for "auto-incremented" fields, but it is not too difficult to achieve this. A sample function is provided in the appendix.
There are developers that favor the natural keys approach, and developers that favor the surrogate keys approach. It is also possible to use a mixed approach (natural keys for some tables, and surrogate keys for others).
Arguments in favor of Surrogate Keys
Arguments in favor of Natural Keys
The main difficulty I see with surrogate keys is in the complex programming, that requires lookups both ways, especially for data forms. Let's take the case of the client as an example. In a table that shows invoices, the user types the SSN, the form has to do a lookup in the client table to return the internal client ID, and store it in the invoices table. On the other hand, for looking at an existing invoice, the form has to take the internal client ID and do a lookup, in the client table, for the SSN. Creating form objects to edit data like this can be fairly complex (fortunately, through inheritance, if you can solve the problem once, you can easily expand it to have a generic solution).
When joining data for reports, sometimes additional tables have to be joined when using surrogate keys. Take the SSN as an example. If this is saved as foreign key in the invoice, it is immediately available. When using a surrogate key, this field has to be fetched from the client table.
The information in the table is more readable with natural keys (with surrogate keys, a table that has a foreign key has to be joined to the main table, in order to get readable information).
For the same reason, creating a quick report on a single table is often easier with natural keys.
Debugging is simplified, since the developer sees "natural" information in the tables.
With surrogate keys, at least when using the "auto-incremented integer" approach (see below), it sometimes happen that the table that stores the sequences gets out of synch, causing "trigger failed" errors due to duplicate values in the primary key.
The speed argument
At first sight, it would seem that Visual FoxPro would join tables (in a SELECT - SQL statement) faster with integer keys, than with, say, a 10-byte character field (typical for a natural keys, for instance, a product code). This is what I had been told, and I actually believed this for many years. Actual testing I have done recently, however, revealed that two-table joins with 10-byte character fields are actually slightly faster than the equivalent join with integer keys! I assume that the reason is related with built-in data compression in character-type indices. (For composite keys, the developer should use composite indices and composite join expressions, for fastest data access.) I leave it up to the reader to do additional testing for additional speed tests on longer character fields, on composite keys, on other field types (e.g., date, or date-time). Also, the situation can be different in the case of a Database Server, like SQL Server or Oracle (I assume that a smaller key will usually be faster, but this assumption would better be tested - see above!). Since the situation is not so clear-cut here (in favor of one approach, or the other), I decided not to put the speed argument in favor of either approach. Options for generating Unique Numbers
This section applies mainly to those developers who are considering using surrogate keys for some or all of their tables. However, the function SerialNumber() (below) can also be used to generate sequential numbers, visible to the end-user, for document numbers (for instance, invoice numbers).
The first option that comes to mind, to many people, is to simply use the record number (recno()) as primary key. After all, we can save a field this way! It turns out that this solution is completely unacceptable. The main problem is that, when we PACK a table, the position of each record changes. Then, foreign keys in other tables would become invalid.
We can use an integer field, automatically incremented for each record. This approach is fairly straightforward. Create a table with fields (sequence C(20), nextnum I). Create an index on sequence. Copy the function SerialNumber() (in the appendix) into your database stored procedures. Call the function from the default value of the primary key field, for instance, default value of clientid (in client table) = SerialNumber("client"). This allows for up to 2e9 different values. Some numbers may be skipped (if the user creates a record, the number is automatically assigned; if the user then clicks on "Undo", the changes are reverted, but field NextNum, in table SerialNumber, is already incremented)). This is not a problem with numbers which the user never sees - the only requirement being that the primary key be unique. This function can also be used to generate sequential numbers which the user does see, like invoice numbers. In this case, to avoid skipping numbers, the function should be called only when saving the record, and the record should be saved within a transaction. Thus, if the save fails, the changes can be undone. In you use the function this way, table SerialNumber must needs be part of the database (that is, not a free table), otherwise it can't participate in transactions.
Some developers prefer to use GUIDs for primary keys. As far as I could gather, GUID stands for "globally unique identifier". It is a unique value generated by the Windows API. It is extremely unlikely for the same value to be generated again - either on the same computer, or on another computer, anywhere on the world. Because of their larger size (128 bits, or 16 bytes), the size advantage of using surrogate keys doesn't apply when using GUIDs. The reason to use GUIDs is because when several users have to input data, and can't connect to the central network all the time (because of geographical distance), a way has to be found to ensure that primary keys are unique. To assign the value automatically to a field, the function GetGUID(), in the appendix, can be called from the default value of a field.
The same objective, of getting unique keys for geographically remote computers, can also be obtained assigning a unique number to each user. Let's say this unique user ID is stored in variable gnUserId. Then, either the primary key can be made up of two fields (in the client example: field client, and field UserId), or the information can be combined in a single field. This latter choice makes it easier to join tables, for referential integrity, or for a query or view. The following expression might be placed in the default value of field ClientId: bintoc(gnUserId, 2) + bintoc(SerialNumber("Client")). Note that for this example, the primary key - and the corresponding foreign keys - should be defined as C(6). The way I wrote the expression above allows for up to 32,000 different users (or offices). Function bintoc() requires VFP 5 or later.
Candidate indices and unique values
Theory: a primary key is the main form of identifying a record. Naturally, the field or fields that make up a primary key must be unique. However, we may also require other fields, or combinations of fields, to be unique. It is for this purpose that Visual FoxPro lets us create candidate indices. Uniqueness (in primary indices and candidate values) is enforced through the index. In other database systems, candidate indices would be called "unique indices". In the case of Visual FoxPro, the keyword "unique" was used previously (in FoxPro) for another purpose. Unique indices are kept for backwards compatibility, but apart from that, they no longer have any practical application in Visual FoxPro (to obtain all the different values for a field, use a SELECT - SQL statement instead, either with the DISTINCT keyword, or with GROUP BY).
In each table, there can be only one primary index, but there can be several candidate indices. Other than what I said already, Visual FoxPro doesn't make much difference between a primary and a candidate index.
A technical problem arises, for instance, when using candidate or primary indices for a SSN. After deleting a record, it is still in the table and participates in uniqueness validation. In other words, when erasing a record and then re-using the key value, Visual FoxPro will protest about the duplicate value (as far as I know, this problem applies only to Visual FoxPro - not to other database systems). One solution for this is to use a filter on an index, with the expression "NOT DELETED()". Remember that a filtered index can't participate in Rushmore Optimization. This may be good, or bad, depending on the circumstances: see my FAQ on Rushmore Optimization (in the Universal Thread) for details. If Rushmore Optimization is desired for a specific field, sometimes it may be necessary to define two indices: a filtered one, for the primary or candidate index, to ensure uniqueness; and a non-filtered one, of "regular" type, for Rushmore Optimization.
Appendix: Functions
Fetches the next number in a sequence (you may want to check for other, similar, functions, in the Universal Thread download section).
* Get serial number. Used mainly to generate primary keys. * The easiest way to achieve this is to call this function from a fields * default value. FUNCTION SERIALNUMBER(tnSequence) tnSequence = lower(tnSequence) local lnSelect lnSelect = select() if used("serialnumber") select serialnumber else select 0 use serialnumber endif set order to "sequence" seek padr(tnSequence, len(sequence)) if not found() append blank replace sequence with tnSequence, nextnum with 1 endif local lnReturnValue if lock() lnReturnValue = nextnum replace nextnum with nextnum + 1 else lnReturnValue = -1 endif unlock flush select (lnSelect) return lnReturnValue ENDFUNC
Returns a 16-byte (128-bit) GUID, through the Windows API. For efficiency, the DECLARE statement should be moved to the main program.
PROCEDURE GetGUID DECLARE INTEGER CoCreateGuid ; IN Ole32.dll ; STRING @lcGUIDStruc cStrucGUID=SPACE(16) cGUID=SPACE(80) nSize=40 IF CoCreateGuid(@cStrucGUID) # 0 RETURN "" ENDIF return cStrucGUID ENDPROC