Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Primary Keys Document
Message
De
04/08/2001 20:10:36
Hilmar Zonneveld
Independent Consultant
Cochabamba, Bolivie
 
 
À
Tous
Information générale
Forum:
Visual FoxPro
Catégorie:
Base de données, Tables, Vues, Index et syntaxe SQL
Titre:
Primary Keys Document
Divers
Thread ID:
00539860
Message ID:
00539860
Vues:
53
Following the thread on candidate keys, where surrogate vs. natural keys were discussed, I am planning to place, in FAQ, a document explaining some of the relevant points. I invite all members of UT to provide feedback. My main interest is a) whether I missed some important argument for, or against, surrogate keys, b) a sample function to obtain a GUID.

Regards, Hilmar.

Proposed text for FAQ follows; formatting will be reviewed later (I copied HTML code from FrontPage Express; UT adds additional paragraph marks):


On Primary Keys, Foreign Keys, and Candidate Indices



Summary



This document explains the meaning of primary key, foreign key
and candidate index in Visual FoxPro. A discussion of natural and
surrogate keys is included, including the advantages of each, as
well as methods for obtaining unique numbers.



 



The actual document:



This information was written in reaction to a discussion in
the Universal Thread (www.levelextreme.com).



The information is oriented towards Visual FoxPro, although
most of the concepts can be applied to other programming
languages or database servers as well.



I invite the participants of the Universal Thread to point out
key points missing in my arguments (especially key advantages and
disadvantages of surrogate keys), and to provide a sample
function to get a GUID in Visual FoxPro. The function will be
included with appropriate credits.



Primary and Foreign Keys



In database theory, a primary key is the main
way to identify a table. It is a field, or a combination of
fields, that must be unique for each table. It is usually
convenient for each table - or for most tables, at least - to
have a primary key. In some database management systems this is
an actual requirement.



As an example, each client can be assigned a number (for
instance, in the United States, the Social Security Number (SSN)
can be used).



The same information placed in other tables, as a pointer to
the main table, is called a foreign key. For
instance, in an invoice table, the same SSN can be used as a
pointer to the client table, that is, to indicate for which
client the invoice was extended.



Natural and Surrogate Keys



The primary key can be a field, or fields, with information in
the table that is also used for purposes other than the primary
key. This is called a natural key, or a key
with business value
.



Another alternative - preferred by many programmers, including
myself - is to use, as a primary key, a number generated
automatically. The end-user never gets to see this number. This
number would be called a surrogate key, or a key
without business value
. Some database systems have built-in
mechanisms to generate these numbers; for instance, in Microsoft
Access you only have to specify the field type as "auto-incremented".
Visual FoxPro has no built-in mechanism for this, but it is
relatively easy to achieve the same result. See below, for
details.



Arguments about this topic can easily be as heated as
arguments about your favorite word processor, or about your
favorite religion. Therefore, although personally I (usually)
favor the surrogate key approach, I will try to be as neutral as
possible, collect information in favor of both approaches, and
let the reader decide what method best suits his needs. (When I
started programming, I used natural keys; eventually, my former
boss, Andrés Frank, convinced me of the advantages of surrogate
keys.)



Arguments in favor of Surrogate Keys



  • Simplicity: With natural keys, it is often difficult to
    find a field, or a combination of fields, that is unique.
    For instance, using the SSN in the example above gives
    rise to some questions: what about foreign citizens that
    have no SSN? What about duplicate SSNs? Yes, these have
    been known to exist. With surrogate keys, it is easy to
    assign an "autoincremented" field to each table.
    This one, automatically generated and never seen by the
    end-user, will naturally be unique. With natural keys, it
    is sometimes difficult to find a combination of fields
    that is always unique - and more than one field has to be
    used in some cases. These fields have to be added as
    foreign keys in other tables.

  • When using surrogate keys, the end-user can change the
    key field he sees - like the SSN, or a product code - at
    any moment; there is no need to propagate the change
    through other tables, since the primary key doesn't
    change. There is usually no need to actually change the
    primary key, since the end-user never gets to see it.

  • Size: At first sight, it seems that you need an
    additional field. However, this is usually - but not
    always - more than offset by savings in the foreign keys
    in the other tables. Since the primary key uses only 4
    bytes (for integer fields, in Visual FoxPro), the foreign
    key in other tables requires only 4 bytes, too.

  • Speed: it is faster for Visual FoxPro to join tables (in
    a SQL - SELECT statement, for instance) with integer keys,
    than with longer fields - or with a combination of fields.


  • Arguments against Surrogate Keys



  • Not all of the advantages mentioned above apply in all
    cases. The programmer may find it convenient, for
    different reasons, to use longer keys instead of integers
    (see below); this invalidates some of the advantages
    mentioned. A lookup table may use, for a natural key, a
    character code with only one or two bytes - less than the
    four an integer field uses. And with Cascading Update
    Triggers, it is easy (though not always fast) to cascade
    changes made in the primary key, to the foreign keys of
    other tables.

  • The main problem I see with surrogate keys is the
    somewhat complex programming, that requires lookups both
    ways, especially for data forms. Let's take the case of
    the client as an example. In a table that shows invoices,
    the user types the SSN, the form has to do a lookup in
    the client table to return the internal client ID, and
    store it in the invoices table. On the other hand, for
    looking at an existing invoice, the form has to take the
    internal client ID and do a lookup, in the client table,
    for the SSN. Creating form objects to edit data like this
    can be fairly complex (fortunately, through inheritance,
    a generic solution is possible).

  • When joining data for reports, sometimes additional
    fields have to be retrieved. Example: the SSN. If this is
    saved as foreign key in the invoice, it is immediately
    available. When using a surrogate key, this field has to
    be fetched from the other table.

  • Even though we now have a unique primary key, we may
    still find it convenient to find a field, or a
    combination of fields, that uniquely identifies each
    record to the end user! See the section on Candidate
    Indices, below.


  • Options for generating Unique Numbers



  • The first option that comes to mind, to many people, is
    to simply use the record number (recno()) as primary key.
    After all, that way we don't need an additional field!
    This is not an appropriate solution. The main problem is
    that, when we PACK a table, the position of each record
    changes.

  • We can use an integer field, automatically incremented
    for each record. This approach is fairly straightforward.
    Create a table with fields sequence C(20), nextnum I.
    Copy the following function (SerialNumber) into your
    database stored procedures. Call the function from the
    default value of the primary key field, for instance,
    default value of clientid (in client table) =
    SerialNumber("client"). This allows for up to 2e9
    different values. Some numbers may be skipped (if the
    user creates a record, the number is automatically
    assigned; if the user then clicks on "Undo",
    the changes are reverted, but the number is already used).
    This is not a problem with numbers which the user never
    sees - our main worry is that the primary key be unique.
  • FUNCTION SERIALNUMBER(tnSequence)
    	* Get serial number. Used mainly to generate primary keys.
    	* The easiest way to achieve this is to call this function from a field's default value.
    	tnSequence = lower(tnSequence)
    	local lnSelect
    	lnSelect = select()
    	if used("serialnumber")
    		select serialnumber
    	else
    		select 0
    		use serialnumber
    	endif
    	set order to "sequence"
    	seek padr(tnSequence, len(sequence))
    	if not found()
    		append blank
    		replace sequence with tnSequence, nextnum with 1
    	endif
    	local lnReturnValue
    	if lock()
    		lnReturnValue = nextnum
    		replace nextnum with nextnum + 1
    	else
    		lnReturnValue = -1
    	endif
    	unlock
    	select (lnSelect)
    	return lnReturnValue
    ENDFUNC

    This function can also be used to generate sequential numbers
    which the user does see, like invoice numbers. In this case, to
    avoid skipping numbers, the function should be called only when
    saving the record, and the record should be saved within a
    transaction. Thus, if the save fails, the changes can be undone.
    In this case, table SerialNumber must be part of the database (that
    is, not a free table), otherwise it can't participate in
    transactions.



  • Some programmers prefer to use GUIDs for primary keys. As
    far as I could gather, GUID stands for "global
    unique identifier". It is a unique value generated
    by the Windows API. Actually, it is more correct to say
    it is assumed to be unique: because of the fact that it
    is created randomly (as far as I know), and because of
    its size (128 bits, or 16 bytes), it is highly unlikely
    that the same value is ever used again - either on the
    same machine, or on another one (there are 3.4e38
    different 128-bit values). GUIDs are fairly large,
    therefore some of the advantages of using surrogate keys
    (size, speed) don't apply when using them.

  • The reason to use GUIDs is because when several users
    have to input data, and can't connect to the central
    network all the time (because of geographical distance),
    a method must be used to ensure that primary keys are
    unique. This can also be done by assigning a unique
    number to each user; let's say this unique user ID is
    stored in variable gnUserId. Then, either the primary key
    can be made up of two fields (in the client example:
    field client, and field UserId), or the information can
    be combined in a single field. This latter choice makes
    it easier to join tables, for referential integrity, or
    for a query or view. The following expression might be
    placed in the default value of field ClientId: bintoc(gnUserId,
    2) + bintoc(SerialNumber("Client")). Note that
    for this example, the primary key - and the corresponding
    foreign keys - should be defined as C(6). This allows for
    up to 65,000 different users (or offices). Function
    bintoc() requires VFP 5 or later.


  • Candidate indices and unique values



    Theory: a primary key is the main form of identifying a record.
    Naturally, the field or fields that make up a primary key must be
    unique. However, we may also require other fields, or
    combinations of fields, to be unique. It is for this purpose that
    Visual FoxPro lets us create candidate indices.
    Uniqueness (in primary indices and candidate indices) is enforced
    through the index. In other languages, candidate indices would be
    called "unique indices". In Visual FoxPro, the keyword
    "unique" was used previously (in FoxPro) for another
    purpose. Unique indices are kept for backwards compatibility, but
    apart from that, they have no practical application in Visual
    FoxPro (to obtain all the different values for a field, use a
    SELECT - SQL statement instead, either with the DISTINCT keyword,
    or with GROUP BY).



    In each table, there can be only one primary index, but there
    can be several candidate indices. Other than what I said already,
    Visual FoxPro doesn't make much difference between a primary and
    a candidate index.



    A technical problem arises, for instance, when using candidate
    indices for a SSN. After deleting a record, it is still in the
    table and participates in uniqueness validation. In other words,
    when erasing a record and then re-using the key value, Visual
    FoxPro will protest about the duplicate value. One solution for
    this is to use a filter on an index. Remember that a filtered
    index can't participate in Rushmore Optimization. This may be
    good, or bad, depending on the circumstances: see my document on
    Rushmore Optimization for details. If Rushmore Optimization is
    desired for a specific field, sometimes it may be necessary to
    define two indices: a filtered one, for the primary or candidate
    index, to ensure uniqueness; and a non-filtered one, of "regular"
    type, for Rushmore Optimization.



     



    Copyright (C) 2001, Hilmar Zonneveld. This text may be copied
    freely.

    Difference in opinions hath cost many millions of lives: for instance, whether flesh be bread, or bread be flesh; whether whistling be a vice or a virtue; whether it be better to kiss a post, or throw it into the fire... (from Gulliver's Travels)
    Suivant
    Répondre
    Fil
    Voir

    Click here to load this message in the networking platform