Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Primary Keys Document
Message
 
 
À
04/08/2001 20:10:36
Hilmar Zonneveld
Independent Consultant
Cochabamba, Bolivie
Information générale
Forum:
Visual FoxPro
Catégorie:
Base de données, Tables, Vues, Index et syntaxe SQL
Divers
Thread ID:
00539860
Message ID:
00539943
Vues:
19
>Following the thread on candidate keys, where surrogate vs. natural keys were discussed, I am planning to place, in FAQ, a document explaining some of the relevant points. I invite all members of UT to provide feedback. My main interest is a) whether I missed some important argument for, or against, surrogate keys, b) a sample function to obtain a GUID.
>
>Regards, Hilmar.
>
>Proposed text for FAQ follows; formatting will be reviewed later (I copied HTML code from FrontPage Express; UT adds additional paragraph marks):
>
SSN is not a very good example of PK/FK, since there are cases of repeated SSN.
I haven't read this text throughly yet, but this is the first observation.
>
><h2>On Primary Keys, Foreign Keys, and Candidate Indices</h2>
>
><p><strong>Summary</strong></p>
>
><p>This document explains the meaning of primary key, foreign key
>and candidate index in Visual FoxPro. A discussion of natural and
>surrogate keys is included, including the advantages of each, as
>well as methods for obtaining unique numbers.</p>
>
><p> </p>
>
><p><strong>The actual document:</strong></p>
>
><p>This information was written in reaction to a discussion in
>the Universal Thread (<a href="http://www.levelextreme.com">www.levelextreme.com</a>).</p>
>
><p>The information is oriented towards Visual FoxPro, although
>most of the concepts can be applied to other programming
>languages or database servers as well.</p>
>
><p>I invite the participants of the Universal Thread to point out
>key points missing in my arguments (especially key advantages and
>disadvantages of surrogate keys), and to provide a sample
>function to get a GUID in Visual FoxPro. The function will be
>included with appropriate credits.</p>
>
><h2>Primary and Foreign Keys</h2>
>
><p>In database theory, a <strong>primary key</strong> is the main
>way to identify a table. It is a field, or a combination of
>fields, that must be unique for each table. It is usually
>convenient for each table - or for most tables, at least - to
>have a primary key. In some database management systems this is
>an actual requirement.</p>
>
><p>As an example, each client can be assigned a number (for
>instance, in the United States, the Social Security Number (SSN)
>can be used).</p>
>
><p>The same information placed in other tables, as a pointer to
>the main table, is called a <strong>foreign key</strong>. For
>instance, in an invoice table, the same SSN can be used as a
>pointer to the client table, that is, to indicate for which
>client the invoice was extended.</p>
>
><h2>Natural and Surrogate Keys</h2>
>
><p>The primary key can be a field, or fields, with information in
>the table that is also used for purposes other than the primary
>key. This is called a <strong>natural key</strong>, or a <strong>key
>with business value</strong>.</p>
>
><p>Another alternative - preferred by many programmers, including
>myself - is to use, as a primary key, a number generated
>automatically. The end-user never gets to see this number. This
>number would be called a <strong>surrogate key</strong>, or a <strong>key
>without business value</strong>. Some database systems have built-in
>mechanisms to generate these numbers; for instance, in Microsoft
>Access you only have to specify the field type as "auto-incremented".
>Visual FoxPro has no built-in mechanism for this, but it is
>relatively easy to achieve the same result. See below, for
>details.</p>
>
><p>Arguments about this topic can easily be as heated as
>arguments about your favorite word processor, or about your
>favorite religion. Therefore, although personally I (usually)
>favor the surrogate key approach, I will try to be as neutral as
>possible, collect information in favor of both approaches, and
>let the reader decide what method best suits his needs. (When I
>started programming, I used natural keys; eventually, my former
>boss, Andrés Frank, convinced me of the advantages of surrogate
>keys.)</p>
>
><h2>Arguments in favor of Surrogate Keys</h2>
>
>    <li>Simplicity: With natural keys, it is often difficult to
>        find a field, or a combination of fields, that is unique.
>        For instance, using the SSN in the example above gives
>        rise to some questions: what about foreign citizens that
>        have no SSN? What about duplicate SSNs? Yes, these have
>        been known to exist. With surrogate keys, it is easy to
>        assign an "autoincremented" field to each table.
>        This one, automatically generated and never seen by the
>        end-user, will naturally be unique. With natural keys, it
>        is sometimes difficult to find a combination of fields
>        that is always unique - and more than one field has to be
>        used in some cases. These fields have to be added as
>        foreign keys in other tables.</li>
>    <li>When using surrogate keys, the end-user can change the
>        key field he sees - like the SSN, or a product code - at
>        any moment; there is no need to propagate the change
>        through other tables, since the primary key doesn't
>        change. There is usually no need to actually change the
>        primary key, since the end-user never gets to see it.</li>
>    <li>Size: At first sight, it seems that you need an
>        additional field. However, this is usually - but not
>        always - more than offset by savings in the foreign keys
>        in the other tables. Since the primary key uses only 4
>        bytes (for integer fields, in Visual FoxPro), the foreign
>        key in other tables requires only 4 bytes, too.</li>
>    <li>Speed: it is faster for Visual FoxPro to join tables (in
>        a SQL - SELECT statement, for instance) with integer keys,
>        than with longer fields - or with a combination of fields.</li>
>
><h2>Arguments against Surrogate Keys</h2>
>
>    <li>Not all of the advantages mentioned above apply in all
>        cases. The programmer may find it convenient, for
>        different reasons, to use longer keys instead of integers
>        (see below); this invalidates some of the advantages
>        mentioned. A lookup table may use, for a natural key, a
>        character code with only one or two bytes - less than the
>        four an integer field uses. And with Cascading Update
>        Triggers, it is easy (though not always fast) to cascade
>        changes made in the primary key, to the foreign keys of
>        other tables.</li>
>    <li>The main problem I see with surrogate keys is the
>        somewhat complex programming, that requires lookups both
>        ways, especially for data forms. Let's take the case of
>        the client as an example. In a table that shows invoices,
>        the user types the SSN, the form has to do a lookup in
>        the client table to return the internal client ID, and
>        store it in the invoices table. On the other hand, for
>        looking at an existing invoice, the form has to take the
>        internal client ID and do a lookup, in the client table,
>        for the SSN. Creating form objects to edit data like this
>        can be fairly complex (fortunately, through inheritance,
>        a generic solution is possible).</li>
>    <li>When joining data for reports, sometimes additional
>        fields have to be retrieved. Example: the SSN. If this is
>        saved as foreign key in the invoice, it is immediately
>        available. When using a surrogate key, this field has to
>        be fetched from the other table.</li>
>    <li>Even though we now have a unique primary key, we may
>        still find it convenient to find a field, or a
>        combination of fields, that uniquely identifies each
>        record to the end user! See the section on Candidate
>        Indices, below.</li>
>
><h2>Options for generating Unique Numbers</h2>
>
>    <li>The first option that comes to mind, to many people, is
>        to simply use the record number (recno()) as primary key.
>        After all, that way we don't need an additional field!
>        This is not an appropriate solution. The main problem is
>        that, when we PACK a table, the position of each record
>        changes.</li>
>    <li>We can use an integer field, automatically incremented
>        for each record. This approach is fairly straightforward.
>        Create a table with fields sequence C(20), nextnum I.
>        Copy the following function (SerialNumber) into your
>        database stored procedures. Call the function from the
>        default value of the primary key field, for instance,
>        default value of clientid (in client table) =
>        SerialNumber("client"). This allows for up to 2e9
>        different values. Some numbers may be skipped (if the
>        user creates a record, the number is automatically
>        assigned; if the user then clicks on "Undo",
>        the changes are reverted, but the number is already used).
>        This is not a problem with numbers which the user never
>        sees - our main worry is that the primary key be unique.</li>
>
><pre>FUNCTION SERIALNUMBER(tnSequence)
>	* Get serial number. Used mainly to generate primary keys.
>	* The easiest way to achieve this is to call this function from a field's default value.
>	tnSequence = lower(tnSequence)
>	local lnSelect
>	lnSelect = select()
>	if used("serialnumber")
>		select serialnumber
>	else
>		select 0
>		use serialnumber
>	endif
>	set order to "sequence"
>	seek padr(tnSequence, len(sequence))
>	if not found()
>		append blank
>		replace sequence with tnSequence, nextnum with 1
>	endif
>	local lnReturnValue
>	if lock()
>		lnReturnValue = nextnum
>		replace nextnum with nextnum + 1
>	else
>		lnReturnValue = -1
>	endif
>	unlock
>	select (lnSelect)
>	return lnReturnValue
>ENDFUNC
>
>
>

This function can also be used to generate sequential numbers
>which the user does see, like invoice numbers. In this case, to
>avoid skipping numbers, the function should be called only when
>saving the record, and the record should be saved within a
>transaction. Thus, if the save fails, the changes can be undone.
>In this case, table SerialNumber must be part of the database (that
>is, not a free table), otherwise it can't participate in
>transactions.


>
>
  • Some programmers prefer to use GUIDs for primary keys. As
    > far as I could gather, GUID stands for "global
    > unique identifier". It is a unique value generated
    > by the Windows API. Actually, it is more correct to say
    > it is assumed to be unique: because of the fact that it
    > is created randomly (as far as I know), and because of
    > its size (128 bits, or 16 bytes), it is highly unlikely
    > that the same value is ever used again - either on the
    > same machine, or on another one (there are 3.4e38
    > different 128-bit values). GUIDs are fairly large,
    > therefore some of the advantages of using surrogate keys
    > (size, speed) don't apply when using them.

  • >
  • The reason to use GUIDs is because when several users
    > have to input data, and can't connect to the central
    > network all the time (because of geographical distance),
    > a method must be used to ensure that primary keys are
    > unique. This can also be done by assigning a unique
    > number to each user; let's say this unique user ID is
    > stored in variable gnUserId. Then, either the primary key
    > can be made up of two fields (in the client example:
    > field client, and field UserId), or the information can
    > be combined in a single field. This latter choice makes
    > it easier to join tables, for referential integrity, or
    > for a query or view. The following expression might be
    > placed in the default value of field ClientId: bintoc(gnUserId,
    > 2) + bintoc(SerialNumber("Client")). Note that
    > for this example, the primary key - and the corresponding
    > foreign keys - should be defined as C(6). This allows for
    > up to 65,000 different users (or offices). Function
    > bintoc() requires VFP 5 or later.

  • >
    >

    Candidate indices and unique values


    >
    >

    Theory: a primary key is the main form of identifying a record.
    >Naturally, the field or fields that make up a primary key must be
    >unique. However, we may also require other fields, or
    >combinations of fields, to be unique. It is for this purpose that
    >Visual FoxPro lets us create candidate indices.
    >Uniqueness (in primary indices and candidate indices) is enforced
    >through the index. In other languages, candidate indices would be
    >called "unique indices". In Visual FoxPro, the keyword
    >"unique" was used previously (in FoxPro) for another
    >purpose. Unique indices are kept for backwards compatibility, but
    >apart from that, they have no practical application in Visual
    >FoxPro (to obtain all the different values for a field, use a
    >SELECT - SQL statement instead, either with the DISTINCT keyword,
    >or with GROUP BY).


    >
    >

    In each table, there can be only one primary index, but there
    >can be several candidate indices. Other than what I said already,
    >Visual FoxPro doesn't make much difference between a primary and
    >a candidate index.


    >
    >

    A technical problem arises, for instance, when using candidate
    >indices for a SSN. After deleting a record, it is still in the
    >table and participates in uniqueness validation. In other words,
    >when erasing a record and then re-using the key value, Visual
    >FoxPro will protest about the duplicate value. One solution for
    >this is to use a filter on an index. Remember that a filtered
    >index can't participate in Rushmore Optimization. This may be
    >good, or bad, depending on the circumstances: see my document on
    >Rushmore Optimization for details. If Rushmore Optimization is
    >desired for a specific field, sometimes it may be necessary to
    >define two indices: a filtered one, for the primary or candidate
    >index, to ensure uniqueness; and a non-filtered one, of "regular"
    >type, for Rushmore Optimization.


    >
    >

     


    >
    >

    Copyright (C) 2001, Hilmar Zonneveld. This text may be copied
    >freely.

    If it's not broken, fix it until it is.


    My Blog
    Précédent
    Suivant
    Répondre
    Fil
    Voir

    Click here to load this message in the networking platform