Surrogate vs natural vs artificial, clustered vs non

Plateforme Level Extreme

Abonnement

Profil corporatif

Produits & Services

Support

Légal

English

Surrogate vs natural vs artificial, clustered vs non

Message

04/09/2014 09:55:56

Mike Yearwood
Toronto, Ontario, Canada

04/09/2014 07:38:55

Walter Meester
HoogkarspelPays-Bas

Information générale

Forum:

Microsoft SQL Server

Catégorie:

Indexation

Titre:

Re: Surrogate vs natural vs artificial, clustered vs non

Versions des environnements

SQL Server:

SQL Server 2014

Application:

Web

Divers

Thread ID:

01603549

Message ID:

01607042

Vues:

>>>>First off there is a difference between surrogate keys and artificial keys. A surrogate key or blind key or meaningless is system generated and never ever shown to the user (we developers are not the 'users"). Joe Celko has no problems with those. Nor does C.J. Date.
>>>>
>>>>An artificial key - (possibly invoice #) may be system generated but it's not a good relational key. Celko would not approve.
>>>
>>>Who accepts Celko's opinion as gospel? Even Codd and Date are not on the same page in this matter. Personally I do not see any problem with a generated meaningful key *as* long as you accept it to be absolutely static.
>
>>I'm going to disagree with that only because I'm trying to also build an argument for consistency.
>
>Well. If we look at one of our own systems, each and every table has a surrogate integer key. That has left us with a couple of advantages.
>- A primary key generator could be used for all tables in the system
>- DataTables and other metadata could refer to records in the system of all tables, by a combination of tablename and pk. This has proven very usefull to us in a number of scenarios
>- Our cursoradapters are generated by metadata and rely on only one primary key. Compound keys are out of the picture.
>- Our audit trail relies on an integer primary key that is absolutely static
>
>So Yes, I can agree upon to a point, but again I also do feel that an absoluut meaningfull key in itself is not bad either. However the problem is that often you think it is absolutely static until a new requirement comes up that decides otherwise.

A GUID is something no one would edit. An integer primary key could be. I think the meaningless key is the way to go. Meaningful could get out of whack with the data if the data changes. Integer keys could become used as an invoice number or something. It must not be shown to the user.

>
>
>>>OTOH, Clustered indexes determine the size of all other indexes as they point to a clustered index value. Clustered index on an integer field (4) bytes result in significant smaller index sizes if you do not, or worse, if you have wide clustered indexes. The latter should be avoided at all costs.
>>
>>There is a negative side effect of indexes using the clustered index to access the record in a query. In other words rather than the index pointing directly at the record, it points at the clustered primary key which points at the record which slows queries, is that right? I have to ask if adding a clustered index is beneficial at all if that is true. Only the query that wanted a set of records that were clustered, i.e. the line items for a particular invoice, would benefit from that index and every other query would suffer.
>
>Its an extra hop to find the record, but its effect normally should not be that great. Other things, like joins, retrieving data from disk usually take much more time than going through an extra level to get the record pointer. Its benefit is that an clustered index is ideal to retrieve records on ranges on values, because their are physically close on the HDD. SSD technology reduces that benefit though.

Good.
>
>
>>So, I guess I'm leaning towards a non-clustered sequential meaningless primary key - sequential GUID or some other sequence that will fill the index data pages. That seems like the 98% coverage of scenarios.
>
>I'd take an non-clustered sequential integer or bigint as the PK index. If I was able to rewrite all applications I created, I'd do that. Clustered index could be bery usefull on certain queries, but I'd be very carefull when applying that.

Even integers can be generated twice, so I never understood the argument against GUIDs might be generated twice. The primary key option would prevent the duplicate key in both cases, and you'd have to generate another one and re-attempt the insert. A difference which makes no difference is no difference.

Thanks for the discussion! :)

Répondre

Fil

Voir

Click here to load this message in the networking platform