Surrogate vs natural vs artificial, clustered vs non

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Surrogate vs natural vs artificial, clustered vs non

Message

From

04/09/2014 07:38:55

Walter Meester
HoogkarspelNetherlands

03/09/2014 15:24:54

Mike Yearwood
Toronto, Ontario, Canada

General information

Forum:

Microsoft SQL Server

Category:

Indexing

Title:

Re: Surrogate vs natural vs artificial, clustered vs non

Environment versions

SQL Server:

SQL Server 2014

Application:

Web

Miscellaneous

Thread ID:

01603549

Message ID:

01607028

Views:

>>>First off there is a difference between surrogate keys and artificial keys. A surrogate key or blind key or meaningless is system generated and never ever shown to the user (we developers are not the 'users"). Joe Celko has no problems with those. Nor does C.J. Date.
>>>
>>>An artificial key - (possibly invoice #) may be system generated but it's not a good relational key. Celko would not approve.
>>
>>Who accepts Celko's opinion as gospel? Even Codd and Date are not on the same page in this matter. Personally I do not see any problem with a generated meaningful key *as* long as you accept it to be absolutely static.

>I'm going to disagree with that only because I'm trying to also build an argument for consistency.

Well. If we look at one of our own systems, each and every table has a surrogate integer key. That has left us with a couple of advantages.
- A primary key generator could be used for all tables in the system
- DataTables and other metadata could refer to records in the system of all tables, by a combination of tablename and pk. This has proven very usefull to us in a number of scenarios
- Our cursoradapters are generated by metadata and rely on only one primary key. Compound keys are out of the picture.
- Our audit trail relies on an integer primary key that is absolutely static

So Yes, I can agree upon to a point, but again I also do feel that an absoluut meaningfull key in itself is not bad either. However the problem is that often you think it is absolutely static until a new requirement comes up that decides otherwise.

>>OTOH, Clustered indexes determine the size of all other indexes as they point to a clustered index value. Clustered index on an integer field (4) bytes result in significant smaller index sizes if you do not, or worse, if you have wide clustered indexes. The latter should be avoided at all costs.
>
>There is a negative side effect of indexes using the clustered index to access the record in a query. In other words rather than the index pointing directly at the record, it points at the clustered primary key which points at the record which slows queries, is that right? I have to ask if adding a clustered index is beneficial at all if that is true. Only the query that wanted a set of records that were clustered, i.e. the line items for a particular invoice, would benefit from that index and every other query would suffer.

Its an extra hop to find the record, but its effect normally should not be that great. Other things, like joins, retrieving data from disk usually take much more time than going through an extra level to get the record pointer. Its benefit is that an clustered index is ideal to retrieve records on ranges on values, because their are physically close on the HDD. SSD technology reduces that benefit though.

>So, I guess I'm leaning towards a non-clustered sequential meaningless primary key - sequential GUID or some other sequence that will fill the index data pages. That seems like the 98% coverage of scenarios.

I'd take an non-clustered sequential integer or bigint as the PK index. If I was able to rewrite all applications I created, I'd do that. Clustered index could be bery usefull on certain queries, but I'd be very carefull when applying that.

Map

View

Click here to load this message in the networking platform