Surrogate vs natural vs artificial, clustered vs non

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Surrogate vs natural vs artificial, clustered vs non

Message

From

03/09/2014 15:24:54

Mike Yearwood
Toronto, Ontario, Canada

11/07/2014 15:55:32

Walter Meester
HoogkarspelNetherlands

General information

Forum:

Microsoft SQL Server

Category:

Indexing

Title:

Re: Surrogate vs natural vs artificial, clustered vs non

Environment versions

SQL Server:

SQL Server 2014

Application:

Web

Miscellaneous

Thread ID:

01603549

Message ID:

01606990

Views:

Hi Walter

How is it I did not reply to this?! Sorry about that!

>>I'm trying to get a deeper understanding of SQL Server's inner workings.
>>
>>First off there is a difference between surrogate keys and artificial keys. A surrogate key or blind key or meaningless is system generated and never ever shown to the user (we developers are not the 'users"). Joe Celko has no problems with those. Nor does C.J. Date.
>>
>>An artificial key - (possibly invoice #) may be system generated but it's not a good relational key. Celko would not approve.
>
>Who accepts Celko's opinion as gospel? Even Codd and Date are not on the same page in this matter. Personally I do not see any problem with a generated meaningful key *as* long as you accept it to be absolutely static.

I'm going to disagree with that only because I'm trying to also build an argument for consistency.

>
>>A natural key is composed of data elements.
>
>Not sure what you mean. A natural key is a key that has meaning in the outside world. Whether it consist out of one or several fields is irrelevant.
>Do you mean a compound key?

Yes. I was thinking of a collection of columns to make a primary key.

>
>>Natural keys take more space in the storage pages.
>
>Eh? Why? A GUID takes up more space than a SSN, passport no, driver license etc.
>Even a compound key could be as small as two bytes per indexrecord.

Yes a compound key could be small. I'm leaning towards a complete consistency.

>
>>Suppose there is a table with a compound primary key and said key is clustered.
>>When I insert a record, I'm taking an educated guess that because of the clustered index, the actual table is re-ordered in some manner if a new record falls between other records.
>
>Correct. Inserting a record that has a clustered key is going to reorder record so that the records physically match the order of the clustered index.
>
>>Same table - now the compound key is not clustered.
>>Inserting the record just updates the index and not the table.
>
>Correct
>
>>Same table with a clustered surrogate key
>>if it is clustered, yet sequential, the table is unlikely to be re-ordered in any way but it could!
>
>Correct.
>
>>Same table with a non clustered surrogate key
>>only the index will be updated
>
>Correct.
>
>>Now - another scenario - I have a clustered meaningless key and a candidate key composed of the same columns as the compound primary key above.
>>The sequential key is updated, the record is added. No reordering of the table. The candidate key is updated.
>
>Correct.
>
>>So - considering all of the above, why would one chose natural keys for a clustered primary key? That seems to me to be worse than any of the other scenarios above.
>
>I guess you are referring to natural compound keys. The main advantage of clustered indexes is that a range of values (if the database itself is not too fragmented) sit close together on the HDD and therefore has a performance gain if you need to retrieve a range of records. This advantage will go away with SSD drives as the seek times are much faster as there are no mechanical parts anymore that bottleneck the seek times
>
>OTOH, Clustered indexes determine the size of all other indexes as they point to a clustered index value. Clustered index on an integer field (4) bytes result in significant smaller index sizes if you do not, or worse, if you have wide clustered indexes. The latter should be avoided at all costs.

There is a negative side effect of indexes using the clustered index to access the record in a query. In other words rather than the index pointing directly at the record, it points at the clustered primary key which points at the record which slows queries, is that right? I have to ask if adding a clustered index is beneficial at all if that is true. Only the query that wanted a set of records that were clustered, i.e. the line items for a particular invoice, would benefit from that index and every other query would suffer.

So, I guess I'm leaning towards a non-clustered sequential meaningless primary key - sequential GUID or some other sequence that will fill the index data pages. That seems like the 98% coverage of scenarios.

Map

View

Click here to load this message in the networking platform