Surrogate vs natural vs artificial, clustered vs non

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Surrogate vs natural vs artificial, clustered vs non

Message

From

09/09/2014 06:34:22

Walter Meester
HoogkarspelNetherlands

06/09/2014 08:16:23

Dragan Nedeljkovich (Online)
Now officially retired
Zrenjanin, Serbia

General information

Forum:

Microsoft SQL Server

Category:

Indexing

Title:

Re: Surrogate vs natural vs artificial, clustered vs non

Environment versions

SQL Server:

SQL Server 2014

Application:

Web

Miscellaneous

Thread ID:

01603549

Message ID:

01607206

Views:

>>>>>I also have to admit to a change I needed to make. I used to accept the default of clustering on the primary key. It seems the only need for a clustered key is when you absolutely need a set of records returned in a cluster, such as all the lineitems for a given invoice. The invoice foreign key should be clustered on the line items table, not the primary key on either invoice nor lineitems. Right?
>>>>
>>>>That is better if you're doing a lot of (demanding) queries on them... correct. There might be nuances to this rule, but in general I'd say that is correct.
>>>
>>>I'd rather have these clustered by item code. By invoice, you generally need only when you're totaling and printing it, which is just about once or twice per document. Calculating and reporting, which may be done monthly and/or annually or even several times a day (some apps require recalculating stock on every change) requires access to records of the same item code.
>>>
>>>IOW, yes, cluster on the most frequently used order - just saying that the document number is not the most likely candidate for it.
>>
>>Indeed, it depends on the queries you want to optimize the most. Whether that is an invoiceno, invoicedate, itemcode or something else, is a decision that has to be made by the developer or DBA
>
>And I'd like to be more precise in what I meant, why document number is a bad choice. You need it only when editing/saving a document, and when printing it out. And that's when you don't really need the speed. Even the slowest method will still be faster than the printer and far faster than the data entry. And whatever slowness there may be, it's experienced by one user, the one manipulating the document, and they are well aware that saving it may take some time. Even office type software may take its time to save one document, even when it's a single file (well, seemingly - it's about 20 xml files in subfolders, zipped). However, about every user working with line items will like to have the line items available immediately - be it just viewing the history of an item (purchases, sales, stock) or waiting to pull the totals for a period (which used to be immediate with dbfs 20 years ago - don't see why it wouldn't be today).
>
>The definition of "most frequently used order" needs also to take not only the number of times it is used, but also the number of records read, IOW the total I/O.

Choosing a clustered index should always be done with the motivation to save I/O in the overall scheme of things. You'll have to determine where you need the optimal performance, and indeed that seldomly is at retrieving data for a single invoice / document but almost always in datamining functions. IOW, if you've got queries that perform a lot of I/O you might want to increase the performance in making the parts that retrieve ranges of values be able to use a clustered index.

Map

View

Click here to load this message in the networking platform