Natural Keys - Level Extreme

Natural Keys

Message

10/03/2015 13:30:36

Walter Meester
Hoogkarspel, Netherlands

10/03/2015 09:03:02

Mike Yearwood
Toronto, Ontario, Canada

General information

Forum:

Microsoft SQL Server

Category:

Database design

Title:

Re: Natural Keys

Environment versions

SQL Server:

SQL Server 2012

Application:

Web

Miscellaneous

Thread ID:

01616073

Message ID:

01616598

>>>>>>>>Here at my job we have a database modeler in our group. He is insistent that all tables use 'Natural Keys' and not surrogate keys. I am not trying to start a battle or anything, but is this really even still a debate? It does not matter how much logical reason I provide him, he is propagating his plan across the company and it does not seem to matter what the impact will be. This is a global company in 140 countries with data centers all over the world.
>>>>>>>>
>>>>>>>>My only question is: has something changed and I missed it? We are talking about values that users see and will want to change being used as primary keys on the tables.
>>>>>>>
>>>>>>>There are a few things that need clarification.
>>>>>>>
>>>>>>>Natural Keys
>>>>>>>Surrogate (generated) keys
>>>>>>>
>>>>>>>vs
>>>>>>>
>>>>>>>Meaningful (intelligent) keys
>>>>>>>Meaningless keys
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>The problem in the discussion is that the definitions are not clear. Natural keys are keys that already exist in the real world, for example an SSN or a passport number.
>>>>>>>The consensus is that the use of those keys is not recommended.
>>>>>>>
>>>>>>>Surrogate keys are keys that are generated by the computer system. A surrogate key could be a number from a sequence or a GUID.
>>>>>>>
>>>>>>>However, a surrogate key might be an intelligent key. An example is an invoice number, a key that (by law) should be absolutely static. There is little reason not to use an invoice number as a key, as it is absolutely static by definition. Also note that the meaning of the key originates from the computer system and not the outside world, even if it has acquired a meaning there.
>>>>>>>
>>>>>>>Another misconception is that surrogate keys should always be invisible to the user. A mutation or payment might have a key that is generated within the computer system but is visible in the GUI so that it can be used for purposes outside of the system (e.g. a reference number in correspondence with customers). I worked with Navision a decade and a half ago (now a Microsoft product), in which this practice is common.
>>>>>>>
>>>>>>>
>>>>>>>Personally, I use integer keys only as it simplifies audit trails and other metadata throughout the database.
>>>>>>
>>>>>>Excellent points, Walter. I would have no issues with meaningful keys such as invoice numbers. I do tend to avoid using integers, primarily because I work in enterprise systems and have seen this bite the budget terribly. We had to dump a very large project to separate a large enterprise system into 3 different regional systems due to the use of sequenced integers for everything. There were already billions of rows in many tables and the cost to break this up was prohibitive. The integer keys were the breaking point. I would not have an issue using integers for lookup tables or smaller systems, but who knows sometimes where it will go. I see little harm or downside in using GUIDs for my efforts, so I tend to stick to that.
>>>>>
>>>>>Same here. There is a lot of miscommunication and misunderstanding in IT. Many people go by Joe Celko's stuff, but even so, he's human and things are open to interpretation. This article is a good way to determine what kind of key you are dealing with:
>>>>>
>>>>>http://www.informationweek.com/software/information-management/celko-on-sql-natural-artificial-and-surrogate-keys-explained/d/d-id/1059246?
>>>>>
>>>>>First off, let's note that he does not say surrogate keys are bad. Artificial keys are bad.
>>>>>
>>>>>In "ACM Transactions on Database Systems," Dr. Codd wrote that "…database users may cause the system to generate or delete a surrogate, but they have no control over its value, nor is its value ever displayed to them...
>>>>>
>>>>>This means that a surrogate ought to act like an index, hash table, bit vector or whatever; created by the user, managed by the system and NEVER seen by a user. That means never used in queries, DRI or anything else that a user does."
>>>>>
>>>>>I disagree with that last part. A surrogate key is a key. The user is not supposed to see it. The system is supposed to manage it. So if the user initiates a query, I see nothing wrong whatsoever with the system executing that query joining tables on the surrogate key, or the system managing referential integrity with that surrogate key.
>>>>>
>>>>>I would not use the invoice number as the primary key. There is a huge benefit to doing every table and key the same way. That benefit is "practice". You should not have to decide per table what kind of key. That, to me, is like a drywaller choosing between screws, glue, tape, per joint.
>>>>>
>>>>>If I have a 10-digit customer number in the customer table, a 10-digit invoice number and a 10-digit line item number, then to track payments against line items, do I have a 30-digit key on the line item? Everybody knows keys must be short! So how can anyone argue in favor of natural keys?
>>>>>
>>>>>Space is cheap, RAM is cheap. This continual debating over primary keys is SOOOOO WASTEFUL to the entire planet!!!!
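The 30-digit key concern raised above can be illustrated with a small sketch. It assumes the worst case described in the post, where every child table's natural key embeds the full key of its parent; the table names and the 10-digit widths come from the post, while the function name `composite_key_digits` is hypothetical. If the invoice number is unique on its own, as argued elsewhere in this thread, the key does not have to grow this way.

```python
def composite_key_digits(levels):
    """Return the cumulative key width at each level.

    Assumes each child key is the parent's full key plus the
    child's own identifier (the worst case for natural keys).
    """
    widths = {}
    total = 0
    for name, digits in levels:
        total += digits
        widths[name] = total
    return widths


# Widths taken from the post: three 10-digit identifiers.
levels = [("customer", 10), ("invoice", 10), ("line_item", 10)]
widths = composite_key_digits(levels)

for name, digits in widths.items():
    print(f"{name}: {digits} digits")
```

Under this assumption a payments table that references a line item would carry all 30 digits as its foreign key, whereas a single generated integer per table keeps every foreign key the same width regardless of depth.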
>>>>
>>>>Unfortunately, there might be an order of magnitude of difference in performance between integer and GUID keys. Do not underestimate the effect that large, wide keys have on your query performance. Space and RAM might be cheap, but CPU and (disk) I/O are not, and that is exactly where you'd lose the performance.
>>>>
>>>>Your database might triple or more in size if you change your integer keys to GUIDs, and that can drag a 0.01-second query down to a 1-second query in some cases. That is not a price I'm willing to pay unless there are very good reasons for it.
>>>>
>>>>Walter,
>>>
>>>CPU speeds are always improving with more and more cores, so they are cheap too.
>>
>>Not true in my book. The speed gains on single-threaded operations have slowed down significantly in recent years, and a lot of query processing is bound to a single core. Parallelisation comes with overhead and costs, and most of the time it indicates suboptimal query performance.
>>
>>>Lots of people say don't worry about things like mdot, which does affect performance, but then how can a natural key make sense? There is no way a natural key is going to be as small as an integer, true.
>>
>>An invoice number could be as small as an integer. You constantly ignore the fact that wider keys can affect performance by one or two orders of magnitude, while with mdot we are talking about single-digit percentage differences, low double digits at most. That is the huge difference.
>
>Yes, an invoice number can be small. But using natural keys means that invoice number becomes part of the child key of another table. That new key becomes much wider.
>
>
>>
>>>It will likely result in a wider key only a few joins down. It can also easily be wider than a GUID. A key that is 12 bytes bigger than an integer is not that big a hit, especially if said keys are the new sequential GUIDs.
>>
>>Again, I hold that this is simply not true. It is a big hit if your keys are 4 times as large as necessary. We've got an audit table in our product that would explode from 20 GB to 60 GB. The whole database would go from 40 GB to a whopping 120 GB. Backing up, restoring, moving the database and in-memory caching become much more painful. Running it on a laptop at 120 GB is out of the question; I have only a 128 GB SSD on mine. Queries are slower to execute because of the larger volume that has to be processed and, worse, read from disk and written back.
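The disagreement over whether tables would triple in size can be made concrete with some rough arithmetic. The sketch below uses the SQL Server storage sizes of 4 bytes for INT and 16 bytes for UNIQUEIDENTIFIER; everything else (the function name `growth_factor`, the column counts, the payload sizes) is a hypothetical illustration, and the model deliberately ignores row overhead, indexes and page fill, so treat the numbers as indicative only.

```python
INT_KEY_BYTES = 4    # SQL Server INT
GUID_KEY_BYTES = 16  # SQL Server UNIQUEIDENTIFIER


def growth_factor(key_columns: int, payload_bytes: int) -> float:
    """Ratio of row size with GUID keys to row size with INT keys.

    Simplified model: row size = key columns + other data.
    Row headers, indexes and page overhead are ignored.
    """
    int_row = key_columns * INT_KEY_BYTES + payload_bytes
    guid_row = key_columns * GUID_KEY_BYTES + payload_bytes
    return guid_row / int_row


# Hypothetical key-heavy row, e.g. an audit-style row that is mostly keys.
audit_like = growth_factor(key_columns=4, payload_bytes=8)

# Hypothetical payload-heavy row with only a couple of key columns.
wide_row = growth_factor(key_columns=2, payload_bytes=200)

print(f"key-heavy row grows {audit_like:.2f}x")
print(f"payload-heavy row grows {wide_row:.2f}x")
```

In this simplified model a row only triples when the non-key data is about half the size of the integer keys, which fits a key-heavy audit table, while rows dominated by other data grow far less.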
>
>Some of the tables would get larger. Most would not triple in size. Where is your test data? That's all that should be on your laptop. Leave the large data on a server for debugging.
>
>>
>>
>>Then there is the problem of debugging and ad-hoc querying, which becomes very painful with all those long 16-byte keys that are unreadable to humans.
>
>No one is supposed to be querying the meaningless keys!!!! That's the whole point of meaningless keys. You query on the meaningful data and only do joins.
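The pattern described above, filtering on meaningful data and joining on meaningless keys, might look like the following minimal sketch. It uses Python's built-in `sqlite3` module purely so the example is self-contained; the thread itself concerns SQL Server, and the table names, column names and sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- meaningless surrogate key
        customer_no TEXT NOT NULL UNIQUE,  -- meaningful value the user sees
        name        TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,   -- meaningless surrogate key
        invoice_no  TEXT NOT NULL UNIQUE,  -- meaningful value the user sees
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer (customer_no, name) VALUES ('C-0001', 'Acme');
    INSERT INTO invoice (invoice_no, customer_id)
        VALUES ('INV-1001', 1), ('INV-1002', 1);
    """
)

rows = conn.execute(
    """
    SELECT i.invoice_no
    FROM invoice AS i
    JOIN customer AS c ON c.customer_id = i.customer_id  -- join on surrogate
    WHERE c.customer_no = ?                               -- filter on meaningful data
    ORDER BY i.invoice_no
    """,
    ("C-0001",),
).fetchall()

invoice_numbers = [row[0] for row in rows]
print(invoice_numbers)
conn.close()
```

The user-facing numbers stay unique through their own constraints, so they can change format without touching any foreign key.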

I guess you do not do a lot of query debugging on SQL Server. There is no escape from having to know and enter an FK or PK when you're tracing problems in your queries. That becomes very annoying and cumbersome with tables that contain large GUIDs: just writing one down on a piece of paper and using that value to check something in another table becomes a nightmare.

>>One might be comfortable with GUIDs, but simply stating that the effect on performance is insignificant is simply not true. It can be VERY significant on large data.
>
>It can be - but by using them alone, guess what? You'll be forced to figure out the real business problems and not waste time on keys.

What real business problem? The only argument I can imagine has to do with offline data creation or merging databases. In those cases there are alternatives to GUIDs, and GUIDs are not a magic silver bullet either. Yes, the data you enter gets unique keys, but that can also bite you when two sites create a record for the same entity with two different keys. Whether you are using integers or GUIDs, you'll have to take care of that problem.
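The duplicate-entity problem can be shown with a short sketch. It assumes two hypothetical sites that each generate GUID keys independently, and a hypothetical `tax_id` attribute that identifies the real-world customer; none of these names come from the thread.

```python
import uuid
from collections import defaultdict


def new_customer(site, name, tax_id):
    """Create a customer row with a locally generated GUID key."""
    return {"id": uuid.uuid4(), "site": site, "name": name, "tax_id": tax_id}


# Both sites register the same real-world company under their own key.
site_a = [new_customer("A", "Acme Ltd", "NL001")]
site_b = [
    new_customer("B", "ACME Limited", "NL001"),
    new_customer("B", "Globex", "NL002"),
]

merged = site_a + site_b

# The GUIDs do not collide, so the merge itself succeeds...
ids_unique = len({row["id"] for row in merged}) == len(merged)

# ...but grouping by the business attribute reveals the duplicate entity.
by_business_key = defaultdict(list)
for row in merged:
    by_business_key[row["tax_id"]].append(row)

duplicates = {key: rows for key, rows in by_business_key.items() if len(rows) > 1}

print("keys unique:", ids_unique)
print("duplicated entities:", list(duplicates))
```

The key type only guarantees that rows do not clash; recognising that two rows describe the same entity still needs a business rule, whichever kind of key is used.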
