SQL - really slow. How can I speed it up-indexing? - Level Extreme

Plateforme Level Extreme

Abonnement

Profil corporatif

Produits & Services

Support

Légal

English

SQL - really slow. How can I speed it up-indexing?

Message

De

15/02/2004 09:54:42

Walter Meester
HoogkarspelPays-Bas

À

13/02/2004 14:50:57

Mike Yearwood
Toronto, Ontario, Canada

Information générale

Forum:

Visual FoxPro

Catégorie:

Base de données, Tables, Vues, Index et syntaxe SQL

Titre:

Re: SQL - really slow. How can I speed it up-indexing?

Divers

Thread ID:

00876858

Message ID:

00877481

Vues:

40

Hi Mike,

>I agree it all depends. There are no hard and fast rules when working in an environment without any standards whatsoever. I have built systems with both approaches, because I want my systems to be as good as they can be. I've tried to switch between the two designs in the last few years, largely based on your input. I've read many if not all of your posts against integer keys and tried to prove to myself that your position was correct. Sometimes the lone voice is right and too often the lone voice is ignored. But it has been my experience that I've wasted lots of time in trying to prove one way is better than the other. IMO, more often than not, the benefits of integer keys outweigh the drawbacks you point out.

Its important to make a distinction between integer and character and intelligent keys and surrogates. They certainly have an overlap. Although I use integer surrogates in 90% of my tables, it does not mean it is always the most logical choice. There are certainly examples where intelligent keys make more sense. For example If you've got a postal code table used for validating addresses, it does not make much sense to use a surrogate key here in stead use just the postal code itself. When defining ranges, and wanting to apply RI based on values in a child table which value should fall into a range in the parent table, you cannot use a surrogate key for implementing this feature.

But we've had that discussion too many times, so it does not make sense to go over that again and again.

With respect of storing keys as integer or as character, I have to confess that my claim that in some cases a long character key could occupy less space because of compression seems not to be true as it turns out that integer keys use the very same compression technique, and I suspenct that the full 100 bytes needs to be specified once in each indexnode. Also the number of bits to store the width of the key seems to be a factor in this.

The following program:

SET SAFETY OFF
SET COMPATIBLE ON

CREATE TABLE testkey FREE (Key1 C(10), KEY2 I)

FOR nT = 1 TO 100000
	INSERT INTO testkey VALUES (ALLTRIM(STR(nT)), nT)
ENDFOR

INDEX ON key1 TAG key1
USE
? "Char: "+TRANSFORM(FSIZE("Testkey.cdx"))

USE testkey EXCLUSIVE
DELETE TAG key1
INDEX ON key2 TAG key2
USE
? "Integer: "+TRANSFORM(FSIZE("Testkey.cdx"))

The program shows that though the character field is 2.5 times as long(in terms of bytes) as the integer field, it only occupies about 10% more space in the index tag. For a char(20) it about 20% more. Not really numbers I'm impressed by. You've got to realize that when comparing performance of queries in these two approaches there are a number of other issues that likely will make the difference:

1. You might end up with more JOINS to get at the meaningfull key.
2. Your parent table would likely have indexes on both the meaningless PK and the meaningfull CK which likely both are used for optimization.
3. Your parent table contains an extra surrogate field which needs to be transmitted also.

For the child tables, it is definately true that your table is less 'wide' as you'd use integer fields in stead of longer character fields. However, the foreign key column typically contains values that occur multiple times (one to many relation).

Example2:

CREATE TABLE testkey FREE (Key1 C(10), KEY2 I)

FOR nT = 1 TO 100000
	INSERT INTO testkey VALUES (ALLTRIM(STR(nT%100)), nT%100)
ENDFOR

INDEX ON key1 TAG key1
USE
? "Char: "+TRANSFORM(FSIZE("Testkey.cdx"))

USE testkey EXCLUSIVE
DELETE TAG key1
INDEX ON key2 TAG key2
USE
? "Integer: "+TRANSFORM(FSIZE("Testkey.cdx"))

Now you'll see that the indexsizes for both the integer as the character keys have dropped significantly which indeed proves that the indexkeys for both cases are compressed and are depended on the value it contains.

If you don't need the full 10 chars (4 bits) for the key but in stead only 7 (3 bits), you'll see that the indexsize for the character tag will drop to slightly less than the integer key.

It seems that the index tag size higly depends on its contents, this is both true for integer indexes as for character indexes. However it is true that integer indexes never exceed 4 bytes listed in the index tag for each record, while that can be the case for character indexes. With larger character keys the difference is more noticable than with smaller ones. In the latter cases there might not be any significant difference at all.

In your example, your ID field construction is such that it cannot be compressed significantly because it sorts on the first and last 5 bytes and is inefficient from a compression point of view.

I realize that with integer surrogate keys you'll likely use an incrementing field which will be highly compressable whereas with meaningfull character keys the values are more 'distributed' and less compressable.

My conclusion however is that whether to use a intelligent character key as opposed to a surrogate integer key is about of no relevance if the character key is not getting too wide. There are other factors making the difference.

Walter,

>Out of respect for your obvious skills, I just wasted another hour trying to find an example where your statements would be correct. I had to make a meaningful character key on a table that had none. It had to be unique, but with some matches on the leading characters. After building that key, I ran a query to determine if there were many duplicates of the first 5 characters. Out of 37654 records, the query returned 14316 records. So almost 40% of the records have matches on the first 5 bytes.
>
>I built three indexes on a table with 37654 records. One on upper(company_name) the 2nd on integer company_pk and the 3rd on a unique 10 character id field based on the first 5 chars of the company name + alltrim(str(recno())) to three different .CDX files. The company name .CDX file size was 660480 bytes. The company id file size was 318976 bytes and the company pk cdx file size was 165888.
>
>I then took an integer key that represents the company type. The company types are stored in a lookup table. I added a 4 byte field to the company table. I set a relation into the lookup table. I took the first 4 bytes of the lookup description (which is the company type description) and put them into the new field in the company table.
>
>I built an index on the 4 byte character field to a new .CDX file. I also built an index on the already existing integer company type to another new .CDX file. The size of these two files was identical despite heavy duplication of the keys.
>
>It is a real company table, so I'd have to remove information before I could send it to you to confirm. But I would welcome your proof.

Click here to load this message in the networking platform