Handling large data sets for web apps
Message
From: 01/08/2001 09:10:26
To: Hilmar Zonneveld, Independent Consultant, Cochabamba, Bolivia (31/07/2001 11:11:01)
General information
Forum: Visual FoxPro
Category: West Wind Web Connection, Miscellaneous
Thread ID: 00535271
Message ID: 00538257
Views: 8
>
>I do hope you read my entry in the FAQ. Anyway, I will provide additional examples.
>
>Condition to select invoices:
>
>SELECT ... WHERE InvoiceDate >= {^1999-01-01} and ClientId = 500
>
>Let's assume there are 100,000 invoices dated 1999 or later, only 200 records for the specified client, and only 100 records for the specified client AND 1999 or later.
>
>Problem is, for Rushmore Optimization, VFP gets 100,000 index keys (not yet the records!) from the index (over the network, this will be slow). VFP then constructs a list of the corresponding record numbers. VFP also gets 200 index keys for all invoices that correspond to the client.
>
>After combining the information, VFP finally gets the 100 records. Rushmore Optimization has succeeded in letting VFP get only the RECORDS it required - but at a high cost. Getting the required RECORD NUMBERS (or index keys) generated lots of network traffic.
>
>In the example, you don't want to delete the index on InvoiceDate: it can be quite useful, for instance, for an exact match (InvoiceDate = ldDate). You can override this by changing the condition, for instance, to InvoiceDate + 1 >= {^1999-01-01} + 1. Since you probably don't have an index on "InvoiceDate + 1", no index will be used.
>
>
>
>Another example with invoices:
>
>SELECT ... WHERE ClientId = 500 and IsPaid
>
>If you happen to have an index on IsPaid, lots of record numbers will be retrieved for this key. In this case, get rid of the index!
>
>
>
>SELECT sum(Invoice) as Total ... WHERE InvoiceDate >= {^1999-01-01}
>
>In this case, the situation is quite different. You need all the records on or after Jan 1, 1999, anyway, so you are better off using Rushmore Optimization for this condition. The big difference is that you actually need all the records that match the condition. In the previous example, only a small subset was required - and the other condition (on ClientId) returned a small subset.
>
>
>I could go on, and on... but I would rather you send me (over the UT, please) a sample query, to see what considerations apply. If I understand correctly, your conditions can change dynamically based on user input, so optimization can be quite complicated.
>
>
>Visual FoxPro Rushmore Optimization is a "rule-based optimization", and the strategy (or rule) is quite simple: use any index to "optimize" the query.
>
>For comparison, here is my scant knowledge of the Oracle DBMS. This server offers both "rule-based optimization" and "cost-based optimization". In "rule-based optimization", different priorities apply. For instance, the highest priority is given to a search on the equivalent of recno(); the next-highest, if I remember correctly, goes to a search on a primary key, etc. A total of something like 15 different approaches are used.
>
>In "cost-based optimization", the "cost" refers to the cost (in time) of retrieving the records. This requires an actual statistical analysis of the indices involved. This information has to be updated regularly, or else the DBMS may use an outdated strategy for retrieving the records.
>
>
>Hilmar.

I tested the results of just two WHERE conditions.
Empty( D40CkNo ) - with my current data set, this actually returns all the rows.
Left( D40Acct, 2 ) = "13" - returns 478 records.

First, with indexes on both conditions, the query takes approximately 0.095 seconds.
I tried swapping the conditions and there was no change in speed.
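For reference, the timing comparison above can be reproduced with a sketch along these lines (the field names D40CkNo and D40Acct are from my data; the table name d40 is a stand-in for the actual table):

```foxpro
* Hypothetical timing sketch; d40 stands in for the real table name.
lnStart = SECONDS()
SELECT * ;
    FROM d40 ;
    WHERE EMPTY(D40CkNo) AND LEFT(D40Acct, 2) = "13" ;
    INTO CURSOR crResult
? "Elapsed:", SECONDS() - lnStart, "seconds"
```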

Now, after I removed the index on Empty( D40CkNo ), the time dropped to 0.011 seconds.

That's a considerable difference.

We are not running across a network, just from the local hard drive, but because we have been gradually scaling up, that could be a possibility in the near future.

I think this clearly demonstrates what you are saying. If one condition matches a small number of index keys and another matches a large number, you are better off having an index only on the condition that returns the small set.
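Applying the trick you described earlier (adding a no-op to the expression so it no longer matches any index key expression), the low-selectivity condition could instead be made non-optimizable without deleting the index. A sketch, again using d40 as a hypothetical table name:

```foxpro
* The index expression EMPTY(D40CkNo) does not match EMPTY(D40CkNo + ""),
* so Rushmore cannot use that index for this condition, while the
* result set is unchanged; the index stays available for other queries.
SELECT * ;
    FROM d40 ;
    WHERE EMPTY(D40CkNo + "") AND LEFT(D40Acct, 2) = "13" ;
    INTO CURSOR crResult
```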

As you have stated, though, the data is changing all the time. Is the penalty I am getting now (about 0.1 to 0.2 seconds) worth constantly monitoring the data, basically doing by hand what Oracle or SQL Server would do, a kind of crude statistics? I don't know.

As to your comment about Oracle, I'm not sure why you would ever use rule-based optimization, unless the overhead of keeping statistics outweighs the benefits.