Handling large data sets for web apps
Message
From: 01/08/2001 09:10:26
To: Hilmar Zonneveld, Independent Consultant, Cochabamba, Bolivia (31/07/2001 11:11:01)
General information
Forum: Visual FoxPro
Category: West Wind Web Connection, Miscellaneous
Thread ID: 00535271
Message ID: 00538257
Views: 8
>
>I do hope you read my entry in the FAQ. Anyway, I will provide additional examples.
>
>Condition to select invoices:
>
>SELECT ... WHERE InvoiceDate >= {^1999-01-01} and ClientId = 500
>
>Let's assume there are 100,000 invoices dated 1999 or later, only 200 records for the specified client, and only 100 records for the specified client AND 1999 or later.
>
>Problem is, for Rushmore Optimization, VFP gets 100,000 index keys (not yet the records!) from the index (over the network, this will be slow). VFP then constructs a list of the corresponding record numbers. VFP also gets 200 index keys for all invoices that correspond to the client.
>
>After combining the information, VFP finally gets the 100 records. Rushmore Optimization has succeeded in letting VFP get only the RECORDS it required - but at a high cost. Getting the required RECORD NUMBERS (or index keys) generated lots of network traffic.
>
>In the example, you don't want to delete the index on InvoiceDate: it can be quite useful, for instance, for an exact match (InvoiceDate = ldDate). You can override this by changing the condition, for instance, to InvoiceDate + 1 >= {^1999-01-01} + 1. Since you probably don't have an index on "InvoiceDate + 1", no index will be used.
>
>
>
>Another example with invoices:
>
>SELECT ... WHERE ClientId = 500 and IsPaid
>
>If you happen to have an index on IsPaid, lots of record numbers will be retrieved for this key. In this case, get rid of the index!
>
>
>
>SELECT sum(Invoice) as Total ... WHERE InvoiceDate >= {^1999-01-01}
>
>In this case, the situation is quite different. You need all the records on or after Jan 1, 1999, anyway, so you are better off using Rushmore Optimization for this condition. The big difference is that you actually need all the records that match the condition. In the previous example, only a small subset was required - and the other condition (on ClientId) returned a small subset.
>
>
>I could go on, and on... but I would rather you send me (over the UT, please) a sample query, to see what considerations apply. If I understand correctly, your conditions can change dynamically based on user input, so optimization can be quite complicated.
>
>
>Visual FoxPro Rushmore Optimization is a "rule-based optimization", and the strategy (or rule) is quite simple: use any index to "optimize" the query.
>
>For comparison, here is my scant knowledge of the Oracle DBMS. This server offers both "rule-based optimization" and "cost-based optimization". In "rule-based optimization", different priorities apply. For instance, the highest priority is given to a search on the equivalent of recno(); the next-highest, if I remember correctly, goes to a search on a primary key, etc. A total of something like 15 different approaches are used.
>
>In "cost-based optimization", the "cost" refers to the cost (in time) of retrieving the records. This requires an actual statistical analysis of the indices involved. This information has to be updated regularly, or else the DBMS may use an outdated strategy for retrieving the records.
>
>
>Hilmar.

I tested the results of just two WHERE conditions.
Empty( D40CkNo ) - with my current data set, this actually returns all the rows.
Left( D40Acct, 2 ) = "13" - returns 478 records.

First, with indexes on both conditions, the query takes approximately 0.095 seconds.
I tried swapping the conditions and there was no change in speed.
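For reference, the timing comparison above can be reproduced with a sketch along these lines (the field names D40CkNo and D40Acct are from my data; the table name d40 is a stand-in for the actual table):

```foxpro
* Hypothetical timing sketch; d40 stands in for the real table name.
lnStart = SECONDS()
SELECT * ;
    FROM d40 ;
    WHERE EMPTY(D40CkNo) AND LEFT(D40Acct, 2) = "13" ;
    INTO CURSOR crResult
? "Elapsed:", SECONDS() - lnStart, "seconds"
```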

Now, after I removed the index on Empty( D40CkNo ), the time dropped to 0.011 seconds.

That's a considerable difference.

We are not running across a network, just from the local hard drive, but because we have been gradually scaling up, that could be a possibility in the near future.

I think this clearly demonstrates what you are saying. If one condition matches a small number of index keys and another matches a large number, you are better off having an index only on the condition that returns the small set.
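Applying the trick you described earlier (adding a no-op to the expression so it no longer matches any index key expression), the low-selectivity condition could instead be made non-optimizable without deleting the index. A sketch, again using d40 as a hypothetical table name:

```foxpro
* The index expression EMPTY(D40CkNo) does not match EMPTY(D40CkNo + ""),
* so Rushmore cannot use that index for this condition, while the
* result set is unchanged; the index stays available for other queries.
SELECT * ;
    FROM d40 ;
    WHERE EMPTY(D40CkNo + "") AND LEFT(D40Acct, 2) = "13" ;
    INTO CURSOR crResult
```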

As you have stated, though, the data is changing all the time. Is the penalty I am getting now (about 0.1 to 0.2 seconds) worth constantly monitoring the data, basically doing by hand what Oracle or SQL Server would do, a kind of crude statistics? I don't know.

As to your comment about Oracle, I'm not sure why you would ever use rule-based optimization, unless the overhead of keeping statistics outweighs the benefits.