Handling large data sets for web apps
Message
From: 01/08/2001 09:16:02
To: Hilmar Zonneveld, Independent Consultant, Cochabamba, Bolivia (31/07/2001 11:30:47)

General information
Forum: Visual FoxPro
Category: West Wind Web Connection, Miscellaneous
Thread ID: 00535271
Message ID: 00538260
Views: 10
>>Are you saying to do a COUNT for each condition of the filter on one field? Then, based on those counts, build the SQL so that the condition which returns the smallest number of records comes first in the WHERE clause, and so on, with the one that returns the highest number last?
>
>I don't think this will work correctly. The COUNT itself will be slow if a lot of records match a certain part of a condition. It is better to "guess" beforehand which conditions should be included in Rushmore Optimization, and which should be bypassed.
>
>Examples:
>
>InvoiceType = "D" (where InvoiceType can take only a few values, so there are lots of records for each possible value): drop the index from the start. Its use will usually be counterproductive.
>
>
>InvoiceDate = ldDate: Probably only a few records will match this part. Keep the index.
>
>
>InvoiceDate > ldFromDate: Probably a lot of records will match this part. Try to bypass RO (Rushmore Optimization) for this part by changing the condition to something like InvoiceDate + 0 > ldFromDate.
>
>In general, bypass RO for any part of a condition that uses >, <, #, <>, >=, <=, or between(). Don't bypass it if the condition is used by itself. Don't bypass it for =, ==, or inlist(). Do bypass it for "=" when it is part of a larger condition, compares strings, matches only a single character of a longer string, and SET EXACT is OFF.
>
>
>Obviously, you will have to make some trade-offs to keep your algorithm simple. The main idea, as I understand it, is to have MOST queries complete within a reasonable time.
>
>Hilmar.
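
To make the bypass trick above concrete, here is a minimal sketch in VFP code (the Invoices table, its index tags, the cursor names, and the ldDate/ldFromDate variables are assumed for illustration):

* Optimized: Rushmore can use an index whose key is InvoiceDate.
SELECT * FROM Invoices ;
    WHERE InvoiceDate = ldDate ;
    INTO CURSOR csByDay

* Bypassed: InvoiceDate + 0 evaluates to the same date, but the
* expression no longer matches the index key, so Rushmore skips
* this part instead of building a bitmap for millions of rows.
SELECT * FROM Invoices ;
    WHERE InvoiceDate + 0 > ldFromDate ;
    INTO CURSOR csSince

(For character fields the same effect comes from appending an empty string, e.g. InvoiceType + "" = "D".)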

Yes, I agree that in most cases the COUNT is too slow for web pages. But there might be scenarios where it is useful. For one, I could write my own crude statistics program that would give me a picture of what the data looks like from day to day.

There also might be a scenario where the data changes so drastically from day to day that the benefit gained from doing the COUNT and re-ordering the SELECT would outweigh the cost. It seems unlikely, though, because even with the 2.5 million records, the difference with all conditions indexed was only 0.1 or 0.2 seconds.

The COUNT of all the conditions would take much longer, and then I would have to remove indexes on the fly. That doesn't take long in itself, but I don't have exclusive access to the data on the web.
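
For what it's worth, here is a crude sketch of the count-and-reorder idea (the condition strings, the Invoices table, and all variable names are illustrative only; ldDate and ldFromDate are assumed to be in scope):

* Measure each condition's selectivity with a separate COUNT,
* then build the WHERE clause from most to least selective.
LOCAL laCond[3], laHits[1], lnI, lcCond, lcWhere
DIMENSION laPairs[3, 2]

laCond[1] = [InvoiceType = "D"]
laCond[2] = [InvoiceDate = ldDate]
laCond[3] = [InvoiceDate > ldFromDate]

FOR lnI = 1 TO 3
    lcCond = laCond[lnI]
    * This is the expensive step: each COUNT scans data or an index.
    SELECT COUNT(*) FROM Invoices WHERE &lcCond INTO ARRAY laHits
    laPairs[lnI, 1] = laHits[1]
    laPairs[lnI, 2] = laCond[lnI]
ENDFOR

=ASORT(laPairs)    && sort rows ascending by match count (column 1)

lcWhere = ""
FOR lnI = 1 TO 3
    lcWhere = lcWhere + IIF(lnI = 1, "", " AND ") + laPairs[lnI, 2]
ENDFOR
SELECT * FROM Invoices WHERE &lcWhere INTO CURSOR csResult

Note that the alternative of removing indexes on the fly (DELETE TAG) runs into the exclusive-access problem mentioned above, since VFP requires the table to be opened exclusively to drop a tag; the expression bypass shown earlier avoids that entirely.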