Handling large data sets for web apps
Message
From: 26/07/2001 10:13:08
To: 26/07/2001 08:33:12

General information
Forum: Visual FoxPro
Category: West Wind Web Connection, Miscellaneous
Thread ID: 00535271
Message ID: 00535659
Views: 12
>Dan and Jay,
>
>I notice that VFP 7 has a "FORCE" clause in SQL SELECT, apparently to let us control the sequence of optimization through the order of the WHERE clause arguments.
>Just thinking that might be relevant/useful here.
>
>Cheers,
>
>JimN

Jim --

If it works as advertised, that might allow creating a single query rather than multiple.
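
Hedging here, since I haven't tried it myself: the VFP 7 Help describes FORCE as fixing the join order given in the FROM clause, rather than the WHERE order. A sketch with hypothetical tables and fields:

* Sketch only: FORCE (VFP 7) joins the tables in the order listed,
* so the small, restrictive set gets processed before the big table.
SELECT big.* ;
   FROM FORCE SmallSet sml ;
      INNER JOIN BigTable big ON big.key_id = sml.key_id ;
   WHERE big.status = "A" ;
   INTO CURSOR csrHits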

Helpful thought.

Jay


>
>>>>>>>>
>>>>>>>>I like the idea, but this does require quite a bit of analysis, and it can actually change over time. It would be nice to have a process that runs over the weekend to check where we are going to get the most restrictive filters.
>>>>>>>>
>>>>>>>>But ultimately, there are going to be some filters for which I am unable to get a result in a reasonable time. The only way I can think of to deal with these kinds of filters is to use some kind of asynchronous solution.
>>>>>>>
>>>>>>>I recognize that this is a stopgap measure as you look for a more permanent and robust solution.
>>>>>>>
>>>>>>>But, to the topic at hand.
>>>>>>>
>>>>>>>Are you generating the SELECT statement dynamically or populating values in a hard-coded statement? Dynamically creating SELECT statements will avoid processing filter conditions which aren't used in the present query.
>>>>>>>
>>>>>>>In that light, I don't think that a lot of analysis needs to be done. It sounds like you're fully optimized. So, without much of an up-front performance penalty, you could take each condition, as generated by the user for a specific query, and identify exactly the number of records that condition returns. Then you could dynamically generate the 2 SELECT statements, using the filter returning the smallest result set for the 1st.
>>>>>>>
>>>>>>>Pseudo code:
>>>>>>>Have an array of filter conditions, each row holding the conditions related to one field in the table. The array has a second, initially unpopulated, column to hold the # of records for that condition.
>>>>>>>
>>>>>>>FOR X = 1 TO nFilterConditions
>>>>>>>   lcCondition = aFilterConditions[X, 1]   && macro substitution needs a plain memory variable
>>>>>>>   SELECT COUNT(*) FROM BigTable WHERE &lcCondition INTO ARRAY aFC
>>>>>>>   aFilterConditions[X, 2] = aFC[1]
>>>>>>>ENDFOR
>>>>>>>
>>>>>>>Find Min
>>>>>>>
>>>>>>>Create SQL
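>>>>>>>
>>>>>>>A rough sketch of those two steps, assuming the array above (all names hypothetical):
>>>>>>>
>>>>>>>* Find the most restrictive condition (smallest count)
>>>>>>>nBest = 1
>>>>>>>FOR X = 2 TO nFilterConditions
>>>>>>>   IF aFilterConditions[X, 2] < aFilterConditions[nBest, 2]
>>>>>>>      nBest = X
>>>>>>>   ENDIF
>>>>>>>ENDFOR
>>>>>>>
>>>>>>>* 1st SELECT: the most restrictive filter alone, into a small cursor
>>>>>>>lcFirst = aFilterConditions[nBest, 1]
>>>>>>>SELECT * FROM BigTable WHERE &lcFirst INTO CURSOR csrStage
>>>>>>>
>>>>>>>* 2nd SELECT: the remaining conditions against the much smaller cursor
>>>>>>>* (assumes at least two conditions; with only one, csrStage is the answer)
>>>>>>>lcRest = ""
>>>>>>>FOR X = 1 TO nFilterConditions
>>>>>>>   IF X <> nBest
>>>>>>>      lcRest = lcRest + IIF(EMPTY(lcRest), "", " AND ") + aFilterConditions[X, 1]
>>>>>>>   ENDIF
>>>>>>>ENDFOR
>>>>>>>SELECT * FROM csrStage WHERE &lcRest INTO CURSOR csrResult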
>>>>>>>
>>>>>>>
>>>>>>>Best wishes!
>>>>>>>
>>>>>>> Jay
>>>>>>
>>>>>>I am allowing starts-with, ends-with, =, >=, <=, <>, empty, and not-empty filters, so I would have to allow for quite a lot of possibilities. Each query against a huge table could take about 60 seconds. That is not a lot when dealing with one or two possibilities, but the number of possible filter conditions would be pretty high, and it would take a long time to process all of these.
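>>>>>>
>>>>>>A hedged sketch of turning one operator choice into a condition string for a character field (operator names and variables are hypothetical):
>>>>>>
>>>>>>DO CASE
>>>>>>CASE lcOp = "StartsWith"    && optimizable with SET EXACT OFF and a tag on the field
>>>>>>   lcCond = lcField + [ = "] + lcValue + ["]
>>>>>>CASE lcOp = "EndsWith"      && RIGHT() is generally not Rushmore-optimizable
>>>>>>   lcCond = [RIGHT(RTRIM(] + lcField + [), ] + TRANSFORM(LEN(lcValue)) + [) = "] + lcValue + ["]
>>>>>>CASE lcOp = "Empty"
>>>>>>   lcCond = [EMPTY(] + lcField + [)]
>>>>>>CASE lcOp = "NotEmpty"
>>>>>>   lcCond = [!EMPTY(] + lcField + [)]
>>>>>>OTHERWISE                   && =, >=, <=, <>
>>>>>>   lcCond = lcField + [ ] + lcOp + [ "] + lcValue + ["]
>>>>>>ENDCASE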
>>>>>
>>>>>The reason I suggested that approach is that it sounds like the optimizer is choking. If that's the case, the performance on the "whole enchilada" is not indicative of what you should be getting when you cut it up into smaller pieces.
>>>>>
>>>>
>>>>>If you're fully optimized, any COUNT query on a single field should return a value in milliseconds. For those cases, VFP doesn't even need to look at the data; it just does a count on the index. So testing the count of the conditions on each field, up to 11, shouldn't take much time at all.
>>>>>
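>>>>>A minimal sketch of the kind of index/query pair Rushmore can fully optimize (table, field, and tag names hypothetical):
>>>>>
>>>>>USE BigTable
>>>>>INDEX ON lastname TAG lastname   && one tag per filterable field
>>>>>INDEX ON DELETED() TAG _deleted  && needed for full optimization when SET DELETED is ON
>>>>>SELECT COUNT(*) FROM BigTable WHERE lastname = "Smith" INTO ARRAY aFC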
>>>>
>>>>>As for gathering the individual conditions into a macro-substitutable form, I guess that depends on how you've architected the app and how much time you want to spend. I did something similar for an end-user query front end. It's really not hard, but it would involve a bit of time, agreed.
>>>>>
>>>>
>>>>Are you saying to do a count for each condition of the filter for one field and, based on those counts, build the SQL so that the filter which returns the smallest number of records comes first in the WHERE clause, and so on, so that the one that returns the highest number is last?
>>>>
>>>>>Best wishes!
>>>>>
>>>>> Jay
>>>
>>>I like the idea; I wonder how this will work in production. My boss wants me to get 99.5% of the hits to about 0.1 of a second or less. In one test, a count whose result covers the majority of the 2.5 million records was coming back in 0.05 seconds. I would be over that limit quickly, but if we can improve performance dramatically using this method, it would definitely be worth looking into.
>>
>>The large data set you're working with does create some minimum overhead, doesn't it? But the times you're encountering seem fairly high for a fully optimized SELECT alone. Of course, you've got a complex interplay there, too.
>>
>>I guess I'm not fully convinced that this is a query optimization issue, but I think it's an avenue worth pursuing.
>>
>>Do you experience about the same performance on a test machine, or is it orders of magnitude less -- as you identify in the post above? In that case, there may be some constraints you're running into with the apartment model multithreading approach of VFP -- where the specific instance of VFP may have a more constricted amount of memory available to it. Rick Strahl would be a good source of info on any limits in this arena.
>>
>>If your tests run pretty comparably, I would again review (I'm sure you've done it already many times) the indices to ensure that you're fully optimized on any query. Do all the queries run slowly? Or do they run more slowly as more conditions are added?
>>
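>>A quick way to verify (a sketch; the query is hypothetical): turn on SQL ShowPlan with SYS(3054) in a development session and check that VFP reports full optimization for each condition.
>>
>>SYS(3054, 11)   && display both filter and join optimization
>>SELECT COUNT(*) FROM BigTable WHERE lastname = "Smith" INTO ARRAY aFC
>>* VFP echoes which tags it used, e.g. "Using index tag Lastname to optimize table bigtable"
>>SYS(3054, 0)    && ShowPlan back off
>>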
>>And, if you wouldn't mind sharing the method you use to construct the SQL statement, or identify the precise SQL statement you're using, that would also help.
>>
>> Jay