Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Group by clause in vfp8
Message
De
22/04/2003 03:52:18
Walter Meester
HoogkarspelPays-Bas
 
Information générale
Forum:
Visual FoxPro
Catégorie:
Autre
Divers
Thread ID:
00774269
Message ID:
00780062
Vues:
33
George,

>It may be insignificant in the case you tested. Adding multiple fields requiring the same function call adds additional overhead, which may turn out to be significant. Why add it just to workaround something that can be overcome that doesn't add this overhead.

Because of compatibility with other SQL dialects and the VFP8 standard setting is exactly the reason why.

>My example was intentionally simple in order to define what the problem was and what one solution would be. It was not intended to be a typical example.

Oh... and you require me to have a perfect scientifical right example. Right ....

>I have no intention of giving you an example. My statement was clear. Adding the function call introduces additional and unnecessary overhead. At first, you questioned this, asking if this was fact or conventional wisdom. I stated it was fact, and you yourself proved it. Now you say it's not significant. I did take a look at an existing query that would serve as an example. It would take 17 modifications to add the MIN() function and the processing overhead should increase portionately.

Well, just try it. It won't suprise me there is no performance difference at all if you run the query. See the following extracted from the tests.

PIV 2Ghz
Selected records before grouping: 3000
Delay of one MAX() function: 0.05 seconds per 300000 selected records.
Number of MAX() functions in select

Overhead = 17 * 0,05 * (3000 / 300000) = 0.0085 Seconds.

*g* I'm going to worry about that overhead.
The time needed to execute the rest of the query determines the exact percentage of the overhead.
In the worst case the query execution time is low, so the overhead might be significant in terms of percentages. In the best case where the execution time is high even the overhead is insignificant in all cases.


>What about multiple calls? You prove that there's additional overhead with one call and say it's insignificant. The problem here is that it is the best case you'll encounter.

Again, the overhead I calculated is 0,05 seconds for 300000 records for one MAX() function. So it would be 0.85 for 17 MAX() functions. I'm not sure if this is what your mean by multiple calls though.

>>I did in my example a couple of messages ago. We are talking about performance of SQL SELECT statements in general. Since joins are more used than not in a SQL statments it is fair to include them in at least some remarks of my findings.

>Oh really? Maybe in your work they are, still they're not to the point of what was being discussed. I made three clear statements about your solution. You discredit none of them. The best you can come up with is that the overhead question isn't significant in the simplest of cases.

The point was performance differences in SQL Select statements. This is the scope. Joins fall into this scope, as well as ORDER BY, WHERE clause, RUSHMORE optimization and other things that determine the performance of a SQL statement. You insist to scope this issue to one particular form of the SQL SELECT statement. This is what I call bad science. You must look at the whole picture, not a specific cases.

>>Again the example above is considering the worst case for one MAX() function. The best case is when data retrieval I/O is far exceeding the MAX() function overhead which is the case in Complex SQL statements on large tables with a small result set (before grouping) without rushmore optimization operation in a slow WAN.
>
>And??? There's still additional overhead, that isn't necessary.

In almost all coding there is overhead that is not neccesary. For example a nested IIF() function might be far faster than using a DO CASE or nested IF's but still we don't choose for the nested IIF() for readability.

In other cases we simply don't know what the best performance code is. The question is, does it matter? Only certain parts of an application are subject to tuning. Like you said: If it ain't broke, don't fix it. There is no VFP programmer in the world who tunes all his code in an application.

In this case the MAX() function is neccesary if you want to be your SQL statement to be compatible with other SQL dialects.


>>>>Given the first test on my 2Ghz machine, I see an overhead of the MAX() function of about 0.05 seconds per 300.000 selected records. This time is nothing compared to the additional time needed for doing joins or processing where clauses and processing the actual query. In the test I did also disregard the time need to open tables addressed in the Query.

>>>Again you're attempting to complicate the argument by introducing elements that weren't involved in the original question. Further, the time needed to open tables in a VFP environment should be reasonably consistent. Neither has anything to do with my original response.

You still don't get it. Its not about the original question. It's about your statements. And it might be like overcomplicating the issue to you, but this is the real world. SQL statements are not simple to analyze.

In your statement you were Implying that the MAX() statement's overhead was significant enough not to use it. You were implying that the overhead to be significant for the whole scope of a SQL command. Then you'd better know where you're talking about. Then you must do good science and take all parameters into account to weigh that statement.

You did not do that, nor did you provide any (real world) examples to support the things you imply. As I said in another message. Yes, there is a difference, but its not enough to make a difference.

>Indeed, it might in future cases be better not to change the engine behavior. In some cases it might better to change it. However, you have not demonstrated in away way, shape, manner or form that my statement is bogus, and you yourself have proven that it isn't.

In absolute sense, you're right. There is an overhead. However, you imply something that is not there. There is no performance problem in using MAX() in 99.9% of the cases.


>The best you can offer is an opinion that it isn't significant. That's it an opinion. You're entitled to your's and I to mine. I made a statement, and you can't disprove it.

That opinion is becoming very closely a fact. You made a statement, I gave the scientific data to test the significants of your statement. It's virtually .NULL.


>OK, so first saying that my statement about additional overhead maybe incorrect, then when you find it is correct, call it insignificant. Who's shifting their stance. Not me, but you.

I said Also, I'm not sure about the overhead of having a MAX() function in the clause. Have you done tests to support this claim. If I look at the problem, the MAX() function is not that different from the selecting the last row in a group. Some minor test do not support your claim. Please give me some figures. If the difference is less than say 5%, performance should not be an issue

I did not take a hard statment that there is no overhead. I did say that that it might not make a difference. Please read carefully next time. You're reading things that are not there.

>Further, I do take all these things into account and, in fact, may be far better versed in them than yourself. What you blindly (to use your words) fail to account for is the cumulative result. This is one piece of wholly unnecessaray overhead being added to any existing additional factors. Rather than eliminate it, you would chose to retain it. Who's being blind here, Walter?

Tell me, what you mean by cumlative result ? In my example there are about 1000 groups, so the MAX() function returns 1000 records, Adding more groups don't make a difference at all. If you've got questions about the validity of my tests, than post a modified one to support your claim. It highly unprofessional to question something without anything to support your claim, but only yell about the tests not beeing valid.

>This is totally irrelevant. Oh, and BTW, I don't believe that I participated in that thread.

In so far it's relevant that here nobody did support there claims and were trying to focus on particular cases, while my take on this to look at in a much broader view. You, are also failing in seeing that you statement technically is correct, but has about no value in the real world.

>>I did admit two points in my second post to you and I did admit there is some overhead, so I definitely don't have a problem with admitting of being wrong. I don't have any problem with you George, you seem to have some problem with me. In my eyes, your persistence in addressing something as a problem (overhead of the MAX() function) which in real life is not a problem at all, is strange.
>>
>>I probably have more experience in performance tuning and testing than anyone else here on the forum. I do know what is involved in executing SQL statements. Regarding to the MAX() function there might be exceptional cases where it the performance difference is measurable. In general there is virtually none. There are too many other factors in a SQL statement playing a far more important role in determining the SQL SELECT performance:
>>
>>A. Opening tables
>>B. Reading tables and indexes (e.g. over a network)
>>C. Having (and not having) the proper indexes.
>>D. Performing joins and where clauses
>>E. grouping and Ordering rows
>>Etc..

>And...? Well, I've been dealing with this stuff for 10-12 years. I think I might know a bit about it myself.

If you do, then it suprises me you do such unthoughtfull statements or have problems admitting that its not significant. You'd also know that in a production environment the performance of SQL SELECT commands are very different (slower) than you test on a local machine in exclusive mode. If you're really trying to tune a SQL commands the fieldlist (unless contain very complex functions or UFD) is the place you touch as a last resort or not at all before looking for other solutions like reduncy or splitting up SQL statements.

>So it comes down to, "It's OK to add unnecessary code that adds processing overhead, because if there are performance problems there must be another cause." It may be true that other factors are involved, but why introduce one more potential problem.

I'll explain this once more. The overhead in the FieldExpr list is so minor compared to the rest of the SQL statement, that *IF* performance problems occur they're NOT caused by the fieldexpressionlist (unless they do contain unneccesary fields, UDFS or other very complex expressions).

Walter,
Précédent
Suivant
Répondre
Fil
Voir

Click here to load this message in the networking platform