Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Group by clause in vfp8
Message
De
21/04/2003 04:37:46
Walter Meester
HoogkarspelPays-Bas
 
Information générale
Forum:
Visual FoxPro
Catégorie:
Autre
Divers
Thread ID:
00774269
Message ID:
00779698
Vues:
30
George,

First of all, to me the discussion is scoped whether its advisable to use SET ENGINEBEHEVIOUR 70 for backward compatibility of the GROUP BY behaviour. My comments focus on the significant of your performance argument. My standpoint is that the performance difference is insignificant.

>>On my P4 2Ghz, it shows that the seconds query is about 10% (0.541 sec and 0.592 sec) slower than the first one. Measurable ? yes. Significant? I don´t think so

>This is bad science. You've only included one field in your test. You don't consider the implications of more than one. In my situation there are multiple fields. You may come to the conclusion that, under your test that the difference isn't significant. What you don't consider is multiple calls to the function(s) along with multiple aggregates.

You're saying ?? Look at your own example where you were comparing statements that were interpretated for each record while in a SQL statement they're intepretated only ones. The example above is giving an idea of the overhead of one MAX statement. You're smart enough to draw conclusions for multiple ones. If you've something to add, then add it, don't complain about bad science, Just adjust the sample and give me the results.

>You consider the best case, I the worst.

The sample above is considering the worst case for one MAX function, since the I/O for data retrival is minimized (Data is in memory, for at least a great deal). This is hardly a production SQL example where data is unbuffered, tables might not be open yet, the query contains no joins, and the application is operating on a network.

>>Is this a representive test ? No, because it´s a very simple query containing no joins and the data comes from memory (cursor) instead of a table stored on disk or a newwork drive.

>Who brought up joins? You're changing the subject.

I did in my example a couple of messages ago. We are talking about performance of SQL SELECT statements in general. Since joins are more used than not in a SQL statments it is fair to include them in at least some remarks of my findings.

>>When applying this same test where the data is stored on a network drive and is used by more than one user (so no read buffering occurs on the client), I found the following that the difference dropped to less than 5 percent on 100 Mbs full duplex network. When doing this same test on a 10 MB full duplex network, the difference where below 1 percent and almost not measurable anymore.

>Again, you're considering a "best case", not worst. However, the point isn't whether or not there's additional overhead. There is, and you admit it. The point is, why bother to introduce it at all?

Again the example above is considering the worst case for one MAX() function. The best case is when data retrieval I/O is far exceeding the MAX() function overhead which is the case in Complex SQL statements on large tables with a small result set (before grouping) without rushmore optimization operation in a slow WAN.

>You don't know what, in the future, changes might be required and neither do I.

Huh ? what do you mean by that? If the VFP team implements a better query optimizer so it can determine the MAX value out of index information, your statement will not hold any water. The SQL MAX() function is a very simple function and could be optimized even further as opposed to the current VFP implementation. So the difference could be even smaller than this.

>>Given the first test on my 2Ghz machine, I see an overhead of the MAX() function of about 0.05 seconds per 300.000 selected records. This time is nothing compared to the additional time needed for doing joins or processing where clauses and processing the actual query. In the test I did also disregard the time need to open tables addressed in the Query.

>Again you're attempting to complicate the argument by introducing elements that weren't involved in the original question. Further, the time needed to open tables in a VFP environment should be reasonably consistent. Neither has anything to do with my original response.

I did not reply on the original question. I did reply on your standpoint of using set enginebehaviour in general. It's my take that for compatibility reasons it might be better not to use SET ENGINEBEHAVIOUR 70. I tried to explain why your counter argument about performance is bogus.

>>WHAT THE HELL ARE WE TALKING ABOUT !?!!

>You've consistently responded to my posts and consistently tried to shift the argument when you've been unable to debunk my statements. You tell me what we're talking about, then ask me what we are.

PEANUTS: we are talking about peanuts.

Given the scope of the discussion, I've not shifted the argument. I tried to explain why in a typical production environment the overhead of the MAX() function is insignificant. A SQL command is a very complex command where performance is determined by numerous factors (which you seem to ignore completely). The problem with these discussions is that many here draw conclusions of one example in one case, while in the real world it is not that simple. Many people (like you) blindly ignore the fact that there are many, many things involved in performance issues.

The same happened about three years ago, when I successfully debunked the argument that a “INDEX ON deleted() TAG Del” , helped performance for about every SQL SELECT query. Various MVPs did participate in this thread, but all had to admit I was right (only Christof was brave enough to do so, though) (And YES this thread was before the Advisor article came out).

In this discussion SQL statements where done on local tables which were buffered in memory, run repeatedly. This is not a typical production environment. When you do such tests, make sure.

1. That Queries on tables that in production environment are used by other users on the network, also operates on tables use by other users. Also make sure you make sure another user makes some changes to these tables to let the NOS disable the read buffering
2. Test this on different networks if possible (full duplex, halfduplex, 100 Mbs, 10 Mbs, coax). It might give you hunches where possible performance problems lie.
3. If in a production environment a query uses an unopened table, be sure your tests do the same

These rules are so simple, but yet they’re violated so often, that they only can lead to wrong conclusions. In the past I’ve seen dozens of performance proof violating these rules.

>You haven't demonstrated this. I never brought up the subject of the last physical record (you did).

No you did, In your first reply to me you said: First, you obscure the intended output which refers to the last physical row. I addmitted this argument.

>I pointed you to a post of mine that thoroughly gave an example and the reason for my original response.

Huh ? what did I demonstrate in the previous message? I showed you the overhead of one MAX() function in a cursor, an unbuffered open table over a free full duplex 100Mbs network, and a full duplex 10Mbs network.

>Walter, I don't know what your problem is, and further, I don't care. I don't know if you've got some sort of problem with me, or problem admitting when you're wrong. Just do me the favor of not trying to prove yourself right by trying to convulute the orginal post and response.

I did admit two points in my second post to you and I did admit there is some overhead, so I definitely don't have a problem with admitting of being wrong. I don't have any problem with you George, you seem to have some problem with me. In my eyes, your persistence in addressing something as a problem (overhead of the MAX() function) which in real life is not a problem at all, is strange.

I probably have more experience in performance tuning and testing than anyone else here on the forum. I do know what is involved in executing SQL statements. Regarding to the MAX() function there might be exceptional cases where it the performance difference is measurable. In general there is virtually none. There are too many other factors in a SQL statement playing a far more important role in determining the SQL SELECT performance:

A. Opening tables
B. Reading tables and indexes (e.g. over a network)
C. Having (and not having) the proper indexes.
D. Performing joins and where clauses
E. grouping and Ordering rows
Etc..

In general when optimizing SQL commands the functions in the field list are of least importance (unless you executing very complex expressions or UDFs). In 99% of the cases the problems lies in the join or where expressions or the (mis)use of indexes.

Oh, yeah. If having a performance problem in a SQL statement like the discussed one, you’d better try to split them up into two SQL statements (or use the solution Mike Y for SQL server) to tune performance. This is far more productive than to argue about performance issues when using the MAX() function in the field expressions list of a SQL query.

Walter,
Précédent
Suivant
Répondre
Fil
Voir

Click here to load this message in the networking platform