Group by clause in vfp8 - Level Extreme

Plateforme Level Extreme

Abonnement

Profil corporatif

Produits & Services

Support

Légal

English

Group by clause in vfp8

Message

21/04/2003 09:50:53

George Tasker
Shaw Contract Group
Cartersville, Georgie, États-Unis

21/04/2003 04:37:46

Walter Meester
HoogkarspelPays-Bas

Information générale

Forum:

Visual FoxPro

Catégorie:

Autre

Titre:

Re: Group by clause in vfp8

Divers

Thread ID:

00774269

Message ID:

00779758

Vues:

>George,
>
>First of all, to me the discussion is scoped whether its advisable to use SET ENGINEBEHEVIOUR 70 for backward compatibility of the GROUP BY behaviour. My comments focus on the significant of your performance argument. My standpoint is that the performance difference is insignificant.

It may be insignificant in the case you tested. Adding multiple fields requiring the same function call adds additional overhead, which may turn out to be significant. Why add it just to workaround something that can be overcome that doesn't add this overhead.

>>>On my P4 2Ghz, it shows that the seconds query is about 10% (0.541 sec and 0.592 sec) slower than the first one. Measurable ? yes. Significant? I don´t think so
>
>>This is bad science. You've only included one field in your test. You don't consider the implications of more than one. In my situation there are multiple fields. You may come to the conclusion that, under your test that the difference isn't significant. What you don't consider is multiple calls to the function(s) along with multiple aggregates.
>
>You're saying ?? Look at your own example where you were comparing statements that were interpretated for each record while in a SQL statement they're intepretated only ones. The example above is giving an idea of the overhead of one MAX statement. You're smart enough to draw conclusions for multiple ones. If you've something to add, then add it, don't complain about bad science, Just adjust the sample and give me the results.

My example was intentionally simple in order to define what the problem was and what one solution would be. It was not intended to be a typical example.

I have no intention of giving you an example. My statement was clear. Adding the function call introduces additional and unnecessary overhead. At first, you questioned this, asking if this was fact or conventional wisdom. I stated it was fact, and you yourself proved it. Now you say it's not significant. I did take a look at an existing query that would serve as an example. It would take 17 modifications to add the MIN() function and the processing overhead should increase portionately.

>>You consider the best case, I the worst.
>
>The sample above is considering the worst case for one MAX function, since the I/O for data retrival is minimized (Data is in memory, for at least a great deal). This is hardly a production SQL example where data is unbuffered, tables might not be open yet, the query contains no joins, and the application is operating on a network.

What about multiple calls? You prove that there's additional overhead with one call and say it's insignificant. The problem here is that it is the best case you'll encounter.

>>>Is this a representive test ? No, because it´s a very simple query containing no joins and the data comes from memory (cursor) instead of a table stored on disk or a newwork drive.
>
>>Who brought up joins? You're changing the subject.
>
>I did in my example a couple of messages ago. We are talking about performance of SQL SELECT statements in general. Since joins are more used than not in a SQL statments it is fair to include them in at least some remarks of my findings.

Oh really? Maybe in your work they are, still they're not to the point of what was being discussed. I made three clear statements about your solution. You discredit none of them. The best you can come up with is that the overhead question isn't significant in the simplest of cases.

>>>When applying this same test where the data is stored on a network drive and is used by more than one user (so no read buffering occurs on the client), I found the following that the difference dropped to less than 5 percent on 100 Mbs full duplex network. When doing this same test on a 10 MB full duplex network, the difference where below 1 percent and almost not measurable anymore.
>
>>Again, you're considering a "best case", not worst. However, the point isn't whether or not there's additional overhead. There is, and you admit it. The point is, why bother to introduce it at all?
>
>Again the example above is considering the worst case for one MAX() function. The best case is when data retrieval I/O is far exceeding the MAX() function overhead which is the case in Complex SQL statements on large tables with a small result set (before grouping) without rushmore optimization operation in a slow WAN.

And??? There's still additional overhead, that isn't necessary.

>>You don't know what, in the future, changes might be required and neither do I.
>
>Huh ? what do you mean by that? If the VFP team implements a better query optimizer so it can determine the MAX value out of index information, your statement will not hold any water. The SQL MAX() function is a very simple function and could be optimized even further as opposed to the current VFP implementation. So the difference could be even smaller than this.

Sure it holds water...unless of course someone has changed the rules of cardinality without telling me. Last I checked, zero is still less than one or more. Zero is the number of comparisions required by not including the function call. One or more is the number required with the function call.

>>>Given the first test on my 2Ghz machine, I see an overhead of the MAX() function of about 0.05 seconds per 300.000 selected records. This time is nothing compared to the additional time needed for doing joins or processing where clauses and processing the actual query. In the test I did also disregard the time need to open tables addressed in the Query.
>
>>Again you're attempting to complicate the argument by introducing elements that weren't involved in the original question. Further, the time needed to open tables in a VFP environment should be reasonably consistent. Neither has anything to do with my original response.
>
>I did not reply on the original question. I did reply on your standpoint of using set enginebehaviour in general. It's my take that for compatibility reasons it might be better not to use SET ENGINEBEHAVIOUR 70. I tried to explain why your counter argument about performance is bogus.

Indeed, it might in future cases be better not to change the engine behavior. In some cases it might better to change it. However, you have not demonstrated in away way, shape, manner or form that my statement is bogus, and you yourself have proven that it isn't. The best you can offer is an opinion that it isn't significant. That's it an opinion. You're entitled to your's and I to mine. I made a statement, and you can't disprove it.

>>>WHAT THE HELL ARE WE TALKING ABOUT !?!!
>
>>You've consistently responded to my posts and consistently tried to shift the argument when you've been unable to debunk my statements. You tell me what we're talking about, then ask me what we are.
>
>PEANUTS: we are talking about peanuts.

Opinion, not fact. Opinion.

>Given the scope of the discussion, I've not shifted the argument. I tried to explain why in a typical production environment the overhead of the MAX() function is insignificant. A SQL command is a very complex command where performance is determined by numerous factors (which you seem to ignore completely). The problem with these discussions is that many here draw conclusions of one example in one case, while in the real world it is not that simple. Many people (like you) blindly ignore the fact that there are many, many things involved in performance issues.

OK, so first saying that my statement about additional overhead maybe incorrect, then when you find it is correct, call it insignificant. Who's shifting their stance. Not me, but you.

Further, I do take all these things into account and, in fact, may be far better versed in them than yourself. What you blindly (to use your words) fail to account for is the cumulative result. This is one piece of wholly unnecessaray overhead being added to any existing additional factors. Rather than eliminate it, you would chose to retain it. Who's being blind here, Walter?

>The same happened about three years ago, when I successfully debunked the argument that a “INDEX ON deleted() TAG Del” , helped performance for about every SQL SELECT query. Various MVPs did participate in this thread, but all had to admit I was right (only Christof was brave enough to do so, though) (And YES this thread was before the Advisor article came out).

This is totally irrelevant. Oh, and BTW, I don't believe that I participated in that thread.

>In this discussion SQL statements where done on local tables which were buffered in memory, run repeatedly. This is not a typical production environment. When you do such tests, make sure.
>
>1. That Queries on tables that in production environment are used by other users on the network, also operates on tables use by other users. Also make sure you make sure another user makes some changes to these tables to let the NOS disable the read buffering
>2. Test this on different networks if possible (full duplex, halfduplex, 100 Mbs, 10 Mbs, coax). It might give you hunches where possible performance problems lie.
>3. If in a production environment a query uses an unopened table, be sure your tests do the same
>
>These rules are so simple, but yet they’re violated so often, that they only can lead to wrong conclusions. In the past I’ve seen dozens of performance proof violating these rules.

You may be correct. I can't speak for anyone but myself.

>>You haven't demonstrated this. I never brought up the subject of the last physical record (you did).
>
>No you did, In your first reply to me you said: First, you obscure the intended output which refers to the last physical row. I addmitted this argument.
>
>>I pointed you to a post of mine that thoroughly gave an example and the reason for my original response.
>
>Huh ? what did I demonstrate in the previous message? I showed you the overhead of one MAX() function in a cursor, an unbuffered open table over a free full duplex 100Mbs network, and a full duplex 10Mbs network.
>
>>Walter, I don't know what your problem is, and further, I don't care. I don't know if you've got some sort of problem with me, or problem admitting when you're wrong. Just do me the favor of not trying to prove yourself right by trying to convulute the orginal post and response.
>
>I did admit two points in my second post to you and I did admit there is some overhead, so I definitely don't have a problem with admitting of being wrong. I don't have any problem with you George, you seem to have some problem with me. In my eyes, your persistence in addressing something as a problem (overhead of the MAX() function) which in real life is not a problem at all, is strange.
>
>I probably have more experience in performance tuning and testing than anyone else here on the forum. I do know what is involved in executing SQL statements. Regarding to the MAX() function there might be exceptional cases where it the performance difference is measurable. In general there is virtually none. There are too many other factors in a SQL statement playing a far more important role in determining the SQL SELECT performance:
>
>A. Opening tables
>B. Reading tables and indexes (e.g. over a network)
>C. Having (and not having) the proper indexes.
>D. Performing joins and where clauses
>E. grouping and Ordering rows
>Etc..

And...? Well, I've been dealing with this stuff for 10-12 years. I think I might know a bit about it myself.

>In general when optimizing SQL commands the functions in the field list are of least importance (unless you executing very complex expressions or UDFs). In 99% of the cases the problems lies in the join or where expressions or the (mis)use of indexes.
>
>Oh, yeah. If having a performance problem in a SQL statement like the discussed one, you’d better try to split them up into two SQL statements (or use the solution Mike Y for SQL server) to tune performance. This is far more productive than to argue about performance issues when using the MAX() function in the field expressions list of a SQL query.
>
So it comes down to, "It's OK to add unnecessary code that adds processing overhead, because if there are performance problems there must be another cause." It may be true that other factors are involved, but why introduce one more potential problem.

George

Ubi caritas et amor, deus ibi est

Répondre

Fil

Voir

Click here to load this message in the networking platform