Hi,
I've ot a 15.6 million row table (play.dbf) - (only 16 small columns) - that I need to do some data analysis on. Problem is that some aggregate functions drive VFP to an absolute standstill - not just slow - but down to nothing! I need help with a strategy to deal with this.
Example: two of the columns are pnumber (patron number) and gamedate. Every patron can have multiple records for each gamedate.
First, I wanted a count of how many records per day existed (this goes over a little less than 2 years) so I ran:
"select gamedate, count(gamedate) from play group by gamedate" -
It was slow compared to what I'm used to, but ran fine and gave me proper results. There were about 600 separate days where play happened - so thus about 600 rows returned. NOW - I want a result with the unique patrons in it - in other words - I wanna see all the different pnumbers in the table? So I try both:
select distinct pnumber from play ..... and .... select pnumber from play group by 1
both drive VFP to a halt! - Waited fifteen minutes and barely go to 1% on the query thermometer! tried it several times and had to force quit each time. As a note, I would expect a result set of about 100,000 rows in this case.
An attempt to do aggregate like "Select pnumber, gamedate, count(*) from play group by 1, 2" never even made it to 1% after fifteen minutes! ARGGHHHH! I've got a lot of these kind of aggregates to do as well as cross tabs and there is no way I can wait twenty or thirty minutes on each one.
Any advice on strategy would be greatly appreciated. (BTW - this is on VFP 6 SP3, Dell PII-450 128kRAM - both fields mentioned above are indexed.)
TIA!
Ken B. Matson
GCom2 Solutions