Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
ARGHH! big table killing VFP - need strategy help!
Message
From
10/08/1999 12:06:22
 
 
To
06/08/1999 14:41:56
General information
Forum:
Visual FoxPro
Category:
Other
Miscellaneous
Thread ID:
00250826
Message ID:
00252024
Views:
18
>Hi,
>
>I've ot a 15.6 million row table (play.dbf) - (only 16 small columns) - that I need to do some data analysis on. Problem is that some aggregate functions drive VFP to an absolute standstill - not just slow - but down to nothing! I need help with a strategy to deal with this.
>
>Example: two of the columns are pnumber (patron number) and gamedate. Every patron can have multiple records for each gamedate.
>
>First, I wanted a count of how many records per day existed (this goes over a little less than 2 years) so I ran:
>
>"select gamedate, count(gamedate) from play group by gamedate" -
>
>It was slow compared to what I'm used to, but ran fine and gave me proper results. There were about 600 separate days where play happened - so thus about 600 rows returned. NOW - I want a result with the unique patrons in it - in other words - I wanna see all the different pnumbers in the table? So I try both:
>
>select distinct pnumber from play ..... and .... select pnumber from play group by 1
>
>both drive VFP to a halt! - Waited fifteen minutes and barely go to 1% on the query thermometer! tried it several times and had to force quit each time. As a note, I would expect a result set of about 100,000 rows in this case.
>
>An attempt to do aggregate like "Select pnumber, gamedate, count(*) from play group by 1, 2" never even made it to 1% after fifteen minutes! ARGGHHHH! I've got a lot of these kind of aggregates to do as well as cross tabs and there is no way I can wait twenty or thirty minutes on each one.
>
>Any advice on strategy would be greatly appreciated. (BTW - this is on VFP 6 SP3, Dell PII-450 128kRAM - both fields mentioned above are indexed.)
>
>TIA!

Isn't there a logical way to split up the data more? I would try to categorize your players or perform the query on a subset of data mothly, weekly, daily, etc.

I can't do an effective analysis on those kind of huge aggregate values anyway. There are so many variables contributing that it is hard to glean any meaningful information from such huge data sets. In other words, do you really care what day in the last two years is the most active? Or would it be more helpful to plot the daily or weekly activity of the last 3 months?
Previous
Next
Reply
Map
View

Click here to load this message in the networking platform