Repeated records
From:
03/08/2007 17:53:14

General information
Forum: Visual FoxPro
Category: Databases, Tables, Views, Indexing and SQL syntax

Environment versions
Visual FoxPro: VFP 9 SP1
OS: Windows XP SP2
Network: Windows XP
Database: MS SQL Server

Miscellaneous
Thread ID: 01245292
Message ID: 01245910
Views: 16
Naomi,

>Actually, after re-reading the code there is nothing wrong with the numbers; in fact they tell us how many unique records we had. But maybe I should start with a clean state anyway, e.g. zap after the last method again.

If you want to restructure your benchmark:
a) make 2 or 3 measurements, one for each *significant* test step
(except for the current 3), to see how each test differs under other data distributions.
b) make your version 2 the first one to be tested. Time into variables to get exact
measurements, but before writing out the time for the test, save the information from
the dupe table as well: then you know afterwards the distribution of duplicates
as well as the number of records.

Something along the lines of
SELECT DupCount, COUNT(*) AS DupDist FROM Dupes GROUP BY 1
(see the sketch below, after point c)

c) make each benchmark a function, the whole benchmarking template one function,
and the logging a function; parameterize the table building as well, to get something easily
callable many times to create base tables: set it up once and let it run overnight.
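
Purely to illustrate b) and c) together, a minimal sketch in VFP; RunBenchmark, BenchLog, Dupes and the field names are hypothetical placeholders, not the code from your thread:

PROCEDURE RunBenchmark
    LPARAMETERS tcTestName
    LOCAL lnStart, lnElapsed
    lnStart = SECONDS()
    DO (tcTestName)              && run one benchmark variant by name
    lnElapsed = SECONDS() - lnStart

    * summarize the duplicate distribution before the next run zaps Dupes
    SELECT DupCount, COUNT(*) AS DupDist ;
        FROM Dupes ;
        GROUP BY 1 ;
        INTO CURSOR DupeStats

    IF _TALLY = 0
        * no dupes at all: still log the elapsed time
        INSERT INTO BenchLog (TestName, Elapsed, DupCount, DupDist) ;
            VALUES (tcTestName, m.lnElapsed, 0, 0)
    ELSE
        SELECT DupeStats
        SCAN
            INSERT INTO BenchLog (TestName, Elapsed, DupCount, DupDist) ;
                VALUES (tcTestName, m.lnElapsed, DupeStats.DupCount, DupeStats.DupDist)
        ENDSCAN
    ENDIF
    USE IN DupeStats
ENDPROC

Called for example as RunBenchmark("Version2") for each variant after each freshly built base table, the overnight driver is then just a couple of nested loops over variants and data distributions.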

I'd expect version 1 to always be the best. Since exclusive access is needed in the current version 4 as well,
a PACK might be added as an optional measurement, because the "no deleted recs" state happens only here.
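
If you do add the optional PACK measurement, it needs the table opened exclusively; a minimal sketch (MyTable is a placeholder name):

USE MyTable EXCLUSIVE        && PACK refuses to run without exclusive access
lnStart = SECONDS()
PACK                         && physically removes the deleted records
lnPackTime = SECONDS() - lnStart
USE IN MyTable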

Version 4 should be very fast as long as there are many duplicate records. If there are no dupes,
its relative position in timing against all the other approaches is the worst of all measurements, but
it might still be faster than 2 or 3. But it needs exclusive access as well.

Versions 2 and 3 work on the table in place. Version 2 will have the best relative position
if no duplicates are found: the dupe table is empty, no scan is needed, only the time to build
the distinct dupe table is spent. In your old benchmark EVERY record was duplicated, so HAVING COUNT(*) > 1 could not save
anything. No wonder it shows bad performance on such data. Perhaps version 2 is faster than 3
(maybe even than 4 on large data sets?) if very few dupes are found. I expect version 3 to show
better performance across many distributions, which is the reason to make it the recommendation
for those scenarios forbidding exclusive access for de-duping and where nothing is known about the distribution.
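
Just to make the HAVING point concrete: one possible shape of such an in-place de-dup step; cKeyField, MyTable and the index tag are assumptions for illustration, not your actual version 2/3 code:

* build the table of duplicated key values; if no key occurs more than once,
* Dupes stays empty and the delete scan below never runs
SELECT cKeyField, COUNT(*) AS DupCount ;
    FROM MyTable ;
    GROUP BY cKeyField ;
    HAVING COUNT(*) > 1 ;
    INTO CURSOR Dupes

IF _TALLY > 0
    * in-place de-dup: keep the first occurrence of each duplicated key and
    * mark the rest as deleted - no exclusive access and no PACK required
    SELECT MyTable
    SET ORDER TO TAG cKeyField       && assumes an index tag on the key
    SELECT Dupes
    SCAN
        SELECT MyTable
        SEEK Dupes.cKeyField
        SKIP                         && leave the first match alone
        DELETE REST WHILE cKeyField = Dupes.cKeyField
    ENDSCAN
ENDIF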

regards

thomas