Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Problem
Message
Information générale
Forum:
Visual FoxPro
Catégorie:
Base de données, Tables, Vues, Index et syntaxe SQL
Titre:
Problem
Divers
Thread ID:
00396098
Message ID:
00396098
Vues:
43
I've been posed with a problem to solve and I would like to get some input on the solution - any ideas would be greatlu appreciated.

Given the following sample table:

ForiegnKey DataField
========== ==========
1 1
1 2
1 3
2 2
2 3
2 4
3 2
3 4

We assume that the Foreign Key refers to a table of customers, so the problem will refer to "customers,"
and that the DataField is a product number. We wish to determine the number of "crossovers"
between any particular value of the data field. In other words, how many customer bought both products 2
and 4?. In this case, customers 2 and 3 each have records where the data field is 2 and 4, so the answer
is two.

The conventional approach is to do a SQL self-join, something like the following:

SELE a.DataField DF1,b.DataField df2,COUNT(*) as "TheCount";
FROM TheTable a JOIN TheTable b ON a.ForeignKey = b.ForeignKey;
GROUP BY df1,df2

Which yields:

DF1 DF2 TheCount
=== === ==========
1 1 1
1 2 1
1 3 1
2 1 1
2 2 3
2 3 2
2 4 2
3 1 1
3 2 2
3 3 2
3 4 1
4 2 2
4 3 1
4 4 2

To further explain, we can now look at the fifth row and say, "3 customer bought product 2," or look at row
6 and say "two customers who bought product 2 also bought product 3," or look at row three and say, "only
one customer who bought product 1 also bought product 3."

The problem is that, when the number of records in the table gets large, this can take days, even if it does
not crash out Fox or NT.

OBJECTIVE
Find a way to get the same result as the SQL self-join with linear performance based on the number of records
in the source table.

ASSUMPTIONS
1. The table is already sorted on Foreign Key + Data Field
2. The values for Data Field are known in advance and are stored in another table, usually not more than
100-200 values
3. The table can have many millions of records, so creating interim tables by breaking up the SQL into
two passes risks the 2G limit.
4. For the doubters, yes the table is indexed and the self-join still takes > 24 hours when indexed.

Thanks
Suivant
Répondre
Fil
Voir

Click here to load this message in the networking platform