Information générale
Catégorie:
Base de données, Tables, Vues, Index et syntaxe SQL
I've been posed with a problem to solve and I would like to get some input on the solution - any ideas would be greatlu appreciated.
Given the following sample table:
ForiegnKey DataField
========== ==========
1 1
1 2
1 3
2 2
2 3
2 4
3 2
3 4
We assume that the Foreign Key refers to a table of customers, so the problem will refer to "customers,"
and that the DataField is a product number. We wish to determine the number of "crossovers"
between any particular value of the data field. In other words, how many customer bought both products 2
and 4?. In this case, customers 2 and 3 each have records where the data field is 2 and 4, so the answer
is two.
The conventional approach is to do a SQL self-join, something like the following:
SELE a.DataField DF1,b.DataField df2,COUNT(*) as "TheCount";
FROM TheTable a JOIN TheTable b ON a.ForeignKey = b.ForeignKey;
GROUP BY df1,df2
Which yields:
DF1 DF2 TheCount
=== === ==========
1 1 1
1 2 1
1 3 1
2 1 1
2 2 3
2 3 2
2 4 2
3 1 1
3 2 2
3 3 2
3 4 1
4 2 2
4 3 1
4 4 2
To further explain, we can now look at the fifth row and say, "3 customer bought product 2," or look at row
6 and say "two customers who bought product 2 also bought product 3," or look at row three and say, "only
one customer who bought product 1 also bought product 3."
The problem is that, when the number of records in the table gets large, this can take days, even if it does
not crash out Fox or NT.
OBJECTIVE
Find a way to get the same result as the SQL self-join with linear performance based on the number of records
in the source table.
ASSUMPTIONS
1. The table is already sorted on Foreign Key + Data Field
2. The values for Data Field are known in advance and are stored in another table, usually not more than
100-200 values
3. The table can have many millions of records, so creating interim tables by breaking up the SQL into
two passes risks the 2G limit.
4. For the doubters, yes the table is indexed and the self-join still takes > 24 hours when indexed.
Thanks
Suivant
Répondre
Voir le fil de ce thread
Voir le fil de ce thread à partir de ce message seulement
Voir tous les messages de ce thread
Voir tous les messages de ce thread à partir de ce message seulement