Identifying duplicates - Level Extreme

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Identifying duplicates

Message

From

05/06/2001 05:23:52

Vlad Grynchyshyn
Soft Serve
Lviv, Ukraine

04/06/2001 20:29:13

Farouk Yew
Kuala Lumpur, Malaysia

General information

Forum:

Visual FoxPro

Category:

Databases,Tables, Views, Indexing and SQL syntax

Title:

Re: Identifying duplicates

Miscellaneous

Thread ID:

00513617

Message ID:

00514977

Views:

Hi!

Ok, understood now. Following is the approach with samples.

SELECT DISTINCT A.* from A INNER JOIN B ON A.EmpNo=B.EmpNo into cursor AllMatching ;
UNION ;
SELECT DISTINCT B.* from B INNER JOIN A ON B.EmpNo=A.EmpNo

Above query will return you the result as you described - all records that have matching key field, but differences in data fields are not compared.

Exaplanation of query:

First query is select all records from table A that have a key field in the B.
Second query is the same - all records from B that have matching-key records in A.
DISTINCT in both queries is to select only one record from A if there are meny matching-key records in B and vise versa.
Finally, UNION without 'ALL' keyword will combine the results of both queries into the single result set with matching-key records xists, but records with same key and differences on other fields will exist 2 or mnor times. Now you have somethoing like following:

001 John
001 Peter
003 James

As you see, now you can easy count the number of records with the similar key and filter out records that appear only once in the previous result set:

SELECT EmpNo, count(*) as CNT from AllMatching group by EmpNo Having CNT>1 into cursor DifferenceKeys

Final query will give you the result:

Select A.* from A INNER JOIN DifferenceKeys D on A.EmpNo=D.EmpNo into cursor Result ;
UNION ALL ;
Select B.* from B INNER JOIN DifferenceKeys D on B.EmpNo=D.EmpNo

Note that you can get following in the result:

001 John
001 John
001 Peter
003 James

However, I hope inside of the A and B tables EmpNo is unique, so this will not happen.

HTH.

Explanation of queries I have posted:

>Thanks Vlad
>
>I'm not sure if I have not phrased my question correctly. But let me give an example of the problem :
>
>Table A
>EmpNo Name
>001 John
>002 Amy
>003 James
>
>Table B
>Empno Name
>001 Peter
>003 James
>005 Susan
>
>Based on above eg, we have an integrity problem bcos "001" is John in table A and Peter in table B. "003" is ok assuming all non key fields also matches. What I need to do is to create the following exception list :
>
>Empno Name
>001 John
>001 Peter
>
>For the eg above, the table is just a 2 field table and I think the following SQL should work :
>
>SELECT * from a INNER JOIN b ON a.empno=b.empno where a.name<>b.name
>
>However it is not feasible if the number of non key fields is a lot. In that case, the SQL would have to go like this :
>
>SELECT * from a INNER JOIN b ON a.empno=b.empno WHERE (a.name<>b.name OR a.addrs<>b.addrs OR a.nationality<>b.nationality OR .........)
>
>Hope to hear from u, Vlad and thanks.
>
>Best Regards
>>Hi!
>>
>>And, of course, ass ';' to expand VFP command to next line (damn, worked a lot with SQL Server last time ;)
>>
>>
>>
>>
>>>Hi!
>>>
>>>I will show this for the 1-field integer key. You can easy replace the 'key' field by expression that contains the 2 key fields as you need, as well as generate unique key.
>>>
>>>

>>>* select all keys from a that have similar key values in b
>>>Select a.key from a where a.key IN (select b.key from b) into cursor AllSimilarKeys
>>>
>>>* select the records from a and b that contain unique values.
>>>* Similar-value records will be filtered out from this result by UNION without 'All'
>>>* {if this will not work, use 'SELECT DISTINCT FROM SimilarUnique' just after this query}
>>>Select a.* from a where a.key in (select key from AllSimilarKeys) into cursor SimilarUnique
>>>UNION
>>>select b.* from a where b.key in (select key from AllSimilarKeys)
>>>
>>>* now fix the key field value so it will be unique in the final result set for all records
>>>&& get a max key value
>>>select max(key) from a into array aa
>>>select max(key) from b into array bb
>>>if aa>bb
>>>  mm = aa
>>>else
>>>  mm = bb
>>>endif
>>>
>>>&& use the result of query again to make it editable
>>>use (dbf('SimilarUnique')) AGAIN alais SU in SimilarUnique
>>>
>>>&& scan the result and replace all similar key values by unique value for both tables
>>>select SU
>>>mm=mm+1
>>>scan all
>>>  replace key with mm
>>>  mm=mm+1
>>>endscan
>>>
>>>* select all other records - the final result set
>>>SELECT a.* from a where a.key NOT IN (select key from AllSimilarKeys) into cursor Result
>>>UNION ALL
>>>SELECT b.* from b where b.key NOT IN (select key from AllSimilarKeys)
>>>UNION ALL
>>>SELECT SU.* from SU
>>>
>>>use in SU
>>>use in AllSimilarKeys
>>>

>>>
>>>Index both tables by the key field for best performance.
>>>
>>>HTH.
>>>
>>>
>>>>Hi VFP users/experts
>>>>
>>>>I need to merge 2 tables of identical structure. Before merging, I need to find rows from table 1 that match rows of table 2 in the primary key fields but not in the non key fields. The table has primary key based on 2 fields and 100 non key fields. Appreciate if you could help me construct SQL to detect rows from the 2 tables with similar primary fields but dissimilar non key fields.
>>>>
>>>>Thanks

Vlad Grynchyshyn, Project Manager, MCP
vgryn@yahoo.com
ICQ #10709245
The professional level of programmer could be determined by level of stupidity of his/her bugs

It is not appropriate to say that question is "foolish". There could be only foolish answers. Everybody passed period of time when knows nothing about something.

Map

View

Click here to load this message in the networking platform