>>>I need some help to figure out how I can find out which records are being
>>"dropped" when I do a SELECT * DISTINCT. I did a search for this and came
>>up with some suggestions but they don't seem to fit this situation. One of
>>the suggestions was:
>>>
>>>
>>I don't exactly understand what you mean by "dropped".
>
> What I meant was, when I did a SELECT * DISTINCT, all my duplicated records are reduced to just one record. So the others are "dropped". This is a program that is taking accounting data from one system and formatting it to be imported into another accounting system. I'm doing the SELECT * DISTINCT to eliminate duplicate entries (someone's mistake) so they don't migrate into the new system. But, they want a way of tracking what I've eliminated so if they can't balance their books, they have a starting place on where to look.
>
>>
>>Also, do you mean GROUP BY field2 in the above?
>
> Using the code someone else suggested, I could find a duplicate entry (on only one field) by using the GROUP clause. By using that, and COUNT(field) I would then have an extra field in my query that contained the number of duplicates found. Then it'd be really easy to filter out only the entries that had a count > 1. These were the ones I was dropping. But since I need the record to be distinct across several (all, actually) fields I wasn't able to use the GROUP BY option.
Paul,
As I understand you want to find all records that are exactly a duplicate with regard to all fields ie: f1..fn exactly has a matching f1...fn another record(s). To do it I think a record could be translated into a specific value.
function nondistinct
lparameters cTableName
select cksum() as checksum,recno() as rcno ;
from (cTableName) ;
where .t. ;
into cursor tcCheckSums
select rcno, checksum from tcCheckSums ;
where checksum in ;
(select checksum from tcCheckSums ;
having cnt(*)>1 ;
group by checksum ) ;
order by checksum ;
into cursor tcDuplicates
set order to 0 in (cTableName)
set relation to rcno into (cTablename)
function cksum
cVal = 1
for ix=1 to fcount()
do case
case type(field(ix))$"CM"
cRecord = eval(field(ix))
case type(field(ix))$"NY"
cRecord = str(eval(field(ix)))
case type(field(ix))$"D"
cRecord = dtoc(eval(field(ix)))
case type(field(ix))$"T"
cRecord = ttoc(eval(field(ix)))
case type(field(ix))$"L"
cRecord = iif(eval(field(ix)),"T","F")
otherwise
cRecord = ""
endcase
cVal = cVal*val(sys(2007,cRecord))
endfor
return cVal
Of course there there could be other elegant SQL way to do this but for now this one works good for me. To gain speed in fact, function cksum is kept in stored procs and "index on cksum() tag checksum" provides a on demand dup checking.
Cetin