Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
Finding mismatched data
Message
From
03/05/2012 11:04:01
 
 
To
03/05/2012 10:32:15
Dragan Nedeljkovich (Online)
Now officially retired
Zrenjanin, Serbia
General information
Forum:
Microsoft SQL Server
Category:
SQL syntax
Environment versions
SQL Server:
SQL Server 2008
Miscellaneous
Thread ID:
01543096
Message ID:
01543113
Views:
28
>>No, I don't want similarly spelled names to show as a match, I'm trying to show anything that does not match so that the user can manually fix it. Here's some sample data, cman is the database at Headquarters and cmanSO is the database at the South Office.
>>
>>cman Agents
>>
>>Name          ID
>>Frank          1
>>Fred           2
>>Naomi          3
>>Nadya          4
>>Justin         5
>>Jonathan       6
>>
>>cmanSO Agents
>>
>>Name          ID
>>Frank          1
>>Fred           2
>>Natasha        3
>>Naomy          4
>>Nadya          5
>>
>>My logic so far is that I need to first of all find those agents that have the same ID but different name:
>>
>>
SELECT HQAgents.Name as HQName
>>		,HQAgents.ID as HQID
>>		,SOAgents.name as soname
>>	from cman.dbo.Agents hqagents join cmanso.dbo.agents soagents
>>		on hqagents.ID = soagents.ID and hqagents.Name <> soagents.Name
>>	order by HQName, soname
>>
>>So I should get this output:
>>HQName ID SOName
>>Naomi 3 Natasha
>>Nadya 4 Naomy
>>Justin 5 Nadya
>>
>>Then those that have the same name but different ID:
>>
>>
SELECT HQAgents.Name as HQName
>>		,HQAgents.ID as HQID
>>		,SOAgents.id as soid
>>	from cman.dbo.Agents hqagents join cmanso.dbo.agents soagents
>>		on hqagents.ID <> soagents.ID and hqagents.Name = soagents.Name
>>	order by HQName, soname
>>
>>Output
>>
HQName    HQID   SOID
>>Nadya         4        5
>>
>>Then those that exist in cman but not in cmanSO based on Name (Jonathan in this case) and those in cmanSO not in cman based on name.
>>
>>I then have to do something about manually cleaning up any misspellings and then programmatically build up a new Agents table and somehow sort out the related tables to use the new Agent ID.
>>
>>I hope that explains it a bit clearer, but it's still murky in my head as to what to do.
>
>Been there, and still have some hair left. Don't even remember how each case was solved, probably differently each time. My got feeling is that you're on the right track. So, just a small suggestion for the next step:
>
>List anything that's not 100% matched, and save the pairings in a table. When the next batch comes in, use these pairings to clean up what was already matched so far (for example, it was already established that Jonhy and Johnny are the same person, and you have that in the table), so show only the new mismatches.
>
>Eventually you'd end up with "Johnny is called Jonhy at location A, and Jhony at location C"... a sort of dictionary of various forms of the same names, that translate into one that you are taking as right.

Thanks Dragan,

I don't have any hair left to lose. This is going to be a one-off as after the merge is done, maintenance at the South Office will not be allowed, so hopefully I won't need the translation table.
Frank.

Frank Cazabon
Samaan Systems Ltd.
www.samaansystems.com
Previous
Reply
Map
View

Click here to load this message in the networking platform