Duplicate Search Workflow
Use duplicate search processes to identify and merge constituent records that represent the same individual or organization. A regular search and merge routine helps maintain accurate constituent data and supports reliable reporting and analysis.
Choose a duplicate search process
Decide which type of duplicate search process to use. Both processes identify potential duplicates based on scores for parameters you configure.
Full Search and Incremental Search Processes
Pros:
-
Because these searches are built into the program, they do not require special technical knowledge to configure and customize. The options provided in the program allow you to easily select which fields to match on, which types of constituents to include, and how to filter the results. You can also adjust the match confidence thresholds. Some of these options are also available with the SSIS package, but others must be set up manually or are not possible to use at all.
Cons:
-
These searches rely on the soundex algorithm to create groups of constituents to compare. Soundex is based on English pronunciation so it may not be a good fit for organizations that have lots of constituents with non-English names.
Furthermore, there may be a small percentage of duplicates that are not found through the full or incremental search processes because misspellings that are phonetically dissimilar are not recognized by the soundex phonetic algorithm. Typically, this is not a problem because most misspellings do not change the beginning phonetic sound of the name. But in some cases, it could be an issue. For example, two constituent records with the same address and the last names “Tanner” and “Manner” will not be flagged as duplicates because the phonetic sound of the last name is too dissimilar. If you encounter this issue frequently, you may consider using the SSIS search processes instead of, or in addition to, the full and incremental searches. The SSIS search processes do not use the soundex component.
-
The full and incremental searches run on the application and database servers. The additional load on the application server may result in decreased performance for other processes running at the same time.
-
The search algorithm only finds matches based on key names and constituents of the same type. For example, it would not match an Individual with an Organization.
SSIS Duplicate Search Processes
Pros:
-
You can select which fields to match on, even including those that are not available in the full and incremental search processes.
-
The search algorithm can match constituents of different types. For example, it can match an Individual with an Organization.
-
The SSIS package uses a fuzzy matching algorithm that is not language-dependent like soundex. This method may provide better results for your organization. For more information about the benefits of SSIS fuzzy matching, see http://msdn.microsoft.com/msdnmag/issues/05/09/SQLServer2005/default.aspx.
-
The SSIS package runs on the database server so there is no additional load to the application server.
Cons:
-
The SSIS package requires significant technical knowledge to implement. You must know how to set up and schedule SSIS packages, and how to use SQL Server.
Run a full duplicate search
Run the full duplicate search process to evaluate the entire database.
For details, see Run the Full Duplicate Constituent Search Process.Run the Full Duplicate Constituent Search Process.
Warning: The duration of this process varies based on the number of records and the configuration options you select. It can take an extended amount of time.
Review possible duplicates
Run the Duplicate Constituents Report to view constituent records identified as possible duplicates.
When you run the report, you'll select whether to view results from the most recent SSIS process or the most recent full or incremental search. For details, seeRun the Duplicate Constituents Report.
Verify duplicate records
Review each pair of records in the report and determine whether the records represent the same constituent.
Verify which records are indeed duplicates which should be merged.
For records that aren't duplicates, update the constituent information to correct or clarify information on their constituent records.
If you make updates, run the duplicate search or SSIS process again, and then run the Duplicate Constituents Report to verify the updated results. See Run the Duplicate Constituents Report.
Merge confirmed duplicates
Run the merge process to combine confirmed duplicate records.
After the merge completes, run the full search process again. Then run the Duplicate Constituents Report to verify that the records merged correctly. For details about merges, seeMerge Duplicate Constituents.
Maintain ongoing data integrity
Continue to run search and merge processes until your database is clean.
After you complete a full search, you can reduce processing time by running incremental or SSIS partial searches. These processes are faster because they only compare records added or updated since the most recent search.
Schedule full and incremental or full and partial SSIS duplicate search processes to run at regular intervals.
These combined strategies (including automatic duplicate searches that run during data entry) help maintain an accurate database.
For more information about how to schedule full and incremental search processes, see Configure Duplicate Search Process Job Schedules. For more information about how to schedule SSIS duplicate search processes, see SSIS Duplicate Search Processes.
Note: The duplicates search processes and merge processes work in tandem. Groups of duplicates identified by searches provide the data sources for merge processes. Thus, we recommend you schedule the search and merge processes to run at similar intervals.