Duplicate Search Workflow

Use these steps to find and merge duplicate constituent records in your database.

  • Decide which type of duplicate search process to use.

    You can use either the SQL Server Integration Services (SSIS) duplicate search packages or the Full duplicate search and Incremental duplicate search tasks on the Duplicates page. Both types of search processes identify possible duplicates based on scoring parameters you configure. For more information, see Which Search Process Should I Use?

  • Run the full duplicate search process to search the entire database. For more information, see Run the Full Duplicate Constituent Search Process.

    Warning: The duplicate search process can take a long time to run, depending on the number of records in your database and the configuration options you select for the process.

  • Run the Duplicate Constituents Report to view a list of constituent records identified as potential duplicates by the SSIS package or search process. When you run the report, you select whether to view it for the last SSIS process run or the last full or incremental search process run. For more information, see Run the Duplicate Constituents Report.

  • Verify that the records identified as duplicates in the report are actually duplicates that should be merged. If the report lists records that are not duplicates, correct or clarify the information on their constituent records. For more information, see Run the Duplicate Constituents Report.

    Note: If you make changes to constituent records after viewing the Duplicate Constituents Report, you should run the duplicate search or SSIS process again. After that, run the Duplicate Constituents Report again and verify the constituents listed are duplicates.

  • Run the merge process. After the merge process completes, run the full duplicate search process again, and then run the Duplicate Constituents Report again. View the report and verify that the duplicate records have been merged. For more information about merges, see Merge Duplicate Constituents.

  • Continue to run search and merge processes until the database is "clean." After you run a full search, you can reduce the time required for subsequent searches by running the incremental process or the SSIS partial process instead. These processes take less time because they compare only those records that were added or updated since the last search process was run.

  • Schedule the full and incremental search processes, or the full and partial SSIS duplicate search processes, to run automatically at regular intervals. This strategy, in conjunction with the automatic duplicate searches that run during data entry, helps maintain a clean database. For more information about how to schedule full and incremental search processes, see Configure Duplicate Search Process Job Schedules. For more information about how to schedule SSIS duplicate search processes, see SSIS Duplicate Search Processes.

    Note: The duplicate search processes and merge processes work in tandem. The groups of duplicates identified by searches provide the data sources for merge processes. For this reason, you should schedule the search and merge processes to run at similar intervals.
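
To illustrate why an incremental search takes less time than a full search, the following Python sketch scores pairs of records and flags pairs at or above a threshold. It is a conceptual illustration only, not the product's actual matching logic: the Constituent fields, the WEIGHTS and THRESHOLD values, and the find_duplicates function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Constituent:
    record_id: int
    name: str
    email: str
    modified: datetime


# Hypothetical scoring parameters; the real configurable parameters differ.
WEIGHTS = {"name": 60, "email": 40}
THRESHOLD = 60


def score(a, b):
    """Sum the weights of fields whose trimmed, lowercased values are equal."""
    total = 0
    for field, weight in WEIGHTS.items():
        if getattr(a, field).strip().lower() == getattr(b, field).strip().lower():
            total += weight
    return total


def find_duplicates(records, since=None):
    """Return (id, id, score) tuples for pairs scoring at or above THRESHOLD.

    since=None compares every pair (a full search). Otherwise, pairs in which
    neither record was modified after `since` are skipped (an incremental search).
    """
    matches = []
    for a, b in combinations(records, 2):
        if since is not None and a.modified <= since and b.modified <= since:
            continue  # both records unchanged since the last search
        pair_score = score(a, b)
        if pair_score >= THRESHOLD:
            matches.append((a.record_id, b.record_id, pair_score))
    return matches


records = [
    Constituent(1, "Ann Lee", "ann@example.org", datetime(2024, 1, 5)),
    Constituent(2, "ann lee ", "a.lee@example.org", datetime(2024, 1, 9)),
    Constituent(3, "Bob Roy", "bob@example.org", datetime(2024, 2, 1)),
    Constituent(4, "Bob Roy", "bob@example.org", datetime(2024, 3, 15)),
]

full = find_duplicates(records)
incremental = find_duplicates(records, since=datetime(2024, 3, 1))
print(full)
print(incremental)
```

In this sketch the full search evaluates all six pairs and reports records 1 and 2 (score 60) and records 3 and 4 (score 100). The incremental search evaluates only the three pairs that involve record 4, the one record modified after the cutoff, and reports only records 3 and 4. Pairs left unmerged from an earlier search do not reappear in the incremental results, which is one reason to keep running periodic full searches as well.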