The program performs several steps to identify matches. First, it standardizes data on the incoming record. Then it selects a pool of records that are potential matches and standardizes data on those records for comparison. Next, it compares field values on the incoming record with field values on the existing records and calculates match scores, which are weighted based on field type. The algorithm then deducts these weighted scores from a perfect score of 100 to calculate the total match confidence score. Finally, the program compares match confidence scores to the match confidence thresholds to determine whether the incoming record is a match, a possible match, or not a match.
To ensure that minor differences such as capitalization and punctuation do not prevent the program from finding matches, the program standardizes data from incoming records to make the values easier to match. The standardization process:
• | Capitalizes all letters |
• | Removes periods from first names, last names/organization names, and street names |
• | Removes apostrophes from last names/organization names |
• | Removes extra spaces before and after all values |
• | Converts carriage returns, dashes, and double spaces in addresses to single spaces |
• | Removes add-on digits to convert ZIP+4 to 5-digit ZIP codes |
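The standardization rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `remove_apostrophes` parameter are hypothetical, and the program's actual implementation is not documented here.

```python
import re


def standardize_name(value: str, remove_apostrophes: bool = False) -> str:
    """Uppercase, drop periods, optionally drop apostrophes, trim outer spaces."""
    result = value.upper().replace(".", "")
    if remove_apostrophes:  # applies to last names/organization names
        result = result.replace("'", "")
    return result.strip()


def standardize_street(value: str) -> str:
    """Uppercase, drop periods, and turn line breaks, dashes, and repeated spaces into one space."""
    result = value.upper().replace(".", "")
    result = re.sub(r"[\r\n\-]", " ", result)
    result = re.sub(r" {2,}", " ", result)
    return result.strip()


def standardize_zip(value: str) -> str:
    """Trim the value and keep only the 5-digit part of a ZIP+4 code."""
    result = value.strip()
    match = re.fullmatch(r"(\d{5})-?\d{4}", result)
    return match.group(1) if match else result
```

For example, `standardize_name(" O'Brien Jr. ", remove_apostrophes=True)` yields `OBRIEN JR`, and `standardize_zip("02138-1234")` yields `02138`.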
To select the pool of potential matches from the records in your database, the program compares the following fields on the incoming record to the corresponding fields on existing records:
• | Lookup ID (Constituent Update batches only) |
• | Lookup ID that matches an Alternate ID on the existing constituent (Constituent Update batches only) |
• | Alternate lookup ID and Alternate lookup type (Constituent Update batches only) |
Note: Matches found by Lookup ID or Alternate lookup ID/Alternate lookup type receive match confidence scores of 100. If the Process automatically option is selected for batches, the program automatically assigns record IDs from existing records to these matched records in batches.
• | Email address |
• | Phone number |
• | Zip code and first four characters of Last name/Organization name |
• | Zip code, street name soundex (an algorithm that indexes names by sound), and first three characters of Last name |
• | First three characters of Zip code, First name soundex (an algorithm that indexes names by sound), first four characters of street name, and the street number |
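One way to picture the pool selection is as a set of lookup keys built from each record: two records become potential matches when they share at least one key. The sketch below is an assumption-laden illustration, not the program's code. It uses standard American Soundex (the program's exact soundex variant is not specified), hypothetical field names such as `street_name`, and it omits the Lookup ID comparisons.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        code = SOUNDEX_CODES.get(letter, "")
        if code and code != previous:
            encoded += code
        if letter not in "HW":  # H and W do not separate repeated codes
            previous = code
    return (encoded + "000")[:4]


def blocking_keys(record: dict) -> set:
    """Build the comparison keys for one standardized record (hypothetical field names)."""
    keys = set()
    email = record.get("email", "")
    phone = record.get("phone", "")
    zip_code = record.get("zip", "")
    last = record.get("last", "")
    first = record.get("first", "")
    street_name = record.get("street_name", "")
    street_number = record.get("street_number", "")
    if email:
        keys.add(("email", email))
    if phone:
        keys.add(("phone", phone))
    if zip_code and last:
        keys.add(("zip+last4", zip_code, last[:4]))
        if street_name:
            keys.add(("zip+street_soundex+last3", zip_code, soundex(street_name), last[:3]))
    if zip_code and first and street_name and street_number:
        keys.add(("zip3+first_soundex+street4+number",
                  zip_code[:3], soundex(first), street_name[:4], street_number))
    return keys


def is_potential_match(incoming: dict, existing: dict) -> bool:
    """Two records join the same pool when they share at least one key."""
    return bool(blocking_keys(incoming) & blocking_keys(existing))
```

Building keys this way keeps the expensive field-by-field comparison limited to records that already agree on something, which is the purpose of selecting a pool first.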
If the program does not find any potential matches based on these field comparisons, it compares incoming constituents to existing constituents without addresses based on Last name/Organization name and the first three characters of First name. For example, if the incoming constituent is Jonathan Mott, the program includes Jon Mott, Joni Mott, or Jonathon Mott as possible matches.
If First name is blank on the incoming record, the program compares the incoming constituent, based on Last name, only to existing constituents that have a matching last name and a blank first name. For example, if the incoming constituent has the last name Mott and First name is blank, the program only includes existing records with the last name Mott and no first name.
If this name-only search finds no matches for an incoming record without an address, the program attempts to match that record to existing records with addresses, based on Last name/Organization name and the first three characters of First name.
Note: To prevent performance issues, the program excludes common names from the name-only search. A name is "common" if more than 1000 constituents in your database without addresses have the same last name and first 3 characters of the first name. If no constituents without addresses have that name, but 1000 or more constituents with addresses have it, the name is considered "common" and excluded from the search. The program identifies the common names in your database when you upgrade to 2.94 or later and stores them in a table. To edit this table, contact Professional Services.
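The "common name" rule in the note can be illustrated with a simple count per name key. This is a sketch only: the record layout (`last`, `first`, `has_address`) and the function name are hypothetical, and the `limit` parameter exists so the rule can be tried on small data. The two comparisons ("more than" and "or more") follow the wording of the note.

```python
from collections import Counter


def find_common_names(constituents: list, limit: int = 1000) -> set:
    """Return the (last name, first 3 characters of first name) keys treated as common."""
    without_address = Counter()
    with_address = Counter()
    for person in constituents:
        key = (person["last"], person["first"][:3])
        if person.get("has_address"):
            with_address[key] += 1
        else:
            without_address[key] += 1

    # Rule 1: more than `limit` constituents without addresses share the key.
    common = {key for key, count in without_address.items() if count > limit}
    # Rule 2: nobody without an address has the key, but `limit` or more with addresses do.
    common |= {
        key for key, count in with_address.items()
        if without_address[key] == 0 and count >= limit
    }
    return common
```

Computing this set once and storing it, as the note describes, avoids recounting names every time a batch runs.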
After the program selects a pool of potential matches from the records in your database, it performs additional standardization on addresses:
• | Converts spelled-out numbers to numerals ("Two" becomes "2" and "Tenth" becomes "10th") |
• | Converts some spelled-out words to abbreviations, including street suffixes ("Street" becomes "ST" and "Road" becomes "RD"), directionals ("North" becomes "N" and "Southwest" becomes "SW"), and secondary unit designations ("Apartment" becomes "APT" and "Suite" becomes "STE") |
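A word-by-word lookup is enough to illustrate this additional address standardization. The sketch below assumes the address is already uppercased and contains only the replacements named as examples in this section; the program's full replacement list is not documented here, and the names `ADDRESS_REPLACEMENTS` and `abbreviate_address` are hypothetical.

```python
# Only the example replacements named in this section; a real list would be much longer.
ADDRESS_REPLACEMENTS = {
    "TWO": "2",
    "TENTH": "10TH",
    "STREET": "ST",
    "ROAD": "RD",
    "NORTH": "N",
    "SOUTHWEST": "SW",
    "APARTMENT": "APT",
    "SUITE": "STE",
}


def abbreviate_address(address: str) -> str:
    """Replace each known spelled-out word with its numeral or abbreviation."""
    words = address.split()
    return " ".join(ADDRESS_REPLACEMENTS.get(word, word) for word in words)
```

With this table, `abbreviate_address("TWO NORTH TENTH STREET APARTMENT 4")` returns `2 N 10TH ST APT 4`.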
After the program determines its pool of potential matches, it compares the fields on the incoming record to the corresponding fields on each of the records in the pool and then assigns match scores to each field. Each type of field has a range of scores to determine whether it is a match, likely match, possible match, or not a match. The algorithm checks the following scenarios:
Scenario | Example | Result |
---|---|---|
First name / First initial | John vs. J | Likely match |
Middle name / First initial | John vs. J | Likely match |
First name / First name & middle initial | John vs. John A | Match |
First name spelling variation | Chris vs. Kris | Likely match |
Different first names with similar spelling or pronunciation | John vs. Joan | Not a match |
Nickname | Christopher vs. Chris | Match |
Last name / Hyphenated last name | Smith vs. Smith-Jones | Match |
Street number / Hyphenated street number | 4 vs. 4-2 | Match |
Same base street names, different street suffixes | Main St vs. Main Rd; Northwest vs. NW | Possible match |
Same base street names, missing or reordered street suffixes | Main vs. Main St; SE King St. vs. King St. | Match |
Three digits of zip code match, but not first three | 02138 vs. 02234 | Not a match |
First three digits of zip code match | 02141 vs. 02138 | Possible match |
Titles are different but have the same gender | Mrs. vs. Ms. | Likely match |
Titles are different and one or both genders are unknown | Mrs. vs. Dr. | Possible match |
Titles are different and their genders are different | Mrs. vs. Mr. | Not a match |
Suffixes | II vs. Jr. | Match |
Suffixes | blank vs. Sr. | Likely match |
Suffixes | blank vs. Jr., II, III, or IV | Possible match |
Suffixes | Sr. vs. Jr., II, III, or IV | Not a match |
Suffixes | II vs. III or IV | Not a match |
Suffixes | III vs. IV | Not a match |
Suffixes | Any other suffix combination without blanks | Possible match |
Note: The FIRSTNAMEMATCH table in the database contains the nicknames (Christopher vs. Chris), spelling variations (Allen vs. Allan), and different first names with similar spellings (John vs. Joan) that the program uses to compare first name values.
For any data that does not fit these scenarios, the program applies a "fuzzy" matching algorithm to calculate how similar field values are on a character-by-character basis. The program uses a few different methods to calculate fuzzy match scores, but the simplest method compares two values and counts the changes necessary to make the values match exactly. Changes include adding, removing, changing, or transposing characters. For example, to compare "Christopher" with "Chrsitopher", you transpose the "si" in "Chrsitopher" to fix the misspelling (one change). The program divides the number of changes by the number of characters in the longer name, multiplies the quotient by 100, and then subtracts the product from 100. For this example, 1/11 ≈ 0.09, 0.09 * 100 = 9, and 100 - 9 = 91.
Each type of field has a range of fuzzy scores that determines whether it is a likely match, possible match, or not a match.
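The simplest method described above corresponds closely to an edit distance that counts insertions, deletions, substitutions, and transpositions of adjacent characters. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance as a stand-in; this choice is an assumption, since the program's exact method is not documented here, and the function names are hypothetical. Rounding is applied once at the end, which gives the same result as the worked example.

```python
def edit_distance(a: str, b: str) -> int:
    """Count the single-character adds, removals, changes, and adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # remove a character
                dist[i][j - 1] + 1,         # add a character
                dist[i - 1][j - 1] + cost,  # change a character (or keep it)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transpose
    return dist[-1][-1]


def fuzzy_score(a: str, b: str) -> int:
    """100 minus the changes as a percentage of the longer value's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # assumption: two empty values count as identical
    return round(100 - 100 * edit_distance(a, b) / longest)
```

For the example in the text, `fuzzy_score("CHRISTOPHER", "CHRSITOPHER")` evaluates to 91.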
Field | Likely Match | Possible Match | Not a Match |
---|---|---|---|
First name | 77-99 | 68-76 | less than 68 |
Last name/Organization name | 86-99 | 50-85 | less than 50 |
Street number | 75-99 | 50-74 | less than 50 |
Street name | 81-99 | 58-80 | less than 58 |
Zip code | 80-99 | 60-79 | less than 60 |
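The fuzzy score ranges can be read as two cut-off values per field. The following sketch encodes the lower bounds from the ranges above; the field keys and function name are hypothetical, and treating a score of 100 as a plain "match" is an assumption, because the ranges stop at 99.

```python
# (lowest "likely match" score, lowest "possible match" score) per field.
FUZZY_RANGES = {
    "first_name": (77, 68),
    "last_name": (86, 50),      # Last name/Organization name
    "street_number": (75, 50),
    "street_name": (81, 58),
    "zip_code": (80, 60),
}


def classify_fuzzy_score(field: str, score: int) -> str:
    """Map a fuzzy score to a result for the given field type."""
    likely_min, possible_min = FUZZY_RANGES[field]
    if score >= 100:
        return "match"  # assumption: identical values need no fuzzy range
    if score >= likely_min:
        return "likely match"
    if score >= possible_min:
        return "possible match"
    return "not a match"
```

A first name score of 91 falls in the "likely match" range, while the same score for a different field could be interpreted differently because each field has its own cut-offs.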
After the program classifies fields as matches, likely matches, possible matches, or not matches, it calculates a total match score for the incoming and existing records. Each result corresponds to a weighted number of points, which the program deducts from 100 (100 being an exact match), as shown below.
Field | Match | Likely Match | Possible Match | Not a Match | Incoming is Blank, Existing is Not | Existing is Blank, Incoming is Not |
---|---|---|---|---|---|---|
Title | 0 | 1 | 2 | 18 | 0 | 0 |
Suffix | 0 | 1 | 3 | 18 | 0 | 0 |
Last name/Organization name or Maiden name | 0 | 3 | 8 | 15 | n/a | n/a |
Street number | 0 | 8 | 17 | 24 | 1 | 3 |
Street name | 0 | 5 | 14 | 31 | 18 | 21 |
Zip code | 0 | 7 | 12 | 31 | 6 | 1 |
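The deduction step amounts to a table lookup per field followed by a subtraction from 100. The sketch below copies the point values listed above; the dictionary layout, key names, and function name are hypothetical. First name is left out because its deductions are not listed in this table, and the score is not clamped at 0 because the text does not say how low totals are handled.

```python
# Points deducted per field and result. None stands for "n/a".
DEDUCTIONS = {
    "title": {"match": 0, "likely match": 1, "possible match": 2, "not a match": 18,
              "incoming blank": 0, "existing blank": 0},
    "suffix": {"match": 0, "likely match": 1, "possible match": 3, "not a match": 18,
               "incoming blank": 0, "existing blank": 0},
    "last_name": {"match": 0, "likely match": 3, "possible match": 8, "not a match": 15,
                  "incoming blank": None, "existing blank": None},
    "street_number": {"match": 0, "likely match": 8, "possible match": 17, "not a match": 24,
                      "incoming blank": 1, "existing blank": 3},
    "street_name": {"match": 0, "likely match": 5, "possible match": 14, "not a match": 31,
                    "incoming blank": 18, "existing blank": 21},
    "zip_code": {"match": 0, "likely match": 7, "possible match": 12, "not a match": 31,
                 "incoming blank": 6, "existing blank": 1},
}


def total_match_score(results: dict) -> int:
    """Start at 100 and subtract the points for each field's result."""
    score = 100
    for field, result in results.items():
        points = DEDUCTIONS[field][result]
        if points is None:
            raise ValueError(f"{result!r} does not apply to {field!r}")
        score -= points
    return score
```

For instance, a pair of records with a likely match on last name (3 points) and a possible match on street name (14 points), and exact matches elsewhere, scores 100 - 3 - 14 = 83.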
First name and middle name combinations impact the matching score. If the same name or initial appears in the First name field of one record and the Middle name field of a potential match, the program scores the records as if the values were in the same fields on both records. For example, if the incoming constituent is First name = John and Middle name = Anderson and the existing constituent is First name = John Anderson and Middle name = blank, the program scores the first and middle names as "Matches." If the incoming constituent is First name = John and Middle name = Anderson and the existing constituent is First name = John A, the program scores the first name as a "Match" and the middle name as a "Likely match." The table below demonstrates how the program scores first and middle name combinations.
A record's total match score determines which matching threshold it falls within. This in turn determines whether the program automatically assigns the record ID from the existing constituent to the matched incoming constituent, waits for a user to manually review the match and make a decision, or automatically creates a record.
The matching algorithm provides optimal results when you use the default threshold percentages (such as 100-95%, 94-70%, and 69-0% for batches). If you adjust the percentages, the program may automatically update records that you do not expect or require you to manually review records that are not matches.
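To make the threshold step concrete, the sketch below maps a total match score to one of the three outcomes using the default batch percentages quoted above (100-95, 94-70, and 69-0). The function and parameter names are hypothetical; in the program, these values are settings rather than code.

```python
def classify_record(score: int, matched_min: int = 95, possible_min: int = 70) -> str:
    """Place a total match score into a threshold band (defaults are the batch defaults)."""
    if score >= matched_min:
        return "matched"         # record ID can be assigned automatically
    if score >= possible_min:
        return "possible match"  # held for manual review
    return "no match"            # a new record is created
```

Raising or lowering `matched_min` or `possible_min` in this sketch moves scores between bands in the same way that adjusting the thresholds does in the program.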
Before you adjust the default thresholds, you should weigh the following considerations:
• | If you turn off the Matched constituents threshold for batches, all matches require manual review, even if the Lookup IDs are the same or all values match exactly. |
• | If you lower the bottom percentage of the Matched constituents threshold for batches, matches that previously required manual review may subsequently be automatic matches. Unless you consistently approve matches with scores just below the current Matched constituents threshold, you should not edit this threshold. |
• | If you raise the bottom percentage of the Matched constituents threshold for batches, matches that previously were matched automatically may subsequently require manual review. Unless the program automatically matches constituents that do not match, you should not edit this threshold. |
• | If you lower the bottom percentage of the Possible matches threshold, constituents that previously were not considered possible matches may subsequently require manual review. Unless the program consistently creates duplicate records for constituents that already exist, you should not edit this threshold. |
• | If you raise the bottom percentage of the Possible matches threshold, potential matches that previously required manual review may no longer be flagged for manual review. Unless you consistently reject possible matches with lower match scores, you should not edit this threshold. |