Constituent Matching Algorithm

To ensure that minor differences such as capitalization and punctuation do not prevent the program from finding matches, the program standardizes data from incoming records to make the values easier to match. The standardization process:

• Capitalizes all letters
• Removes periods from first names, last names/organization names, and street names
• Removes apostrophes from last names/organization names
• Removes extra spaces before and after all values
• Converts carriage returns, dashes, and double spaces in addresses to single spaces
• Removes add-on digits to convert ZIP+4 codes to 5-digit ZIP codes
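The standardization rules above can be sketched as a few small string functions. This is a minimal illustration, not the program's implementation: the function names `standardize_name`, `standardize_street`, and `standardize_zip` are hypothetical, and the sketch assumes US-style ZIP+4 values written as `12345-6789` or `123456789`.

```python
import re


def standardize_name(value: str, remove_apostrophes: bool = False) -> str:
    """Trim, capitalize, and remove periods. Apostrophe removal is optional
    because the text applies it to last/organization names only."""
    value = value.strip().upper().replace(".", "")
    if remove_apostrophes:
        value = value.replace("'", "")
    return value


def standardize_street(value: str) -> str:
    """Capitalize, remove periods, turn carriage returns/line feeds and
    dashes into spaces, collapse repeated spaces, and trim."""
    value = value.upper().replace(".", "")
    value = re.sub(r"[\r\n-]", " ", value)
    value = re.sub(r" {2,}", " ", value)
    return value.strip()


def standardize_zip(value: str) -> str:
    """Reduce a ZIP+4 value to its 5-digit ZIP code; leave anything else as-is."""
    value = value.strip()
    match = re.fullmatch(r"(\d{5})-?\d{4}", value)
    return match.group(1) if match else value
```

For example, `standardize_name("  o'brien. ", remove_apostrophes=True)` returns `"OBRIEN"`, and `standardize_zip("02138-1234")` returns `"02138"`.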

Select a Pool of Records

To select the pool of potential matches from the records in your database, the program compares the following fields on the incoming record to the corresponding fields on existing records:

• Lookup ID (Constituent Update batches only)
• Lookup ID that matches an Alternate ID on the existing constituent (Constituent Update batches only)
• Alternate lookup ID and Alternate lookup type (Constituent Update batches only)

Note: Matches found by Lookup ID or Alternate lookup ID/Alternate lookup type receive match confidence scores of 100. If the Process automatically option is selected for batches, the program automatically assigns the record ID of the existing record to each of these matched incoming records.

• Email address
• Phone number
• Zip code and first four characters of Last name/Organization name
• Zip code, street name Soundex (an algorithm that indexes names by sound), and first three characters of Last name
• First three characters of Zip code, First name Soundex, first four characters of street name, and street number
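One way to picture this step is as "blocking": each record produces a set of keys, and an existing record joins the pool if it shares any key with the incoming record. The sketch below assumes that reading. It uses standard American Soundex, which may differ from the exact variant the program uses. The record field names (`email`, `phone`, `zip`, `last`, `first`, `street_name`, `street_number`) are hypothetical, and the Lookup ID comparisons are omitted.

```python
def soundex(name: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    groups = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
    codes = {ch: digit for group, digit in groups.items() for ch in group}
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result, prev = letters[0], codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


def candidate_keys(rec: dict) -> set:
    """Blocking keys mirroring the field comparisons listed above (hypothetical fields)."""
    zip_, last, first = rec.get("zip", ""), rec.get("last", ""), rec.get("first", "")
    street, number = rec.get("street_name", ""), rec.get("street_number", "")
    keys = set()
    if rec.get("email"):
        keys.add(("email", rec["email"]))
    if rec.get("phone"):
        keys.add(("phone", rec["phone"]))
    if zip_ and last:
        keys.add(("zip+last4", zip_, last[:4]))
    if zip_ and street and last:
        keys.add(("zip+street_sdx+last3", zip_, soundex(street), last[:3]))
    if zip_ and first and street and number:
        keys.add(("zip3+first_sdx+street4+num", zip_[:3], soundex(first), street[:4], number))
    return keys


def select_pool(incoming: dict, existing: list) -> list:
    """Existing records that share at least one blocking key with the incoming record."""
    wanted = candidate_keys(incoming)
    return [rec for rec in existing if wanted & candidate_keys(rec)]
```

A production system would index existing records by key instead of recomputing keys for every record; the linear scan here only keeps the sketch short.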

If the program does not find any potential matches based on these field comparisons, it compares incoming constituents to existing constituents without addresses based on Last name/Organization name and the first three characters of First name. For example, if the incoming constituent is Jonathan Mott, the program includes existing records such as Jon Mott, Joni Mott, and Jonathon Mott as possible matches.

If First name is blank on the incoming record, the program compares the incoming constituent only to existing constituents that have the same Last name and a blank First name. For example, if the incoming constituent has the last name Mott and First name is blank, the program only includes existing records with the last name Mott and no first name.

If this name-only search finds no matches for an incoming record without an address, the program attempts to match that record to existing records with addresses, based on Organization name, or on Last name and the first three characters of First name.

Note: To prevent performance issues, the program excludes common names from the name-only search. A name is "common" if more than 1,000 constituents in your database without addresses have the same last name and first three characters of the first name. If no constituents without addresses have that name, but 1,000 or more constituents with addresses have it, the name is also considered "common" and excluded from the search. The program identifies the common names in your database when you upgrade to version 2.94 or later and stores them in a table. To edit this table, contact Professional Services.
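The name-only fallback and the common-name exclusion can be sketched as follows. This is a hypothetical illustration of the rules as written, not the program's code: the record fields `last` and `first` are assumptions, `last` stands in for both Last name and Organization name, and the `limit` parameter defaults to the 1,000 threshold from the note.

```python
from collections import Counter


def name_key(rec: dict) -> tuple:
    # Last name/organization name plus the first three characters of first name.
    return (rec.get("last", ""), rec.get("first", "")[:3])


def find_common_names(no_address: list, with_address: list, limit: int = 1000) -> set:
    no_addr_counts = Counter(name_key(r) for r in no_address)
    addr_counts = Counter(name_key(r) for r in with_address)
    # More than `limit` records without addresses share the name.
    common = {k for k, n in no_addr_counts.items() if n > limit}
    # No records without addresses have it, but `limit` or more with addresses do.
    common |= {k for k, n in addr_counts.items() if k not in no_addr_counts and n >= limit}
    return common


def name_only_pool(incoming: dict, no_address: list, with_address: list, common: set) -> list:
    last, first = incoming.get("last", ""), incoming.get("first", "")
    if not first:
        # Blank first name: only records with the same last name and a blank first name.
        return [r for r in no_address if r.get("last") == last and not r.get("first")]
    key = name_key(incoming)
    if key in common:
        return []  # common names are skipped to protect performance
    pool = [r for r in no_address if name_key(r) == key]
    if not pool:
        # Fall back to existing records that do have addresses.
        pool = [r for r in with_address if name_key(r) == key]
    return pool
```

With existing records Jon Mott, Joni Mott, and James Mott (all without addresses), an incoming Jonathan Mott produces the key `("MOTT", "JON")` and pools Jon and Joni but not James.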

After the program selects a pool of potential matches from the records in your database, it performs additional standardization on addresses:

• Converts spelled-out numbers to numerals ("Two" becomes "2" and "Tenth" becomes "10th")
• Converts some spelled-out words to abbreviations, including:
  - Street suffixes ("Street" becomes "ST" and "Road" becomes "RD")
  - Directionals ("North" becomes "N" and "Southwest" becomes "SW")
  - Secondary unit designations ("Apartment" becomes "APT" and "Suite" becomes "STE")
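A word-by-word lookup is the simplest way to illustrate this second standardization pass. The dictionaries below are small illustrative subsets built from the examples above plus a few standard USPS-style abbreviations; the program's actual lists are not given here, and `standardize_address_words` is a hypothetical name.

```python
# Illustrative subsets only; the program's full word lists are not documented here.
NUMBER_WORDS = {
    "ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4", "FIVE": "5",
    "FIRST": "1ST", "SECOND": "2ND", "THIRD": "3RD", "FOURTH": "4TH", "TENTH": "10TH",
}
ABBREVIATIONS = {
    "STREET": "ST", "ROAD": "RD", "AVENUE": "AVE",                     # street suffixes
    "NORTH": "N", "SOUTH": "S", "SOUTHWEST": "SW", "NORTHWEST": "NW",  # directionals
    "APARTMENT": "APT", "SUITE": "STE",                                # secondary units
}


def standardize_address_words(address: str) -> str:
    """Replace spelled-out numbers with numerals, then abbreviate known words."""
    words = []
    for word in address.upper().split():
        word = NUMBER_WORDS.get(word, word)
        words.append(ABBREVIATIONS.get(word, word))
    return " ".join(words)
```

For example, `standardize_address_words("Two North Tenth Street Suite 5")` returns `"2 N 10TH ST STE 5"`. A pure word lookup would also shorten a street that is literally named "North", so a real implementation would likely need positional rules.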

Calculate Matching Scores

After the program determines its pool of potential matches, it compares the fields on the incoming record to the corresponding fields on each of the records in the pool and then assigns match scores to each field. Each type of field has a range of scores to determine whether it is a match, likely match, possible match, or not a match. The algorithm checks the following scenarios:

Scenario	Example	Result
First name / First initial	John vs. J	Likely match
Middle name / First initial	John vs. J	Likely match
First name / First name & middle initial	John vs. John A	Match
First name spelling variation	Chris vs. Kris	Likely match
Different first names with similar spelling or pronunciation	John vs. Joan	Not a match
Nickname	Christopher vs. Chris	Match
Last name / Hyphenated last name	Smith vs. Smith-Jones	Match
Street number / Hyphenated street number	4 vs. 4-2	Match
Same base street names, different street suffixes	Main St vs. Main Rd Northwest vs. NW	Possible match
Same base street names, missing or reordered street suffixes	Main vs. Main St SE King St. vs. King St.	Match
Three digits of Zip code match, but not the first three	02138 vs. 02234	Not a match
Three digits of Zip code match, and they are the first three	02141 vs. 02138	Possible match
Titles are different but have the same gender	Mrs. vs. Ms.	Likely match
Titles are different and one or both genders are unknown	Mrs. vs. Dr.	Possible match
Titles are different and their genders are different	Mrs. vs. Mr.	Not a match
Suffixes	II vs. Jr.	Match
Suffixes	blank vs. Sr.	Likely match
Suffixes	blank vs. Jr., II, III, or IV	Possible match
Suffixes	Sr. vs. Jr., II, III, or IV	Not a match
Suffixes	II vs. III or IV	Not a match
Suffixes	III vs. IV	Not a match
Suffixes	Any other suffix combination without blanks	Possible match

Note: The FIRSTNAMEMATCH table in the database contains the nicknames (Christopher vs. Chris), spelling variations (Allen vs. Allan), and different first names with similar spellings (John vs. Joan) that the program uses to compare first name values.

For any data that does not fit these scenarios, the program applies a "fuzzy" matching algorithm to calculate how similar field values are on a character-by-character basis. The program uses a few different methods to calculate fuzzy match scores, but the simplest method compares two values and counts the number of changes necessary to make them match exactly. Changes include adding, removing, changing, or transposing characters. For example, if you compare "Christopher" vs. "Chrsitopher," you need to transpose the "si" in "Chrsitopher" to fix the misspelling (one change). The program divides the number of changes by the number of characters in the longer value, multiplies the quotient by 100, and then subtracts the product from 100. For this example, 1/11 ≈ 0.09, 0.09 * 100 = 9, and 100 - 9 = 91.
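The "simplest method" described above matches an edit distance that counts insertions, deletions, substitutions, and adjacent transpositions. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance, which is an assumption; the text does not name the exact algorithm, and the program's other fuzzy methods are not shown.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # remove a character
                          d[i][j - 1] + 1,         # add a character
                          d[i - 1][j - 1] + cost)  # change a character
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose two characters
    return d[-1][-1]


def fuzzy_score(a: str, b: str) -> float:
    """100 minus the number of changes as a percentage of the longer value."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 100.0
    return 100 - osa_distance(a, b) / longer * 100
```

`fuzzy_score("CHRISTOPHER", "CHRSITOPHER")` is about 90.9, which rounds to the 91 in the worked example.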

Each type of field has a range of fuzzy scores that determines whether it is a likely match, possible match, or not a match.

Field	Likely Match	Possible Match	Not a Match
First name	77-99	68-76	less than 68
Last name/Organization name	86-99	50-85	less than 50
Street number	75-99	50-74	less than 50
Street name	81-99	58-80	less than 58
Zip code	80-99	60-79	less than 60
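The score ranges in this table translate directly into a lookup of two floors per field. This sketch assumes scores are rounded to whole numbers before comparison and that a score of 100 means the values are identical. The field keys and the `classify_fuzzy` name are hypothetical.

```python
# (likely-match floor, possible-match floor) per field, taken from the table above.
FUZZY_BANDS = {
    "first_name": (77, 68),
    "last_name": (86, 50),       # Last name/Organization name
    "street_number": (75, 50),
    "street_name": (81, 58),
    "zip": (80, 60),
}


def classify_fuzzy(field: str, score: float) -> str:
    score = round(score)  # assumption: ranges are compared as whole numbers
    likely_floor, possible_floor = FUZZY_BANDS[field]
    if score >= 100:
        return "match"  # identical values
    if score >= likely_floor:
        return "likely"
    if score >= possible_floor:
        return "possible"
    return "not"
```

For example, the "Christopher" vs. "Chrsitopher" score of about 90.9 rounds to 91, which falls in the First name "Likely Match" range of 77-99.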

Calculate Total Matching Scores

After the program classifies fields as matches, likely matches, possible matches, or not matches, it calculates a total match score for the incoming and existing records. Each result is worth a weighted number of points that are deducted from 100 (100 being an exact match), as shown below.

Field	Likely Match	Possible Match	Not a Match	Incoming is Blank, Existing is Not	Existing is Blank, Incoming is Not
Title	1	2	18	0	0
Suffix	1	3	18	0	0
Last name/Organization name or Maiden name	3	8	15	n/a	n/a
Street number	8	17	24	1	3
Street name	5	14	31	18	21
Zip code	7	12	31	6	1
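The total score is 100 minus the sum of the per-field deductions in this table. The sketch below encodes the table as written. It makes three assumptions that the text does not state: an exact "match" deducts nothing, the total is floored at 0, and the first and middle name deductions, which the table does not list, are left out. All names are hypothetical.

```python
# Points deducted per result, from the table above. None marks an "n/a" cell.
DEDUCTIONS = {
    "title":         {"likely": 1, "possible": 2,  "not": 18, "incoming_blank": 0,    "existing_blank": 0},
    "suffix":        {"likely": 1, "possible": 3,  "not": 18, "incoming_blank": 0,    "existing_blank": 0},
    "last_name":     {"likely": 3, "possible": 8,  "not": 15, "incoming_blank": None, "existing_blank": None},
    "street_number": {"likely": 8, "possible": 17, "not": 24, "incoming_blank": 1,    "existing_blank": 3},
    "street_name":   {"likely": 5, "possible": 14, "not": 31, "incoming_blank": 18,   "existing_blank": 21},
    "zip":           {"likely": 7, "possible": 12, "not": 31, "incoming_blank": 6,    "existing_blank": 1},
}


def total_score(results: dict) -> int:
    """results maps a field to 'match', 'likely', 'possible', 'not',
    'incoming_blank', or 'existing_blank'."""
    score = 100
    for field, result in results.items():
        if result == "match":
            continue  # assumption: an exact match deducts nothing
        points = DEDUCTIONS[field][result]
        if points is None:
            raise ValueError(f"{result!r} is not scored for {field!r}")
        score -= points
    return max(score, 0)  # assumption: the total does not go below 0
```

For example, a likely last name match (-3), a possible street name match (-14), and a blank Zip code on the existing record (-1), with every other field matching exactly, give 100 - 18 = 82.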

First name and middle name combinations impact the matching score. If the same name or initial appears in the First name field of one record and the Middle name field of a potential match, the program scores the records as if the values were in the same fields on both records. For example, if the incoming constituent is First name = John and Middle name = Anderson and the existing constituent is First name = John Anderson and Middle name = blank, the program scores the first and middle names as "Matches." If the incoming constituent is First name = John and Middle name = Anderson and the existing constituent is First name = John A, the program scores the first name as a "Match" and the middle name as a "Likely match." The table below demonstrates how the program scores first and middle name combinations.
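One hypothetical way to reproduce the two examples above is to pool each record's First name and Middle name into a single list of tokens and compare the tokens position by position. This is an illustrative assumption, not the program's documented method. The `compare_given` helper only handles exact matches and initials; a fuller version would also consult nicknames, spelling variations, and fuzzy scores.

```python
def compare_given(a: str, b: str) -> str:
    """Compare two given-name tokens: exact match, initial vs. name, or neither."""
    if not a or not b:
        return "blank"
    if a == b:
        return "match"
    if len(a) == 1 or len(b) == 1:
        return "likely" if a[0] == b[0] else "not"  # e.g. ANDERSON vs. A
    return "not"  # placeholder: nickname and fuzzy rules would apply here


def score_first_middle(inc_first: str, inc_middle: str, ex_first: str, ex_middle: str) -> dict:
    # Pool First name and Middle name so a value in either field is compared by position.
    inc = f"{inc_first} {inc_middle}".split()
    ex = f"{ex_first} {ex_middle}".split()
    labels = ("first", "middle")
    return {
        label: compare_given(inc[i] if i < len(inc) else "", ex[i] if i < len(ex) else "")
        for i, label in enumerate(labels)
    }
```

With this sketch, incoming John/Anderson vs. existing "John Anderson"/blank scores both names as "match". Incoming John/Anderson vs. existing "John A"/blank scores the first name as "match" and the middle name as "likely".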

Should I Adjust Constituent Matching Settings?

A record's total matching score determines which matching threshold it falls into. The threshold in turn determines whether the program automatically assigns the record ID from the existing constituent to the matched incoming constituent, waits for a user to manually review the match and make a decision, or automatically creates a new record.

The matching algorithm provides optimal results when you use the default threshold percentages (such as 100-95%, 94-70%, and 69-0% for batches). If you adjust the percentages, the program may automatically update records that you do not expect it to update, or require you to manually review records that are not matches.
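Routing a total score through the default batch thresholds can be sketched as a pair of floor comparisons. The floors 95 and 70 come from the default percentages above. The `route` function, its parameter names, and the `auto_match_enabled` flag (standing in for turning off the Matched constituents threshold) are hypothetical.

```python
def route(score: int, matched_floor: int = 95, possible_floor: int = 70,
          auto_match_enabled: bool = True) -> str:
    """Map a total match score to an outcome using the default batch thresholds."""
    if score >= matched_floor:
        # With the Matched constituents threshold turned off, even exact matches need review.
        return "auto-match" if auto_match_enabled else "manual review"
    if score >= possible_floor:
        return "manual review"
    return "create new record"
```

Lowering `matched_floor` moves scores from "manual review" to "auto-match", and lowering `possible_floor` moves scores from "create new record" to "manual review", which is the tradeoff the considerations below describe.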

Before you adjust the default thresholds, you should weigh the following considerations:

• If you turn off the Matched constituents threshold for batches, all matches require manual review, even if the Lookup IDs are the same or all values match exactly.
• If you lower the bottom percentage of the Matched constituents threshold for batches, matches that previously required manual review may become automatic matches. Unless you consistently approve matches with scores just below the current Matched constituents threshold, you should not edit this threshold.
• If you raise the bottom percentage of the Matched constituents threshold for batches, matches that were previously matched automatically may require manual review. Unless the program automatically matches constituents that do not actually match, you should not edit this threshold.
• If you lower the bottom percentage of the Possible matches threshold, constituents that previously were not considered possible matches may require manual review. Unless the program consistently creates duplicate records for constituents that already exist, you should not edit this threshold.
• If you raise the bottom percentage of the Possible matches threshold, potential matches that previously required manual review may no longer be flagged for manual review. Unless you consistently reject possible matches with lower match scores, you should not edit this threshold.