
Beyond Vanilla – Flavors of Matching

It’s one thing to want to match against watchlists, and it’s another to figure out all the ways your technology can miss something you want to catch (or catch something you don’t want).

So, consider the following things that some screening engines do, and consider the implications for the results you get:

    All Matching Technologies

  • Name Variations
    • Does the matching engine provide alternate spellings of names, like the multiple ways that Mohamed can be spelled? How about common abbreviations like Mohd or Muhd?
    • Does the matching engine provide abbreviations of common words, like Co., Inc., or Ltd.?
    • Does the matching engine provide foreign translations, such as Bank, Banco, Banca, and Banque?
  • Matching Functionality
    • How does the matching engine treat tokens that are out of order? Are Kim Young and Young Kim identical, or at least a partial mismatch?
    • How does the matching engine treat additional tokens between the target tokens? Is there a matching window such that all tokens must be within a certain distance of each other? Is that window fixed or tunable?
    • How does the matching engine handle embedded characters? Will it find “Ira n”, or even “I r a n”?
    • How does the matching engine handle run-on text? Will it match Muhammad Ibrahim to MuhammadIbrahim?
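
To make the Matching Functionality questions concrete, here is a minimal Python sketch of how an engine might handle token order, a matching window, embedded spaces, and run-on text. The function names (tokens, order_insensitive_equal, within_window, squash, contains_ignoring_spaces) are hypothetical and do not come from any particular product; real engines will differ.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def order_insensitive_equal(a: str, b: str) -> bool:
    """Treat 'Kim Young' and 'Young Kim' as identical by ignoring token order."""
    return sorted(tokens(a)) == sorted(tokens(b))

def within_window(text: str, targets: list[str], window: int) -> bool:
    """True if every target token falls inside some run of `window` consecutive tokens."""
    toks = tokens(text)
    wanted = {t.lower() for t in targets}
    for start in range(len(toks)):
        if wanted <= set(toks[start:start + window]):
            return True
    return False

def squash(text: str) -> str:
    """Drop spaces and punctuation so that spacing no longer matters."""
    return "".join(tokens(text))

def contains_ignoring_spaces(text: str, term: str) -> bool:
    """Find a term even when it is split up ('I r a n') or run together."""
    return squash(term) in squash(text)
```

With this sketch, within_window("Young Soo Kim", ["Kim", "Young"], 2) fails but a window of 3 succeeds, and contains_ignoring_spaces finds Iran in "I r a n". The same squashing also finds Iran inside "Air Angola", which is exactly the kind of unwanted catch worth asking a vendor about.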

    Fuzzy Matching

    • Is the match threshold for the entire phrase being matched, or for each individual token?
    • If the match threshold is for the entire phrase, is there a minimum match percentage for each individual token? In other words, will XX International match UI International as long as the score for the whole phrase clears the threshold?
    • How are transpositions (flipped letters) scored? Are they one edit error, or two?
    • Are transpositions only of adjacent characters, or can any two positions be flipped and scored as a transposition (instead of two substitutions)? So, is Nalcof considered a single transposition for Falcon?
    • Are inserted letters that are duplicates of adjacent letters scored differently than other insertions? In other words, is Cubba scored differently for Cuba than Cuxba?
    • Is keyboard layout considered when scoring insertions and substitutions? In other words, is Myhammad a better score for Muhammad, because the Y is next to the U on the keyboard? And which keyboard layouts are considered?
    • If a token is a legitimate word or name, is fuzzy matching still invoked? In other words, will Fernandez still match Hernandez, all other things being equal, even though Fernandez is a legitimate name?
    • Does the use of alternate spellings, abbreviations, or foreign translations affect the match score? How?
    • Does token order affect the match score? How?
    • Do intervening tokens between the target tokens affect the match score? How?
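
Several of the fuzzy matching questions come down to how edit distance is computed and where the threshold is applied. The sketch below is one possible scoring scheme, not any vendor's actual method: plain Levenshtein distance, with an optional switch for the optimal string alignment variant that counts an adjacent swap as one edit. The similarity formula (one minus distance divided by the longer length) and the helper names are assumptions made for illustration.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance. With transpositions=True, swapping two
    adjacent characters costs 1 (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[rows - 1][cols - 1]

def similarity(a: str, b: str, transpositions: bool = False) -> float:
    """Assumed scoring: 1.0 for identical strings, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.lower(), b.lower(), transpositions) / longest

def min_token_score(a: str, b: str) -> float:
    """Lowest per-token similarity, pairing tokens by position.
    Assumes both phrases have the same number of tokens."""
    pairs = zip(a.split(), b.split())
    return min((similarity(x, y) for x, y in pairs), default=0.0)
```

Under these assumptions, Flacon is two edits from Falcon without transposition support and one with it, while Nalcof is two edits either way because the swapped letters are not adjacent. Cubba and Cuxba are both one edit from Cuba, so a plain edit distance cannot tell them apart. XX International scores 0.875 against UI International as a phrase, yet the token XX scores 0.0 against UI, which is why a per-token minimum matters.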

    Phonetic Matching

    • Does the use of alternate spellings, abbreviations, or foreign translations affect the match score? If so, how? If not, why not?
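
As a rough illustration of why this question matters, here is a sketch of American Soundex, one common phonetic algorithm. It is used here only as an example; an actual screening engine may use a different phonetic scheme entirely. A phonetic key either matches or it does not, so it is worth asking how (or whether) a hit that came from an alternate spelling or abbreviation list is scored differently.

```python
# Consonant groups and their Soundex digits; vowels, h, w, and y have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    out = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (out + "000")[:4]
```

With this sketch, Mohamed, Mohammed, and Muhammad all produce M530, so the spelling variants collapse to one key without any variation list. Mohd and Muhd produce M300, so the abbreviations match each other but not the full name, and would still need an abbreviation list. Fernandez (F655) and Hernandez (H655) do not match, because Soundex keeps the first letter as is.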

This is a good starter list. If you have others that would be useful to ask a prospective vendor, please comment below.

Categories: Matching Technologies

eric9to5
