
It takes two to tango

So, companies match their static and transactional data… to what? What specific pieces of the watchlist listing should be matched?

What is there to match on? Well, there’s the name… what else?

Listings often have addresses, but you don’t really want to match on those alone. After all, street addresses are not unique and may, in fact, generate false positive matches (there is a street named after Robert Mugabe in Harare, Zimbabwe, for example). And listed parties may be located in non-sanctioned countries: I know of one OFAC-listed firm (Travel Services, Inc.) located in Florida in the US.

For similar reasons, date of birth is not a good single element to match on. And don’t get me started about OFAC, which lists some dates of birth as “approximate”, in which case it wants you to pick up the phone and call regardless of how far off the dates are.

But there are frequently other ways to identify the listed entity besides its name. For example, national IDs like the Cedula in Colombia or the SSN in the US are included in sanctions listings. The same goes for passports for people, DUNS or VAT IDs for companies, IMO (International Maritime Organization) numbers and call signs for cargo vessels, and SWIFT IDs and other routing codes for banks (and some corporates).
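As a rough illustration of matching on identifiers, here is a minimal Python sketch. The function names, the choice to strip punctuation and spacing, and the sample values are all assumptions made for this example; they are not taken from any list or vendor.

```python
import re
from typing import Tuple

def normalize_id(id_type: str, value: str) -> Tuple[str, str]:
    """Return (TYPE, VALUE) with case folded and separators removed.

    Assumption: hyphens, spaces and dots carry no meaning in the identifier.
    """
    cleaned = re.sub(r"[^A-Z0-9]", "", value.upper())
    return id_type.strip().upper(), cleaned

def id_matches(listed: Tuple[str, str], candidate: Tuple[str, str]) -> bool:
    """Match only when both the identifier type and the cleaned value agree."""
    listed_key = normalize_id(*listed)
    candidate_key = normalize_id(*candidate)
    if not listed_key[1]:
        return False  # an empty identifier should never generate a match
    return listed_key == candidate_key

# Made-up values, for illustration only.
print(id_matches(("SSN", "123-45-6789"), ("ssn", "123 45 6789")))
print(id_matches(("SSN", "123-45-6789"), ("PASSPORT", "123456789")))
```

Keeping the identifier type in the key is a deliberate choice here: the same string of digits as a passport number and as a national ID should not be treated as the same thing.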

Now, I have mentioned matching on a single element. Why not multiple pieces of information?

Certainly, you can rely on multiple pieces of information in static data screening – if you have it, if the data quality is good, and if you consider keying errors and use of default values. Raise your hand if you’ve ever seen a system that defaulted date fields to 1/1/1970 (the Unix epoch) or November 17, 1858 (the DEC VAX/VMS base date – remember those?). Of course, the chances of you getting multiple criteria if you’re screening transactional data are slim to none.

Don’t forget that dates come in multiple formats – US format (month/day/year), European format (day/month/year) and the format used in Asia (year/month/day). It’s important to know how the dates are formatted by the data provider before trying to use date of birth as a matching criterion – at least past using the year.
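To make the point concrete, here is a small Python sketch of one way to compare dates of birth. It does a full comparison only when each side’s format is known, and otherwise falls back to the year. The format labels (“MDY”, “DMY”, “YMD”), the slash separator and the function names are assumptions for this example, not anything a particular provider publishes.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical labels for the three layouts, mapped to strptime patterns.
FORMATS = {
    "MDY": "%m/%d/%Y",  # US
    "DMY": "%d/%m/%Y",  # European
    "YMD": "%Y/%m/%d",  # year first
}

def parse_dob(text: str, layout: Optional[str]) -> Optional[date]:
    """Parse a date of birth only when the provider's layout is known."""
    if layout is None:
        return None
    return datetime.strptime(text, FORMATS[layout]).date()

def year_of(text: str) -> Optional[int]:
    """Pull out the four-digit year, which reads the same in all three layouts."""
    for part in text.split("/"):
        if len(part) == 4 and part.isdigit():
            return int(part)
    return None

def dob_matches(a: str, a_layout: Optional[str],
                b: str, b_layout: Optional[str]) -> bool:
    """Compare full dates when both layouts are known, otherwise years only."""
    da, db = parse_dob(a, a_layout), parse_dob(b, b_layout)
    if da is not None and db is not None:
        return da == db
    ya, yb = year_of(a), year_of(b)
    return ya is not None and ya == yb

# The same text is two different birthdays depending on the layout.
print(parse_dob("03/04/1975", "MDY"), parse_dob("03/04/1975", "DMY"))
```

The year-only fallback is intentionally loose: it will produce more candidate matches, but it never silently swaps day and month.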

Now, I should preface the rest of this by noting that how data actually gets matched – whether the matching criteria are in software or in the watchlist databases – varies. So, both the software and data providers need to be able to explain how the data elements that generate matches are being derived.

As you can see, I’ve avoided the elephant in the room – the entity name. What should we match on? Let’s assume that things other than people and groups (e.g. countries, cities, cargo vessels) are pretty simple – match the whole thing. But, what about those pesky people, companies, and other organizations?

At one end of the spectrum, you could match on the entire name as listed – but you’d miss a lot of potential matches. People often don’t use their full names when opening accounts, much less doing business, and the little niceties of corporate names (e.g. The, A, Of, Inc, Corp, etc.) often get omitted in the name of brevity and speed.

On the other hand, you could cast a very wide net. Imagine matching on a person’s last name only, or matching on any individual token in a corporate name – oh, the horror! Clearly there is a middle ground.

Let’s deal with corporate names first. It seems a reasonable middle ground to eliminate articles, prepositions and corporate suffixes (Co, Corp, Inc, Ltd) and match on what’s left. Additionally, if the group or company is identified by an acronym, it would be reasonable to match that, too (although I have a beef about that which I’ll get into in a future post).
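Here is a minimal Python sketch of that middle ground for corporate names. The word lists are deliberately short and are assumptions for illustration (including treating spelled-out forms such as “Company” as suffixes); a real list would be much longer, and the function names are made up for this example.

```python
import re

# Illustrative lists only.
NOISE_WORDS = {"the", "a", "an", "of", "for", "in"}
CORPORATE_SUFFIXES = {
    "co", "corp", "inc", "ltd",
    "company", "corporation", "incorporated", "limited",
}

def core_name(name: str) -> str:
    """Lower-case, drop punctuation, then drop noise words and suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens
            if t not in NOISE_WORDS and t not in CORPORATE_SUFFIXES]
    return " ".join(kept)

def corporate_match(listed: str, candidate: str, acronyms=()) -> bool:
    """Match on what is left of the name, or on a listed acronym."""
    candidate_core = core_name(candidate)
    if candidate_core and candidate_core == core_name(listed):
        return True
    return candidate.strip().upper() in {a.upper() for a in acronyms}

print(core_name("The Ford Motor Company"))
print(corporate_match("The Ford Motor Company", "Ford Motor Co."))
print(corporate_match("The Ford Motor Company", "Ford"))
```

Note that this sketch compares the whole remaining name, so a bare “Ford” does not match. Loosening that to any single token is exactly the wide net described above.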

How about individuals? Again, you’re not going to get all the names, fully spelled. So, it boils down to a question of what subsets of the possible names you should match on. In general, Mr. Watchlist believes that any matching system must match multiple name components, one of which is the last name.
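A rule like “multiple name components, one of which is the last name” can be sketched in a few lines of Python. This is only an illustration: the function name, the default threshold of two components, and the assumption that the listing already tells you which token is the last name are all mine.

```python
from typing import Sequence

def person_match(listed_given: Sequence[str], listed_last: str,
                 candidate: str, min_components: int = 2) -> bool:
    """Require the last name plus enough given names to reach the threshold."""
    tokens = set(candidate.lower().replace(".", " ").split())
    if listed_last.lower() not in tokens:
        return False  # the last name is mandatory
    given_hits = sum(1 for g in listed_given if g.lower() in tokens)
    return 1 + given_hits >= min_components

print(person_match(["Maria", "Elda"], "Rodriguez", "Maria Rodriguez"))
print(person_match(["Maria", "Elda"], "Rodriguez", "Rodriguez"))
print(person_match(["Maria", "Elda"], "Rodriguez", "Maria Elda"))
```

This version compares tokens exactly and treats the last name as a single token, which is a simplification.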

Now, let’s enumerate all the little details that one (or, at least, one’s software) needs to take into account:

  1. One must consider what abbreviations, contractions, alternate spellings, transliterations, and foreign translations of the selected tokens might also occur – and account for them. “Bank” should match “Bk” and “Banca,” while “Mohammed” must also match “Muhd” and “Muhamad” (among many others).
  2. One must consider what is considered a person’s last name.
    • For Hispanic names, the first last name is the father’s, which will always be used. For Portuguese names, it’s the second last name. If you wish to also search for the “optional” names, realize that, at best, you will generate multiple matches when you could have generated one (e.g. “Maria Elda Rodriguez Pulido” will match “Rodriguez Pulido” as well as “Maria Rodriguez”) and may also generate matches to obvious false positives (“Rodriguez Pulido” will also match “Jose Rodriguez Pulido”, for example).
    • For Arabic names, there are multiple name components that are considered neither “last” nor “middle”. The one that, according to an acquaintance of Mr. Watchlist’s from a bank in the UAE, is the closest to a “last” name is the nisbah, which is a tribal or geographic name (e.g. “Al-Tikriti”, or “of Tikrit”, was Saddam Hussein’s nisbah). And the kicker is that the only “required” name component is the first name – so you may not even get a nisbah.
  3. One must consider how to handle name components that consist of multiple tokens. For “Juan M. de la Cruz”, does the software drop “de” and “la” (meaning “of” and “the”) and match only on “Cruz”, or does it require all three? Similarly, “Abdul” in non-Americanized Arabic names is always followed by one of the 99 adjectives for Allah. Does the software treat the pair of tokens as a single name component? As an aside, the “Abdul Rahman” on the SDN list is really the equivalent of listing someone purely as “Barney.”
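Two of the details above, variant spellings and multi-token name components, lend themselves to a short Python sketch. The variant table and the list of “binding” tokens below are tiny and purely illustrative, and gluing particles onto the following token is just one of the possible answers to the questions raised in item 3, not a recommendation.

```python
from typing import List

# Illustrative variant table: each variant maps to one canonical token.
VARIANTS = {
    "bk": "bank", "banca": "bank",
    "muhd": "mohammed", "muhamad": "mohammed",
}

# Tokens assumed to bind to whatever follows them.
PARTICLES = {"de", "la", "del", "abdul"}

def canonical(token: str) -> str:
    """Lower-case a token, trim dots and commas, and map known variants."""
    t = token.lower().strip(".,")
    return VARIANTS.get(t, t)

def name_components(name: str) -> List[str]:
    """Split a name into components, keeping particles with the next token."""
    components: List[str] = []
    pending: List[str] = []
    for raw in name.split():
        tok = canonical(raw)
        if not tok:
            continue
        if tok in PARTICLES:
            pending.append(tok)
        else:
            components.append(" ".join(pending + [tok]))
            pending = []
    if pending:  # a trailing particle with nothing after it
        components.append(" ".join(pending))
    return components

print(name_components("Juan M. de la Cruz"))
print(name_components("Muhd Abdul Rahman"))
```

With this choice, “Abdul Rahman” counts as one component rather than two, so a rule that demands multiple components would not fire on that pair alone.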

Of course, the elephant in the room remains to be named: unless you are willing to endure great expense, you will not catch everything. Normally a listing for “The Ford Motor Company” would match on something like “Ford Motor”, and possibly abbreviations like “FoMoCo” and “FMC”. But you could get a reference like “Ford” – do you really want to stop all references to that? In a more tangible sense, there were Burmese companies called “New York” and “Hong Kong” – someone, somewhere (either you or your data provider) had to make an assumption or shortcut to prevent the deluge of matches.

So, that really is the last issue to be considered: how does a data provider deal with terms that are too generic to be a sole matching criterion? There are entity names, aliases or acronyms of “So”, “Maria”, “SRA”, “BOTH”, “Am”, and “55”, not to mention the examples above. Does the data provider drop these, combine them with another criterion (e.g. “Hong Kong Yangon” because the company is located in Yangon) or leave them as-is? And how can you modify the handling of these should you wish to?
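The three options (drop, combine, leave as-is) can be expressed as a configurable policy. The Python sketch below is hypothetical: the policy names, the function name and the idea of passing a location as the qualifier are assumptions for illustration, and the generic-term list simply reuses the examples from this post.

```python
from typing import Optional

# Terms considered too generic to match on alone (examples from the text).
GENERIC_TERMS = {"so", "maria", "sra", "both", "am", "55",
                 "new york", "hong kong"}

def matching_term(alias: str, qualifier: Optional[str] = None,
                  policy: str = "combine") -> Optional[str]:
    """Return the term to match on, or None if the alias should be skipped.

    policy: "keep" leaves generic terms as-is, "combine" appends the
    qualifier (e.g. a city), "drop" discards generic terms.
    """
    term = alias.strip().lower()
    if term not in GENERIC_TERMS or policy == "keep":
        return term
    if policy == "combine" and qualifier:
        return f"{term} {qualifier.strip().lower()}"
    return None  # "drop", or "combine" with nothing to combine with

print(matching_term("Hong Kong", qualifier="Yangon"))
print(matching_term("SRA", policy="drop"))
print(matching_term("BOTH", policy="keep"))
```

Making the policy a parameter is the point: whichever default the provider picks, you want to be able to see it and override it.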

Ultimately, you may not get a perfectly-tuned set of data. However, if you at least understand the reasoning behind what your data provider supplies, you can better plan your matching and operational strategies going forward.

Reference:

Arabic nomenclature: A summary guide for beginners

Categories: Matching Technologies

eric9to5
