In the previous post, we laid out how firms use repeated patterns in their data and/or in the matched watchlist listings to ignore matches. The upside of those strategies is that the risk, if any, is very contained, if not eliminated entirely. The downside is that you have to take a lot of axe swings to fell the tree: there are comparatively few repetitive data patterns that occur very frequently, so firms must make numerous rules-based configuration changes to achieve the 50-75% reduction in matches they want.
On the other hand, we can create more general rules for how we process our data, at the cost of increased risk. Whether you feel you can accept that risk is a function of your risk posture.
Let’s get specific by laying out some strategies Mr. Watchlist has recommended to clients over the years. Some are riskier than others, and some are necessitated by the sheer numbers involved and/or by data quality issues that have come to light.
If you’re screening static data, you’ve undoubtedly supplied your data in discrete fields, each with a specific purpose: name, address, date of birth, etc. And that means, assuming you have decent data quality, that only certain types of matches should occur in certain fields. Cities listed on watchlists, for example, should normally only match the field where you store city information.
So, you could configure your application so that matches that shouldn’t occur, don’t – such as a match of an organization name like ANA to your city field. That way, Santa Ana doesn’t match ANA.
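How this kind of rule works can be sketched in a few lines. This is a minimal illustration, not any vendor's actual configuration: the field names, the entry types, and the `keep_match` helper are all assumptions made for the example.

```python
# Hypothetical field-aware match suppression. Which watchlist entry
# types are plausible hits for each customer data field is an assumed
# mapping; real screening engines express this in their own config.
ALLOWED_TYPES = {
    "name": {"individual", "organization", "vessel"},
    "city": {"city", "country"},
    "address": {"city", "country", "address"},
}

def keep_match(field, entry_type):
    """Keep a hit only if this entry type can legitimately occur in this field."""
    return entry_type in ALLOWED_TYPES.get(field, set())

# An organization name (ANA) hitting the city field is suppressed,
# so "Santa Ana" in the city field no longer alerts on ANA.
hits = [
    {"field": "city", "entry_type": "organization", "phrase": "ANA"},
    {"field": "name", "entry_type": "individual", "phrase": "JOSE LUIS"},
]
kept = [h for h in hits if keep_match(h["field"], h["entry_type"])]
```

The design choice here is a whitelist rather than a blacklist: any field/type combination you haven't explicitly allowed is dropped, which fails safe only if your field inventory is complete.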
Now, what do you do if your data quality is not so hot? You have a number of options:
- If the number of nonconforming records is really small, you could choose to screen as if all records met the data quality of the properly-composed ones
- If the number of “bad” records isn’t so small, or if you’re uncomfortable with the first strategy, you could identify the non-conforming records and screen them differently
- Another option is to remediate the aberrant records, moving to the desired screening parameters once you’re comfortable with any remaining risks
Incomplete individual names
There are a pair of cases that make up this category.
First, some individuals on the watchlists also have nicknames or aliases listed. These are largely drug traffickers (e.g. Jose Luis, Maria) and terrorists (e.g. Tariq, Khalid, Renato).
Second, there are some folks who are only identified by either a first or a last name. There are a bunch of people on the Central Bureau of Investigation Most Wanted list in India who have only one name listed – and these are very common given names, like Mahesh, Rajesh, and Dev. There are a handful on other lists, too – like Abdul Rahman (that’s only one name – Abdul always requires a second name component), the military officer Goodrich, and the Burmese officer’s wife named Cherry, for example.
Wading through every reference to these is, at best, tiresome. A good rule of thumb for the ratio of false positives to true matches for a given matching phrase: the more tokens in the phrase, the fewer false positives per true match. Single-word matching phrases therefore generate far more false positives per true match, making the cost/benefit ratio less favorable than for other listed entities.
Consider, too, that someone hiding behind an alias is highly unlikely to provide any additional information to identify themselves as the listed person.
So, what to do? Personally, I’d ignore all of these. OFAC, by the way, lists many of these aliases as “weak aliases” and provides guidance that, while you may use such a match as a secondary matching criterion, you don’t have to make it the primary thing you match on.
Is there a risk here? Absolutely. If the risk troubles you, you could limit the exclusion of these items by the value of the relationship (for static information) or the transaction value – excluding only the low end, of course.
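Putting the last few paragraphs together, the rule might look like this sketch. The value threshold, the weak-alias flag, and the function itself are illustrative assumptions; the one-token test and the value carve-out come straight from the reasoning above.

```python
# Sketch: treat single-token matching phrases (lone given names, OFAC
# "weak aliases") as suppressible for low-value items only.
# VALUE_THRESHOLD is an assumed tuning parameter, not guidance.
VALUE_THRESHOLD = 50_000

def suppress_weak_hit(phrase, is_weak_alias, amount):
    """Drop one-token or weak-alias hits below the value threshold."""
    single_token = len(phrase.split()) == 1
    if (single_token or is_weak_alias) and amount < VALUE_THRESHOLD:
        return True   # suppress the alert
    return False      # route to review as usual

# A lone "Tariq" on a small payment is suppressed; the same name on a
# large payment, or a multi-token phrase, still alerts.
assert suppress_weak_hit("Tariq", True, 1_200)
assert not suppress_weak_hit("Tariq", True, 75_000)
assert not suppress_weak_hit("Abdul Rahman", False, 1_200)
```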
Same strategy, different breed of animal. Terror organizations and companies on watchlists are often listed by their acronyms as well. Another rule of thumb: the shorter a token/word, the more likely you are to get false positive matches to it – and the more likely it is to have a secondary, legitimate meaning:
- SL (short for Sendero Luminoso, the Shining Path rebels in Peru) is a common suffix for small businesses in Spanish-speaking areas. It’s like LLC, I believe
- SRA (Sanibel Relief Agency) is also a way to abbreviate Señora (Mrs.)
- MME (Metals Machine Engineering, a company in Germany) is an abbreviation for Madame (Mrs.)
- EGP (a terrorist group) is also the ISO currency code for the Egyptian Pound
- ETA (the Basque separatist group) is also a common abbreviation for “estimated time of arrival”
Here, too, I’d ignore these if I could justify the risk. And if I were screening static information, I’d require that all corporate names be listed in full.
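The short-token rule of thumb lends itself to a simple filter. This is a sketch under stated assumptions: the length cutoff is a made-up tuning parameter, and real engines would also consider context (is the field a name, a currency amount, an address?).

```python
# Hypothetical acronym suppression: ignore hits where the matching
# phrase is a short all-caps token, on the theory that 2-3 letter
# strings (SL, SRA, MME, EGP, ETA) almost always carry a legitimate
# secondary meaning. MIN_ACRONYM_LEN is an assumed cutoff.
MIN_ACRONYM_LEN = 4

def is_ignorable_acronym(phrase):
    return phrase.isupper() and phrase.isalpha() and len(phrase) < MIN_ACRONYM_LEN
```

Note the trade-off: this drops every three-letter all-caps hit, including any genuine one, which is exactly the risk the post says you must be able to justify.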
No, not the movie with John Travolta. Some matching phrases contain multiple words/tokens, one of which is much shorter than the others. On a percentage basis, the short token may not match very well, but the overall fuzzy score still clears the threshold because the longer tokens are properly spelled (e.g. the matching phrase ATE International matching ARC International). One option to consider is requiring the shorter token to be spelled exactly (if you can’t adjust the minimum per-token matching score on a more system-wide basis).
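The idea above can be sketched as token-level gating. Everything here is an assumption for illustration: `difflib.SequenceMatcher` stands in for whatever fuzzy scorer your engine uses, and the length cutoff and threshold are invented tuning values.

```python
# Sketch: short tokens must match exactly; longer tokens may fuzz.
from difflib import SequenceMatcher

SHORT_TOKEN_LEN = 3  # assumed cutoff for "short"

def token_score(a, b):
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

def phrase_matches(candidate, listed, fuzzy_threshold=0.8):
    cand_tokens, listed_tokens = candidate.split(), listed.split()
    if len(cand_tokens) != len(listed_tokens):
        return False
    for c, l in zip(cand_tokens, listed_tokens):
        if len(l) <= SHORT_TOKEN_LEN:
            if c.upper() != l.upper():   # short token: exact spelling required
                return False
        elif token_score(c, l) < fuzzy_threshold:
            return False
    return True

# ARC International no longer matches ATE International, because the
# short token "ATE" must be spelled exactly; a typo in the long token
# ("Internationale") still matches.
assert not phrase_matches("ARC International", "ATE International")
assert phrase_matches("ATE Internationale", "ATE International")
```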
If screening static data, you could configure your system to enforce name order in certain cases – depending on the name format(s) you use. For example, if your names are in first name, last name order, you might require that a first initial in a matching phrase appear first in the name field – or that a non-Hispanic, non-Arabic last name appear last.
This assumes, of course, that you don’t support both directory-style and non-directory-style names in your data – and don’t have multiple names in one field (e.g. joint accounts).
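A name-order rule for first-name/last-name data might look like the following sketch. The helper is hypothetical, and per the caveat above it assumes one name per field, in a single known order.

```python
# Sketch of name-order enforcement, assuming customer names are stored
# in "first last" order with one party per field.
def order_ok(customer_name, listed_surname):
    """Require a listed (non-Hispanic, non-Arabic) surname to appear last."""
    tokens = customer_name.upper().split()
    return bool(tokens) and tokens[-1] == listed_surname.upper()

# A hit on the surname Goodrich survives only if it sits in last-name
# position.
assert order_ok("John Goodrich", "Goodrich")
assert not order_ok("Goodrich John", "Goodrich")
```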
Different strokes for different folks
Not all relationships, or transactions, contain the same risks. Two examples:
- Restricted accounts, such as 401K/403B, 529 and IRA accounts, are not good money laundering vehicles. One should consider not screening these against the PEP lists (and maybe not against any other non-sanctions list)
- Retail transactions, in particular low-value items, are less likely to contain certain types of references. For example, they are unlikely to contain references to cargo vessel names, or have DBA (doing business as), C/O (in care of) or Attention information as part of the address information. Therefore, they could, in theory, be screened differently
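Both examples amount to routing different populations to different list sets. A minimal sketch, with invented account-type codes and list names:

```python
# Hypothetical risk-tiered list selection: restricted retirement
# accounts skip PEP screening but still screen against sanctions lists.
SANCTIONS = {"OFAC SDN", "EU Consolidated"}
PEP = {"PEP"}

def lists_for(account_type):
    restricted = account_type in {"401K", "403B", "529", "IRA"}
    return SANCTIONS if restricted else SANCTIONS | PEP
```

Sanctions lists stay in scope for everything; only the optional, risk-based lists are tiered away.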
YMMV (Your Mileage May Vary)
Are any of these bulletproof? No. You have to consider all the factors of your risk profile, including assessing those Enforcement Guidelines General Factors (even if you’re not regulated by OFAC, they make an excellent starting point), and weigh that against the cost of finding alternate strategies for managing these items.
Categories: False Positive Reduction