The rise of technology and interconnectedness across the globe has coincided with an exponential increase in data generated over the last few decades. At the enterprise level, unstructured data is currently growing 50% year on year and accounts for approximately 80% of all data stored.
This is probably unsurprising given that unstructured data is generally in text or image form and represents how humans best communicate, but it also brings its own challenges for effective information storage and accessibility. Companies are always trying to stay ahead of the curve to ensure data can be leveraged as their primary asset in an increasingly data-centric economy, leading to an ever-evolving technology landscape and opening up new data-driven capabilities.
Clarity from chaos
An interesting aspect of this growing abundance of data is the corresponding rise in news and journalistic services, made possible by the expanding reach of the internet and the emergence of new websites open to public discourse. This has led to a huge volume of potential news sources across many jurisdictions and languages. The ability to search such a treasure trove of information in seconds is an amazing resource, making it possible to research news and events quickly at any time.
In the world of Financial Crime, this resource is actively being used by Financial Institutions (FIs) to screen clients for associated risks and identify criminal behaviour. This is known as Adverse Media Screening and usually involves searching for an entity against certain keywords encapsulating risk. However, with the vast volumes of information at any FI's disposal, the search for crucial news articles can often be derailed by information overload. Quite simply, there is too much data for a human to reasonably analyse and make an informed decision within a constrained time budget.
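To make the basic idea concrete, here is a minimal sketch of keyword-based screening: keep any article that mentions the entity and at least one risk keyword. The keyword list, the sample articles and the function name `keyword_screen` are all hypothetical and chosen purely for illustration; this is not a description of any particular screening product.

```python
import re

# Hypothetical risk keywords; a real list would be far larger and curated.
RISK_KEYWORDS = {"fraud", "bribery", "laundering", "sanctions"}


def keyword_screen(entity, articles, keywords=RISK_KEYWORDS):
    """Return (index, matched_keywords) for each article that mentions
    the entity and contains at least one risk keyword."""
    entity_pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
    hits = []
    for i, text in enumerate(articles):
        if not entity_pattern.search(text):
            continue  # the entity is not mentioned at all
        words = set(re.findall(r"[a-z]+", text.lower()))
        matched = sorted(words & keywords)
        if matched:
            hits.append((i, matched))
    return hits


# Invented sample articles, for illustration only.
articles = [
    "Acme Ltd was fined after a fraud investigation.",
    "Acme Ltd opens a new office in Leeds.",
    "Regulators warn about money laundering at an unnamed bank.",
    "The chairman of Acme Ltd denied any involvement in fraud at a rival firm.",
]

print(keyword_screen("Acme Ltd", articles))
```

In this toy example the first and last articles are returned. The last one shows the weakness of simple co-occurrence: the entity and the keyword appear together, yet the article does not say that the entity committed fraud, so an analyst would still have to read it to decide.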
The use of adverse media screening is an emergent trend for tackling financial crime within FIs and is applicable across many important activities for risk management in the sector. For example, a bank may want to open a business account for a new client. To effectively identify potential risk, it performs an adverse media screen at the point of onboarding. If no results are found, the client can be successfully onboarded, but if a deluge of adverse news articles is found, it suggests the client poses a material risk to the bank and further action is required.
The screening process can also be applied to existing client bases to identify potential changes in behaviour or periodic risks, which could be especially important for high-risk clients. This shift to client monitoring, instead of solely relying on transactional monitoring, is an important step-change in risk management for FIs, as it embraces a more holistic customer view accounting for external behaviours that may not have previously been considered.
As useful as this sounds, manually screening clients for relevant adverse news articles across the internet is simply infeasible. A human analyst cannot sift through information from every source, potentially across many jurisdictions and languages, in a reasonable timeframe. Quite often, an analyst within an FI performing screening will have a few minutes to review documents and make a decision. An internet search with adverse term keywords can return anywhere from hundreds to millions of documents depending on the query, and the results can change over time as some articles are removed for various reasons. All of this presents a dizzying, insurmountable challenge for adverse screening at the manual, human level. If we approach the problem using advanced technology, we can tip the scales in our favour and bring clarity from chaos.
At Arachnys, we recognise the inherent value in data and have built up a considerable archive of news articles from around the world over the last decade. We have over 1 billion documents across more than 50 languages and jurisdictions stored in our news cluster, which is increasing week on week.
This source of information is invaluable from an adverse screening perspective as it is incredibly broad and persistent, meaning we have instant access to hard-to-find information from across the globe. Simply serving up this information to an analyst with adverse keyword searching would be a step in the right direction, as the technology our cluster is built on would remove some of the noise compared to public internet searches.
We would not be unlocking the true potential of the data if we stopped there, though. By leveraging advanced Machine Learning (ML) techniques and recent advances in Natural Language Processing (NLP) and Natural Language Understanding, we are in a position to deliver next-generation adverse media screening capabilities, offering improved analyst efficiencies and enhanced financial crime defences in the process.
Natural Language Processing is the interpretation of unstructured text information by a machine. This information can be from any source, such as a news article or an email, as long as it is in text form. Although NLP has been around for some time, in recent years the techniques and performance of NLP language models have radically improved. In short, we are at a point in time where a machine can effectively interpret and extract information from documents. This is incredibly powerful as it opens the door to significant efficiency gains for intelligent document analysis, an area where humans have typically dominated.
Our ultimate goal for adverse media screening is to return a precise collection of news items given a query, which could be an entity name and a set of keywords that describe risk categories of interest. This output collection of news items would be highly relevant and appropriately scored with respect to risk, a product of intelligent document analysis using NLP and ML models. Some of the tasks and capabilities we are working on to achieve this goal are outlined below:
- Deduplication – Newsworthy events are often reported multiple times and ripple through the internet. For high profile entities, the cascade of news articles with the same content can be overwhelmingly large. It is undesirable and inefficient for a human analyst to sift through copies of the same document, even if it is relevant for adverse screening. We have our own deduplication method to remove this noise from search results, showing only materially different articles to users.
- Entity Extraction & Grouping – We are primarily interested in how risk and adverse terms relate to entities, where an entity can be a person or an organisation. Thus, being able to identify, extract and sensibly group entities into a collection is very useful. It allows us to associate and identify patterns for the correct entity groups, which is crucial for delivering good search relevance. For a user, we can also highlight the entities contained within an article, so they can be seen at a glance.
- Topic Modelling – Articles can be analysed by an ML model to discover the prevailing topics within. This allows documents to be clustered or ordered into specific risk categories, making it easier for a human analyst to review. Furthermore, the identification of certain risk topics plays a role in improving search relevance.
- Adverse Entity Detection – Previously, it was often true that a real adverse-entity connection could only be established by a human reading a document in full, as there is a nuance to language that machines struggle to pick up. Now, with advanced NLP techniques capable of identifying complex patterns in language, we are much more effective at determining true adverse term connections to entities in articles. This is advantageous for improving search relevance as we can efficiently remove irrelevant documents from search results and confidently score news articles from a risk perspective.
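As an illustration of the first capability in the list above, the following is a minimal sketch of near-duplicate removal using word shingles and Jaccard similarity. This is a generic textbook technique and is not the Arachnys deduplication method, which is not described here. The shingle size, the 0.6 threshold, the function names and the sample articles are all assumptions made for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) for a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles, threshold=0.6):
    """Return indices of articles to keep, dropping any article whose
    similarity to an already kept article reaches the threshold."""
    kept = []  # list of (index, shingle_set) for articles kept so far
    for i, text in enumerate(articles):
        current = shingles(text)
        if all(jaccard(current, other) < threshold for _, other in kept):
            kept.append((i, current))
    return [i for i, _ in kept]


# Invented sample articles, for illustration only.
articles = [
    "Acme Ltd was fined 2 million pounds by the regulator over fraud allegations",
    "Acme Ltd was fined 2 million pounds by the regulator over fraud allegations on Monday",
    "Acme Ltd opens a new office in Leeds",
]

print(deduplicate(articles))
```

Here the second article shares 11 of 13 distinct shingles with the first, so it is treated as a copy and dropped, while the third is kept. Pairwise comparison like this scales quadratically, so a large archive would need an approximate method such as MinHash with locality-sensitive hashing instead.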
Overall, Arachnys are well positioned to offer Adverse Media Screening services where broad data sources and search relevance are imperative. Through the application of data science methods, the true value of our extensive data volume is being realised, and our intelligent data capabilities will continue to be developed in this area. Users employing our adverse screening solution can avoid information overload and enjoy the benefits of advanced NLP to help provide relevant information as required.