June 25, 2018
Nagesh Adiga published in AI Informed

Resolving Fraud Detection Challenges with Machine Learning

By Nagesh Adiga, Chief Research Scientist at Razorthink, Inc.

Fraud detection matters today for two reasons: fraud hurts an organization’s bottom line, and it damages the organization’s reputation. Without suitable detection and prevention, that combination can ruin a company.

When fraud is committed against an e-commerce site, for example, the company is often responsible for covering the losses, which may erode the value of its merchandise and decrease profit margins. At the same time, prospective customers may hesitate to patronize a site with a reputation for fraud, as might the merchants and third-party partners that sell goods through it.

Machine learning techniques, however, can help organizations greatly reduce the impact of this ongoing problem.

Fraud Detection Methods

Most modern fraud detection methods start with a domain expert who has two responsibilities: gathering historical transaction data and helping generate features for classic or advanced machine learning models. Here, features are signals derived from raw data that are useful for detecting fraud. For example, an incorrectly entered ZIP code might indicate potential fraud. The historical transaction data informs which features are worth generating. Once the features are generated, they’re used to build a machine learning model for fraud detection.
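
To make the idea of feature generation concrete, here is a minimal sketch in Python. The field names (entered_zip, zip_on_file, amount) and the three features are assumptions invented for this illustration, not a description of any particular production system.

```python
from statistics import mean


def derive_features(txn, history):
    """Turn one raw transaction into model-ready features.

    `txn` and `history` use hypothetical field names chosen for this sketch.
    """
    past_amounts = [h["amount"] for h in history]
    avg_amount = mean(past_amounts) if past_amounts else 0.0
    return {
        # 1 if the ZIP code entered at checkout differs from the one on file
        "zip_mismatch": int(txn["entered_zip"] != txn["zip_on_file"]),
        # Size of this purchase relative to the customer's past average
        # (0.0 when there is no usable history)
        "amount_ratio": txn["amount"] / avg_amount if avg_amount else 0.0,
        # Number of earlier transactions available for this customer
        "history_len": len(history),
    }


# Made-up example data
history = [{"amount": 40.0}, {"amount": 60.0}]
txn = {"amount": 500.0, "entered_zip": "94107", "zip_on_file": "10001"}

features = derive_features(txn, history)
print(features)
```

In practice a domain expert would choose and validate such features against historical data. The point of the sketch is only that each feature is a small, explicit transformation of the raw record.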


Several challenges complicate the fraud detection process:

  • Changing fraud patterns over time: This is one of the toughest challenges because fraudsters constantly learn new ways to perpetrate fraud, so the patterns that models must detect keep evolving. As fraudsters adopt new technologies and approaches, model performance commonly degrades, which means models require regular updating.
  • Class imbalance: Generally, only a small percentage of customers have fraudulent intentions. Fraud detection models, which usually classify transactions as either fraudulent or non-fraudulent, therefore see far fewer fraudulent examples than legitimate ones, and that makes the models harder to build. A by-product of this challenge is a poor user experience for non-fraudulent customers, since catching fraudsters usually involves declining some legitimate transactions.
  • Model interpretation: This challenge concerns explainability. Models typically output a score indicating how likely a transaction is to be fraudulent, without explaining why.
  • Time-consuming feature generation: Subject matter experts can take a long time to generate features, which slows the fraud detection process.
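
The class imbalance challenge is easiest to see with numbers. The sketch below uses synthetic labels (10 fraudulent transactions out of 1,000, an assumed ratio chosen only for illustration) to show why plain accuracy is misleading, and how catching more fraud tends to mean flagging some legitimate transactions too.

```python
def accuracy(y_true, y_pred):
    """Fraction of transactions labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the fraud class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic labels: 10 fraudulent (1) and 990 legitimate (0) transactions
y_true = [1] * 10 + [0] * 990

# A "model" that never flags anything looks accurate but catches no fraud
never_flag = [0] * 1000
acc_naive = accuracy(y_true, never_flag)
prec_naive, rec_naive = precision_recall(y_true, never_flag)

# A model that catches 8 of the 10 frauds but also declines 20 legitimate ones
flagging = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970
acc_flag = accuracy(y_true, flagging)
prec_flag, rec_flag = precision_recall(y_true, flagging)

print(f"never flag: accuracy={acc_naive:.3f} recall={rec_naive:.3f}")
print(f"flagging:   accuracy={acc_flag:.3f} precision={prec_flag:.3f} recall={rec_flag:.3f}")
```

The second model has lower accuracy than the do-nothing model, yet it is the only one of the two that catches any fraud. The 20 declined legitimate transactions are the user-experience cost described above.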


There are several measures for resolving these challenges, including:

  • Ensemble Modeling: Ensemble modeling leverages multiple models for a single task (such as fraud detection). Combining classic machine learning, deep learning, and linear models lets the ensemble capture a wider variety of fraud patterns than any single model would. For example, a Long Short-Term Memory (LSTM) deep learning model is useful for detecting fraud in a sequence of events. If a user logs in with a new IP address from a different city, changes the street address on file, then purchases an expensive item on an e-commerce site, an LSTM model might flag this transaction as fraudulent. None of these events alone is indicative of fraud, but the sequence of all three is. Including LSTM models in an ensemble can help identify different fraudulent patterns over time.
  • Human-in-the-Loop: This technique addresses the class imbalance issue and the lengthy time required for feature generation. It involves humans giving models information to identify new patterns, features, and dimensions of fraud. In the preceding e-commerce use case, for instance, a human could mark such a sequence as indicative of fraud. The model can then generalize from this information and apply it to related cases, such as when users change email addresses instead of physical addresses. Based on human input, the model learns from these examples, then identifies more on its own.
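
As a rough illustration of ensembling, the sketch below averages the scores of three toy scorers. Everything in it is an assumption made for the example: the event names, the weights, and the scoring functions are hypothetical. In particular, sequence_score is a hand-written rule standing in for a trained sequence model such as an LSTM; it is not an LSTM.

```python
RISKY_SEQUENCE = ("login_new_ip", "address_change", "high_value_purchase")


def sequence_score(events):
    """Return 1.0 if RISKY_SEQUENCE occurs in order within `events`, else 0.0.

    Hand-written stand-in for a trained sequence model (assumption).
    """
    remaining = iter(events)
    # `step in remaining` consumes the iterator up to the match, so the
    # steps must appear in order (other events may occur in between).
    return 1.0 if all(step in remaining for step in RISKY_SEQUENCE) else 0.0


def amount_score(amount, typical=100.0):
    """Toy score in [0, 1] that grows with purchase size (hypothetical)."""
    return min(amount / (10 * typical), 1.0)


def zip_score(zip_mismatch):
    """Toy score based on a single derived feature (hypothetical values)."""
    return 0.6 if zip_mismatch else 0.1


def ensemble_score(session, weights=(0.5, 0.3, 0.2)):
    """Weighted average of the individual scores (soft voting)."""
    scores = (
        sequence_score(session["events"]),
        amount_score(session["amount"]),
        zip_score(session["zip_mismatch"]),
    )
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Made-up sessions: same purchase amount, different event sequences
suspicious = {
    "events": ["login_new_ip", "address_change", "high_value_purchase"],
    "amount": 1200.0,
    "zip_mismatch": False,
}
benign = {
    "events": ["login_known_ip", "high_value_purchase"],
    "amount": 1200.0,
    "zip_mismatch": False,
}

print(f"suspicious: {ensemble_score(suspicious):.2f}")
print(f"benign:     {ensemble_score(benign):.2f}")
```

Both sessions contain the same expensive purchase, but only the one with the full three-step sequence receives a high combined score. A real ensemble would replace each toy scorer with a trained model and would typically learn the weights rather than fix them by hand.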

Reducing Fraud

Each of these techniques can reduce the incidence and severity of fraud. They’re foundational for using both classic and advanced machine learning for fraud detection. In the future, the challenges of fraud detection will likely grow as fraudsters find new ways to commit these illicit acts. Nevertheless, the techniques described here can help ensure that organizations’ fraud detection measures evolve as well, decreasing this crime’s impact.