Every year, credit card fraud causes massive losses for banks, businesses and their customers, and prevention is a constant race between programmers and criminals. With the rise in popularity of online shopping, we have seen a steep increase in so-called “card-not-present” fraud, where credit card data (including security code) is stolen and used without the physical card ever leaving the owner’s wallet.
To counter this, a number of companies are implementing additional checks in their payment processes, like sending one-time passwords to shoppers’ phones, but the extra steps can be a deterrent for potential customers, which hurts the store’s bottom line. Another method of prevention is to identify and flag transactions that do not align with the card owner’s previous habits.
This article showcases a novel approach to preventing credit card fraud. RDFox and Data Lens have partnered up to demonstrate their unique capabilities which can be utilised to transform and analyse credit card data and prevent fraud.
RDFox is a knowledge graph and semantic reasoning engine. As a highly optimised in-memory solution, RDFox allows us to work with very large data sets without sacrificing speed. The flexible triplestore structure and incremental reasoning make it the perfect technology for this use case. Utilising its unique rules system we can ensure that our calculations take any new input into account.
We use Data Lens to make building knowledge graphs in RDFox much simpler and faster. The Data Lens platform can build knowledge graphs from any source database or data format with no engineering, just configuration.
Suppose we are a credit card provider and collect a variety of data about our UK customers and their transactions. For each card we issue, we keep track of who the owner is and what other cards they have. When the customer makes a transaction, we check what country and city it took place in, what device was used and what vendor was involved. Soon we amass enough information to start assessing whether given behaviour is unusual for a given card-holder.
To assess whether given behaviour is unusual, we need to analyse the data. At present the data lives in two separate sources: a CSV file and a JSON file. We use Data Lens to read both sources, transform the data into RDF, and then insert it into RDFox, by following these steps:
Data Lens uses RML to configure the mapping between the source data formats (in this case CSV and JSON) and the target format (RDF). RML (RDF Mapping Language) is a language for expressing customised mappings from heterogeneous data structures and serializations to the RDF data model.
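To make the idea of a source-to-RDF mapping concrete, here is a minimal Python sketch of what such a mapping produces. It is not RML and not how Data Lens works internally; the namespace `https://example.com/fraud/`, the column names and the predicates `usedCard` and `amount` are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical namespace; not taken from the article's data set.
BASE = "https://example.com/fraud/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def csv_to_ntriples(csv_text):
    """Turn each CSV row into two N-Triples lines (card link and amount)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}transaction/{row['transaction_id']}>"
        # A reference to another entity becomes an IRI, so it can be linked.
        lines.append(f"{subject} <{BASE}usedCard> <{BASE}card/{row['card_id']}> .")
        # A plain value becomes a typed literal.
        lines.append(f'{subject} <{BASE}amount> "{row["amount"]}"^^<{XSD_DECIMAL}> .')
    return lines

# Hypothetical sample input with assumed column names.
SAMPLE = "transaction_id,card_id,amount\nt1,c9,25.50\n"
for line in csv_to_ntriples(SAMPLE):
    print(line)
```

An RML mapping states the same correspondences declaratively (which column forms the subject IRI, which columns become predicates and objects), so that no code like the above has to be written.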
Now that we have turned the data into RDF, linked it, and inserted it into RDFox using Data Lens, it looks like this:
Using reasoning, we can compute a risk score for each transaction. If the transaction is above a specified threshold, then it will be flagged for further investigation.
For the purposes of this article, we have chosen four factors that will contribute to a transaction’s overall score: device use history, amount transferred, vendor trustworthiness and location plausibility. To compute the value for each factor we use RDFox rules, which are more efficient than INSERT queries: when new triples are added to the store, rules are triggered automatically and evaluated incrementally, so we do not waste time re-deriving previously inferred triples.
First, to make things just a bit easier for ourselves, we add a direct link from a transaction to the person who made it, as well as to the previous transaction made by that person. This will make other rules simpler to write and quicker to materialise.
RDFox uses a powerful extension of the declarative rule language Datalog. An example rule that creates the link described above would look like this:
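For readers who do not write Datalog, the logic of that link can be sketched in Python. This is not the article's rule and not RDFox syntax; the input shapes (`card_owner`, `transactions`) and the function name are hypothetical, and timestamps are assumed to be comparable numbers.

```python
from collections import defaultdict

def link_transactions(card_owner, transactions):
    """Map each transaction to (person, previous transaction by that person).

    card_owner:   {card_id: person_id}
    transactions: {transaction_id: (card_id, timestamp)}
    """
    by_person = defaultdict(list)
    for tx_id, (card, timestamp) in transactions.items():
        # Follow transaction -> card -> owner to find who made it.
        by_person[card_owner[card]].append((timestamp, tx_id))

    links = {}
    for person, items in by_person.items():
        previous = None
        # Walk each person's transactions in time order, across all their cards.
        for _, tx_id in sorted(items):
            links[tx_id] = (person, previous)
            previous = tx_id
    return links

# Hypothetical data: alice owns two cards, bob owns one.
owners = {"c1": "alice", "c2": "alice", "c3": "bob"}
txs = {"t1": ("c1", 100), "t2": ("c2", 200), "t3": ("c3", 150)}
print(link_transactions(owners, txs))
```

Note that the previous transaction is looked up per person rather than per card, which matches the description of linking a transaction to the previous one made by the same person.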
Each time a device is used we save important information, such as whether the device has previously been used to make payments from a given account and whether it has been used within the UK. For every transaction, we also calculate the average amount the person spent in one go during the previous month.
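The average-spend calculation can be illustrated with a short Python sketch. It assumes that "the previous month" means the 30 days before the transaction and that a person's history is a list of `(timestamp, amount)` pairs; both are assumptions, and in the article this value is derived by RDFox rules rather than by application code.

```python
from datetime import datetime, timedelta

def average_previous_month(history, when, window_days=30):
    """Mean amount of the transactions in the window_days before `when`.

    history: list of (timestamp, amount) pairs for one person.
    Returns None when there is nothing to average.
    """
    start = when - timedelta(days=window_days)
    amounts = [amount for ts, amount in history if start <= ts < when]
    return sum(amounts) / len(amounts) if amounts else None

# Hypothetical history; the November purchase falls outside the window.
history = [
    (datetime(2023, 11, 1), 500.0),
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 20), 30.0),
]
print(average_previous_month(history, datetime(2024, 1, 25)))
```

Returning `None` for an empty window keeps "no history" distinct from "spent nothing", which matters when the amount factor is later compared against this average.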
Then we determine the risk scores for all the factors using the following naive approach:
All of this can be achieved using RDFox’s advanced reasoning engine. Finally, we sum all the risk factor values for a given transaction and save that as its total risk. We make sure this rule still works when some of the factors are missing, which avoids failures when we work with incomplete data.
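The summation with missing factors can be sketched as follows. The factor names mirror the four factors listed earlier, but the keys, the function name and the example scores are hypothetical, and treating a missing factor as contributing 0 is an assumption about how the rule handles incomplete data.

```python
# Keys for the four factors named in the article (key names are assumed).
FACTORS = ("device", "amount", "vendor", "location")

def total_risk(factor_scores):
    """Sum the available factor scores; a missing factor contributes 0."""
    return sum(factor_scores.get(name, 0) for name in FACTORS)

# Hypothetical scores: vendor and amount could not be computed.
print(total_risk({"device": 3, "location": 10}))
```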
We can use the RDFox web console and its exploration feature to more closely examine the data. Let us take a look at the transaction with the highest total risk score:
The most noticeable thing about this data is that the person’s previous transaction happened in the United States not even 20 minutes before and now they are in Germany, which is not possible. We can safely assume that this transaction is fraudulent.
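The location plausibility check behind this observation can be sketched as an implied-speed calculation. The coordinates below (roughly New York and Berlin), the 1000 km/h plausibility limit and the function names are assumptions for illustration; the article only states the countries and the time gap.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumed limit, roughly airliner speed

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def implied_speed_kmh(prev, curr):
    """Speed needed to get from prev to curr; each is (lat, lon, minutes)."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 60
    if hours <= 0:
        # Same instant: only plausible if the location did not change.
        return math.inf if distance > 0 else 0.0
    return distance / hours

# Hypothetical coordinates, 20 minutes apart.
previous_tx = (40.7128, -74.0060, 0)   # assumed: New York
current_tx = (52.5200, 13.4050, 20)    # assumed: Berlin
speed = implied_speed_kmh(previous_tx, current_tx)
print(speed > MAX_PLAUSIBLE_KMH)
```

Any pair of consecutive transactions whose implied speed exceeds the limit would contribute to the location factor of the later transaction.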
Since this is an artificially generated data set, we know exactly which transactions we are looking for: there are a total of 43 fraudulent transactions. We pick 15 as a reasonable risk threshold and flag all transactions with a score higher than that. Only one legitimate transaction has a total risk score of over 15, so with this threshold we get a false positive rate of less than 7%.
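The thresholding step can be sketched in a few lines of Python. The scores and labels below are toy values, not the article's data set, and the second function computes the share of flagged transactions that are legitimate, which is only one possible way of measuring false positives.

```python
def flag(scores, threshold=15):
    """Return the ids whose total risk is strictly above the threshold."""
    return {tx for tx, score in scores.items() if score > threshold}

def share_of_flags_legitimate(flagged, known_fraud):
    """Fraction of flagged transactions that are not known to be fraudulent."""
    if not flagged:
        return 0.0
    return len(flagged - known_fraud) / len(flagged)

# Toy data: t2 sits exactly on the threshold, so it is not flagged.
scores = {"t1": 22, "t2": 15, "t3": 18, "t4": 3}
known_fraud = {"t1"}
flagged = flag(scores)
print(sorted(flagged), share_of_flags_legitimate(flagged, known_fraud))
```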
Moreover, that one legitimate transaction follows directly after a fraudulent one. Its high risk score is caused by the impossible speed of travel required to get from where the previous transaction was made to where this one took place. All the transactions we flag in this way are therefore tied to accounts that have recently been compromised.
By transforming disparate data sources into RDF using Data Lens, gathering that information in a graph database and utilising RDFox’s powerful reasoning capabilities, we can detect and flag fraudulent credit card transactions based on complex relationships in our data. With the right rules and risk score definitions (these could be fine-tuned using machine learning) we can greatly reduce the losses caused by this type of crime. RDFox is perfectly suited for this type of application because its unmatched speed allows it to effortlessly process large amounts of information and the reasoning engine ensures we do not have to worry about keeping our data consistent.