Fraud Detection

Techniques & Applications

Dr. Sasha Göbbels
http://slides.technologyscout.net/fraud-detection/
With commerce comes fraud.
Nathan Blecharczyk, AirBNB

Overview

  • Definitions: what is fraud and how does it work?
  • What methods are available?
  • What do the operational scenarios look like?
  • What is the optimal approach?

Definitions

What is fraud?

Fraud is deliberate deception to secure unfair or unlawful gain, or to deprive a victim of a legal right.
Wikipedia

What is fraud detection?

  • Areas of application
    • Transactions in online banking and credit cards
    • Insurance and warranty claims
    • Call records at telco providers
  • Data is fed in parallel into live system and fraud detection system
  • When the alarm goes off:
    • Transactions are denied or put on hold
    • Claims are marked for manual inspection

Methods

Methods of fraud detection

1. Rule based systems
2. Graph based systems
3. Expert systems
4. Deep learning

1. Rule based systems

Overview
  • All transactions (Tx) run through one or several workflows
  • Every step reviews some detail of the Tx and may include aggregated data
  • Result: fraud score

1. Rule based systems

Background
  • Components:
    • in-memory database
    • rules engine
  • Data:
    • detail data of each transaction
    • aggregated data (e.g. average monthly transaction volume per credit card)
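
A minimal sketch of how a rules engine could turn detail and aggregated data into a fraud score. The rule names, point values, thresholds and field names (`amount`, `country`, `hour`, `avg_monthly_volume`, `home_country`) are assumptions for illustration only and do not come from any real product.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    points: int                              # contribution to the fraud score
    applies: Callable[[dict, dict], bool]    # (transaction, aggregates) -> bool

# Hypothetical rules; thresholds and points are illustrative only.
RULES = [
    Rule("amount_far_above_monthly_average", 50,
         lambda tx, agg: tx["amount"] > 3 * agg["avg_monthly_volume"]),
    Rule("foreign_country", 30,
         lambda tx, agg: tx["country"] != agg["home_country"]),
    Rule("night_time", 20,
         lambda tx, agg: tx["hour"] < 5),
]

def fraud_score(tx, agg, rules=RULES):
    """Return (score from 0 to 100, names of the activated rules)."""
    fired = [rule for rule in rules if rule.applies(tx, agg)]
    score = min(100, sum(rule.points for rule in fired))
    return score, [rule.name for rule in fired]

# Aggregated data as it might be held per credit card in the in-memory database.
agg = {"avg_monthly_volume": 900.0, "home_country": "DE"}
tx = {"amount": 4000.0, "country": "BR", "hour": 3}

score, fired = fraud_score(tx, agg)
print(score, fired)
```

Returning the list of activated rules alongside the score is what makes the result easy for a human reviewer to follow.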

1. Rule based systems

Pros & cons
Pro
  • long established and proven model
  • extremely fast
  • result comprehensible for humans by listing the activated rules
Contra
  • requires domain knowledge experts
  • unable to grasp certain scenarios
  • needs regular maintenance
  • "Human intelligence based"

2. Graph based systems

Overview
  • transactions are parsed into nodes and edges
  • too many connections to one node may indicate fraud
  • certain graph parameters can indicate fraud
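
A minimal sketch of the idea, assuming hypothetical transaction fields (`card`, `device`, `address`) and an arbitrary degree threshold. Each transaction becomes one transaction node plus one node per attribute value; an attribute node shared by unusually many transactions is flagged.

```python
from collections import defaultdict

def build_graph(transactions):
    """Parse each transaction into typed nodes and undirected edges."""
    neighbors = defaultdict(set)
    for tx in transactions:
        tx_node = ("tx", tx["id"])
        for kind in ("card", "device", "address"):   # assumed attribute types
            attr_node = (kind, tx[kind])
            neighbors[tx_node].add(attr_node)
            neighbors[attr_node].add(tx_node)
    return neighbors

def suspicious_nodes(neighbors, max_degree):
    """Attribute nodes connected to more than max_degree transactions."""
    return sorted(node for node, adjacent in neighbors.items()
                  if node[0] != "tx" and len(adjacent) > max_degree)

transactions = [
    {"id": 1, "card": "c1", "device": "d1", "address": "a1"},
    {"id": 2, "card": "c2", "device": "d1", "address": "a2"},
    {"id": 3, "card": "c3", "device": "d1", "address": "a3"},
    {"id": 4, "card": "c4", "device": "d2", "address": "a4"},
]

graph = build_graph(transactions)
hubs = suspicious_nodes(graph, max_degree=2)
print(hubs)
```

In this toy data three different cards are used from the same device, so the device node is the one that stands out.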

2. Graph based systems

Background
  • Components:
    • graph based or relational database
    • data mining algorithms
    • visualization
  • mathematical foundation: graph theory

2. Graph based systems

Pros & cons
Pro
  • finds unusual or hidden scenarios (spiderweb, circular cash flow)
  • well comprehensible due to visualization
Contra
  • requires specific database designs
  • most suited for data sets with lots of details
  • data volume:
    1 Tx → n nodes, n-1 edges (n=5-20)

3. Expert systems

Overview
  • Apply use cases and domain knowledge ("knowledge engineering")
  • Workflow:
    1. Plan: plan possible solution candidates
    2. Generate: generate solution candidates
    3. Test: see if candidates solve problem

3. Expert systems

Background
  • first developed in 1965 to elucidate mass spectra
  • most famous system: DENDRAL (DENDRitic ALgorithm)
    • Heuristic DENDRAL
    • MetaDENDRAL

3. Expert systems

Pros & cons
Pro
  • can learn new scenarios
  • well researched technology
Contra
  • slow to really, really slow
  • learning happens by feeding meta data back into the heuristics (→ manual intervention)

4. Deep learning systems

Overview
  • transactions run through network of layers of nodes
  • specialized input and output node layers
  • in between "hidden" layers of processing nodes

4. Deep learning systems

Perceptron
  • point (x,y)
    • below line: red
    • above line: blue
  • Use training data to optimize weights w to minimize output error
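
A minimal sketch of perceptron training on such points. The separating line y = x, the gap around it, the learning rate and the number of epochs are assumptions chosen for this toy example.

```python
import random

def predict(w, point):
    """Return 1 (blue, above the line) or 0 (red, below the line)."""
    x, y = point
    return 1 if w[0] + w[1] * x + w[2] * y > 0 else 0

def train(samples, epochs=1000, lr=0.1):
    """Perceptron learning rule: shift the weights whenever a point is misclassified."""
    w = [0.0, 0.0, 0.0]   # bias, weight for x, weight for y
    for _ in range(epochs):
        errors = 0
        for point, target in samples:
            error = target - predict(w, point)
            if error:
                errors += 1
                w[0] += lr * error
                w[1] += lr * error * point[0]
                w[2] += lr * error * point[1]
        if errors == 0:   # every training point is classified correctly
            break
    return w

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
# Keep a small gap around the assumed line y = x so the data is cleanly separable.
points = [p for p in points if abs(p[1] - p[0]) > 0.1]
samples = [(p, 1 if p[1] > p[0] else 0) for p in points]

w = train(samples)
accuracy = sum(predict(w, p) == t for p, t in samples) / len(samples)
print(w, accuracy)
```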

4. Deep learning systems

Many perceptrons
  • every weight w′ plays the same role as a weight in the single perceptron
  • in this example:
    • 3 input values/nodes
    • 2 output values/nodes
    • 1 hidden layer
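
A minimal sketch of a forward pass through such a network. The size of the hidden layer (4 nodes), the sigmoid activation and all weight values are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` belongs to one node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, params):
    hidden = layer(x, params["W1"], params["b1"])
    return layer(hidden, params["W2"], params["b2"])

# Arbitrary illustrative weights: 3 inputs -> 4 hidden nodes -> 2 outputs.
params = {
    "W1": [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5], [-0.6, 0.1, 0.4], [0.05, 0.9, -0.3]],
    "b1": [0.0, 0.1, -0.1, 0.2],
    "W2": [[0.3, -0.2, 0.5, 0.1], [-0.4, 0.6, 0.2, -0.7]],
    "b2": [0.05, -0.05],
}

outputs = forward([1.0, 0.5, -1.0], params)
print(outputs)
```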

4. Deep learning systems

Activation functions f(x)
A linear combination of linear functions f(x) still results in a linear function

Way out: use nonlinear functions!
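
A small numeric check of this statement, using arbitrary example matrices: two stacked linear layers give the same result as one layer with the product matrix, while inserting a nonlinear function (here a rectifier) breaks that equivalence.

```python
def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def matmul(a, b):
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]

def relu(vector):
    return [max(0.0, v) for v in vector]

W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]   # 2 inputs -> 3 hidden nodes
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]     # 3 hidden nodes -> 2 outputs
x = [0.5, -2.0]

two_layers = matvec(W2, matvec(W1, x))          # linear layer after linear layer
one_layer = matvec(matmul(W2, W1), x)           # single layer with W2 * W1
with_relu = matvec(W2, relu(matvec(W1, x)))     # nonlinearity in between

print(two_layers, one_layer, with_relu)
```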

4. Deep learning systems

Activation functions f(x)

sigmoid/logistic function

hyperbolic tangent

Heaviside function

rectifier/softplus function
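
For reference, the functions named on this slide written out as code. The value of the Heaviside function at exactly 0 is a convention; 1 is assumed here.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)

def heaviside(x):
    """Step function; the value at x == 0 is set to 1 by assumption."""
    return 1.0 if x >= 0 else 0.0

def rectifier(x):
    """Rectifier (ReLU): max(0, x)."""
    return max(0.0, x)

def softplus(x):
    """Smooth approximation of the rectifier: ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

for f in (sigmoid, tanh, heaviside, rectifier, softplus):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```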

4. Deep learning systems

Training is everything!
  • training by minimizing the error (least squares):

    $E = \frac{1}{2} \sum_i (y_i - t_i)^2$

    y: actual output
    t: expected value (target)
  • adjustment of the weights (stochastic gradient descent)
  • solution: back propagation
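
A minimal sketch of back propagation with stochastic gradient descent for a network with one hidden layer, minimizing a squared error of the form E = 1/2 Σ (y - t)². The network size (2 inputs, 2 hidden nodes, 1 output), the toy task (logical AND), the learning rate and the epoch count are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(net, x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    y = sigmoid(sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"])
    return hidden, y

def loss(net, data):
    """Squared error E = 1/2 * sum (y - t)^2 over all samples."""
    return 0.5 * sum((forward(net, x)[1] - t) ** 2 for x, t in data)

def sgd_step(net, x, t, lr):
    """Propagate the output error back through the layers and update the weights."""
    hidden, y = forward(net, x)
    delta_out = (y - t) * y * (1.0 - y)                 # error signal at the output
    delta_hid = [delta_out * w * h * (1.0 - h)          # error signal per hidden node
                 for w, h in zip(net["W2"], hidden)]
    for j, h in enumerate(hidden):
        net["W2"][j] -= lr * delta_out * h
        net["b1"][j] -= lr * delta_hid[j]
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * delta_hid[j] * xi
    net["b2"] -= lr * delta_out

rng = random.Random(42)
net = {
    "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
    "b1": [0.0, 0.0],
    "W2": [rng.uniform(-1, 1) for _ in range(2)],
    "b2": 0.0,
}
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]   # logical AND

initial_loss = loss(net, data)
for _ in range(2000):
    for x, t in rng.sample(data, len(data)):   # visit samples in random order
        sgd_step(net, x, t, lr=0.5)
final_loss = loss(net, data)
print(initial_loss, final_loss)
```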

4. Deep learning systems

Pros & cons
Pro
  • finds scenarios you had not even thought about
  • can also detect complex scenarios
Contra
  • no comprehensible feedback on why a transaction was categorized
  • can be very slow
  • problems: vanishing gradients, overfitting

Operational scenarios

Example 1: Social security fraud in Belgium
How does it work?
  • a key company creates satellite companies that make money
  • when social security payments are due, the satellite becomes insolvent
  • resources (workers, offices, vehicles) are transferred to the next satellite
Example 1: Social security fraud in Belgium
The problem
  • around 250,000 active companies in Belgium in 2012
  • in the long run, 25% of them will become insolvent
  • only very few of those are fraudulent
  • aim: detect critical cases before they go bankrupt
Example 1: Social security fraud in Belgium
The solution
  • graph theory: ego networks → elimination of inconspicuous companies
  • training data: enrichment of fraudulent cases by SMOTE (synthetic minority oversampling technique)
  • 2 data scenarios:
    • basic (only local information of the node)
    • relational (plus infos of resources from the ego net)
  • remainder goes into classifiers:
    • random forest
    • naive Bayes
    • logistic regression
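
A simplified sketch of the SMOTE idea mentioned above: new minority samples are created by interpolating between a fraudulent sample and one of its nearest fraudulent neighbors. This version picks base samples at random instead of generating a fixed number per sample, and the feature values are made up; it is not the setup used in the Belgian study.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples between a sample and a near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()   # position on the segment between base and neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical feature vectors of known fraudulent companies.
fraud = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
synthetic = smote(fraud, n_synthetic=5)
print(synthetic)
```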
Example 1: Social security fraud in Belgium
Results
  • random forest delivers best results
  • ROC AUC (area under the receiver operating characteristic curve), i.e. selectivity between fraud and non-fraud: 85-88%
  • important: temporal analysis after 6, 12 or 24 months: ROC AUC goes down, true positives go up
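
As a reminder of what the ROC AUC measures: it equals the probability that a randomly chosen fraudulent case receives a higher score than a randomly chosen non-fraudulent one. A minimal sketch with made-up scores and labels:

```python
def roc_auc(scores, labels):
    """ROC AUC as the share of (fraud, non-fraud) pairs ranked correctly; ties count half."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]   # illustrative fraud scores
labels = [1, 1, 0, 0, 1, 0]                # 1 = fraud, 0 = non-fraud

auc = roc_auc(scores, labels)
print(auc)
```

A value of 0.5 corresponds to random guessing, 1.0 to perfect separation.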
Example 2: Fraud in mobile networks
How does it work?
Typical example:
  • fraudster signs subscription with mobile carrier
  • fraudster sells usage of his subscription to others for long distance calls
  • fraudster vanishes when payment is due
Example 2: Fraud in mobile networks
The solution
  • define scenarios
  • extract indicators for fraud from scenarios
  • accumulated data from CDR (call detail record):
    • IMSI (international mobile subscriber ID)
    • start of call and duration
    • number called
    • type of call (national/international)
Example 2: Fraud in mobile networks
Details
  • what is "atypical usage" for one account is quite normal for another
  • solution: differential analysis per account via user profile history (UPH) and current user profile (CUP): $U_{now} = (1 - \alpha) \, UPH_{old} + \alpha \, CUP$
  • goes into:
    • rule based white box system
    • supervised neural network (multilayer perceptron with 1 hidden layer, logistic-sigmoidal activation function)
    • 2 unsupervised neural networks (A-numbers: user profile; B-number: monitoring target country of call)
  • combination of all 4 alarm functions/fraud scores
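
A minimal sketch of the profile update U_now = (1 - α) UPH_old + α CUP, applied per feature. The features (calls per day, international minutes per day), the value of α and the distance used as a deviation measure are assumptions for illustration.

```python
import math

def update_profile(uph_old, cup, alpha):
    """Blend the profile history with the current profile, feature by feature."""
    return [(1 - alpha) * old + alpha * current for old, current in zip(uph_old, cup)]

def deviation(uph, cup):
    """Hypothetical measure of how atypical the current usage is for this account."""
    return math.dist(uph, cup)

uph_old = [10.0, 2.0]    # long-term habit: calls per day, international minutes per day
cup = [30.0, 50.0]       # current behaviour of the same account
alpha = 0.25             # assumed weight of the current profile

print(deviation(uph_old, cup))
u_now = update_profile(uph_old, cup, alpha)
print(u_now)
```

A small α makes the history slow to adapt, so a sudden change in behaviour stays visible as a large deviation for longer.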
Example 2: Fraud in mobile networks
Results
  • AUC ROC selectivity for test data: 87.2%
  • AUC ROC selectivity for real data: 85.6%

Optimal approach?

The future is bright and complex

  • parallel circuit: combination of several detection methods can lead to better results
  • series connection: elimination of unsuspicious cases via method 1, scoring with method 2
  • derivation: generate rules with method 1, apply and score them with method 2
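
A minimal sketch of the first two combinations. The weighted average, the prefilter threshold and the scoring function are placeholders; in practice each would be one of the detection methods discussed earlier.

```python
def parallel_score(scores, weights=None):
    """Parallel circuit: weighted average of the scores of several methods."""
    weights = weights or [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def series_score(cases, prefilter, scorer):
    """Series connection: a cheap prefilter removes unsuspicious cases,
    the more expensive scorer only sees the remainder."""
    return {case_id: scorer(case) for case_id, case in cases.items() if prefilter(case)}

# Hypothetical cases and placeholder methods.
cases = {"tx1": {"amount": 40}, "tx2": {"amount": 5000}, "tx3": {"amount": 12000}}
prefilter = lambda case: case["amount"] > 1000            # stands in for method 1
scorer = lambda case: min(1.0, case["amount"] / 10000)    # stands in for method 2

combined = parallel_score([0.2, 0.8, 0.6, 0.4])   # e.g. four individual fraud scores
remaining = series_score(cases, prefilter, scorer)
print(combined, remaining)
```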

Dr. Sasha Göbbels

TechnologyScout
Innovation management
Fraud detection
Team management
eCommerce consulting

References

  • W. McCulloch, W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, Vol. 5 (1943), pp. 115-133
  • A. Rosenblueth, N. Wiener, J. Bigelow, "Behavior, Purpose and Teleology", Philosophy of Science, Vol. 10, No. 1 (1943), pp. 18-24
  • V. Van Vlasselaer, B. Baesens, et al., "Using Social Network Knowledge for Detecting Spider Constructions in Social Security Fraud", ASONAM'13 (2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining), pp. 813-820
  • N. V. Chawla, K. W. Bowyer, L. O. Hall, W. Ph. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique", Journal of Artificial Intelligence Research, Vol. 16 (2002), pp. 321-357
  • H. Verrelst, E. Lerouge, Y. Moreau, J. Vandewalle, Chr. Störmann, P. Burge, "A rule based and neural network system for fraud detection in mobile communications", European project "Advanced Security for Personal Communication Technologies" (ASPeCT)
  • T. Fawcett, F. Provost, "Adaptive Fraud Detection", Data Mining and Knowledge Discovery, Vol. 1 (1997), pp. 291-316