Hackathon data challenge

B-Hive Fintech Hackathon

Anomaly detection

We are happy to assist B-Hive and its partners during the Hackathon by supplying interesting datasets for fraud detection. Below you will find three different datasets. The first two sets allow for supervised training by using the labelled training set. The third set is fully anonymised and blind, meaning that you will have to define and identify the anomalous entries independently.

Please send your responses to hackathon@yields.io. One vector per dataset is sufficient, identifying the rows that contain suspicious entries (either a boolean flag or, if possible, a probability indicating how likely the sample is to be fraudulent). If possible, also mention why the entries are suspicious.
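As a rough illustration of the response format described above, the sketch below builds one record per dataset row, holding a probability and the boolean flag derived from it, and writes the records to a CSV file. The function names, the column names and the 0.5 threshold are assumptions made for this example; they are not a format prescribed for the challenge.

```python
import csv


def build_submission(probabilities, threshold=0.5):
    """Return one (row_index, probability, flag) record per dataset row.

    The threshold of 0.5 is an arbitrary illustrative choice.
    """
    records = []
    for row_index, p in enumerate(probabilities):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"row {row_index}: probability {p} outside [0, 1]")
        records.append((row_index, p, p >= threshold))
    return records


def write_submission(records, path):
    """Write the records to a CSV file (hypothetical column names)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["row", "fraud_probability", "is_suspicious"])
        writer.writerows(records)


# Three made-up probabilities, one per row of an imaginary dataset.
example = build_submission([0.02, 0.91, 0.40])
```

A free-text column explaining why a row looks suspicious could be added to each record in the same way.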

Insurance fraud

Data can be found here.

Transactions fraud

Data can be found here.

The meaning of the headers is:

  • time: simulation step, mapped to a unit of time in the real world. In this case 1 step is 1 hour of time. Total steps 744 (31 days).
  • amount: amount of the transaction in local currency.
  • nameOrig: customer who started the transaction
  • oldbalanceOrg: initial balance before the transaction
  • newbalanceOrig: new balance after the transaction
  • nameDest: customer who is the recipient of the transaction
  • oldbalanceDest: initial balance of the recipient before the transaction. Note that there is no information for customers whose name starts with M (Merchants).
  • newbalanceDest: new balance of the recipient after the transaction. Note that there is no information for customers whose name starts with M (Merchants).
  • isFraud: marks the transactions made by the fraudulent agents inside the simulation. In this specific dataset, the fraudulent agents aim to profit by taking control of customers' accounts and emptying them, transferring the funds to another account and then cashing out of the system.
  • isFlaggedFraud: the business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200.000 in a single transaction.
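The column descriptions above suggest a few simple per-row checks that can serve as hand-made features. The sketch below is one possible starting point, not part of the dataset tooling. It assumes that each row is available as a dictionary keyed by the header names, that "200.000" means two hundred thousand, and that a transaction debits its amount from the originating account; the last assumption will not hold for every transaction type.

```python
FLAG_THRESHOLD = 200_000  # "200.000" in the description, read as two hundred thousand


def exceeds_flag_threshold(row):
    """True when the amount is above the limit described for isFlaggedFraud."""
    return float(row["amount"]) > FLAG_THRESHOLD


def origin_balance_gap(row):
    """Reported new origin balance minus the balance expected after a debit.

    Assumes the amount is subtracted from the originating account.
    A non-zero gap may be worth a closer look.
    """
    expected = float(row["oldbalanceOrg"]) - float(row["amount"])
    return float(row["newbalanceOrig"]) - expected


def is_merchant(name):
    """Merchants have names starting with M and carry no balance information."""
    return name.startswith("M")


# A made-up row, for illustration only.
sample = {
    "amount": "250000",
    "oldbalanceOrg": "300000",
    "newbalanceOrig": "50000",
    "nameDest": "C123",
}
flagged = exceeds_flag_threshold(sample)
gap = origin_balance_gap(sample)
```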

Blind anomaly detection

Data can be found here.

The headers are:

  • Time: Number of seconds elapsed between this transaction and the first transaction in the dataset
  • V1 - V28: anonymous attributes
  • Amount: Transaction amount
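Since this dataset has no labels, a simple unsupervised baseline can help to define what "anomalous" means before trying anything more elaborate. The sketch below scores a single numeric column (for example Amount) with a robust z-score based on the median and the median absolute deviation. It is only a baseline chosen for illustration: the cut-off of 3.5 is a common rule of thumb rather than a value given for the challenge, and the sample amounts are made up.

```python
from statistics import median


def robust_scores(values):
    """Robust z-score: distance from the median in units of scaled MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All deviations collapse to zero; no meaningful score is possible.
        return [0.0 for _ in values]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return [abs(v - center) / (1.4826 * mad) for v in values]


def flag_outliers(values, cutoff=3.5):
    """Boolean flag per value; the cutoff is an assumed rule of thumb."""
    return [score > cutoff for score in robust_scores(values)]


# Made-up amounts: four ordinary values and one extreme value.
amounts = [10.0, 12.0, 11.0, 9.0, 500.0]
flags = flag_outliers(amounts)
```

The same scoring can be applied to each of the V1 - V28 columns in turn, and the per-column scores combined (for instance by taking the maximum) to rank rows.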

Other datasets

Below are additional datasets that you can use to showcase more functionality of your solution:

  • Additional insurance claims with detected fraud
  • Motor vehicle crashes
  • UK Accidents
  • European accident forms
  • e-mails
  • NHTSA crash viewer

