Anomaly Detection, also known as outlier detection, is about identifying the not-normal items or events in a dataset.

anom1

Here, the red fish is not normal amidst the blue fish.

But, why does detecting anomalies matter?

Suppose we are studying about the fish in above image. They belong to the same species, but can be either blue or red in color. The color of the fish does not affect the study. Detecting the red fish (anomalies) in our fish data is not important for our study.

However, if we were studying fruit, we would need to remove the spoiled fruit data (anomalies) before using the data for our study.

anom2

Photo from 101 Clip Art

The history of anomaly detection goes back to 1970s, when data mining scientists were interested in anomalies because they wanted to remove them from the dataset. Anomalies, or outliers, introduced noise into the dataset, making the training of models a difficult task.

The spoiled fruit in above example introduce noise into the dataset. Once removed, they are not considered in the study.

Around the year 2000, researchers started to get interested in anomalies themselves. They recognized that the presence of anomalies in a dataset is often related to interesting or suspicious events. Since then, several data mining techniques were developed focusing on detecting anomalies in a dataset. There are various such applications where anomaly detection is used to discover hidden occurrences.

anom3

Intrusion Detection is one of the most well-known applications of anomaly detection. If someone is attempting to attack or gain unauthorized access to a network, it can be identified by detecting not-normal accesses to a network.

anom4

Fraud Detection, specially credit card frauds or fraudulent financial activities can be identified by detecting transactions that deviate from the usual pattern.

anom5

Patient Monitoring systems utilize anomaly detection techniques to identify existence of a disease or critical illnesses within patients, using their records.

anom6

Fault Detection in software systems makes use of anomaly detection to recognize instances that differ from the normal behavior of the system. As such instances are often resulted by a faulty condition in the system, they are used to identify faults.

References

Goldsteing, Markus and Seiichi Uchida. "A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data." PloS one 11.4 (2016)