Anomaly Detection in Large Datasets
Anomaly detection is the task of finding instances in a dataset which are different from the norm. Today, anomaly detection is a core part of many data mining applications, for example in network intrusion detection, fraud detection, data leakage prevention, the identification of failures in complex systems, and diagnosis in the medical domain. In all of these applications, the amount of stored data has increased dramatically in the last decade, resulting in a strong demand for algorithms suitable for these crucial large-scale challenges. The main goal of this thesis is to bridge this gap and to provide efficient anomaly detection algorithms which are significantly faster than existing methods. A broad spectrum of algorithms is proposed, covering the trade-off between efficiency and accuracy for both unsupervised and semi-supervised anomaly detection. All presented methods reduce the overall computational complexity such that it is now possible to process large-scale datasets within an appropriate runtime. Far beyond what is achievable with state-of-the-art methods today, the technology presented opens new fields of application and research.