JoWUA

Volume 15 - Issue 4

Efficient Outlier Detection in High-Dimensional Data Using Unsupervised Machine Learning

Girish Reddy Ginni, Department of CSE, GITAM University, Gandhi Nagar, Rushikonda, Visakhapatnam, Andhra Pradesh, India.
girishloshankar@gmail.com, ORCID: 0009-0005-5242-8839
Dr. Srinivasa L. Chakravarthy, Department of CSE, GITAM University, Gandhi Nagar, Rushikonda, Visakhapatnam, Andhra Pradesh, India.
chakri.ls@gmail.com, ORCID: 0000-0001-9141-4863

DOI: 10.58346/JOWUA.2024.I4.013

Keywords: Outlier Detection, Clustering, Unsupervised Learning, Machine Learning, High Dimensional Data.

Abstract

Outlier detection is a fundamental task in data mining and machine learning (ML). Outlier identification and clustering are closely linked, since identifying outliers can lead to better clustering. Most existing research treats outlier identification and clustering as separate problems, leaving their close relationship underexplored. Exploiting this relationship makes it possible to improve cluster quality while detecting outliers, which provides a dual benefit. We propose an unsupervised ML framework for efficiently detecting outliers in high-dimensional datasets. The framework defines an objective function that enhances cluster compactness, which in turn improves outlier detection. By improving the clustering process through problem transformation and an enhanced K-Means, the integrated approach achieves high-quality clustering and outlier identification simultaneously. We introduce an algorithm called Learning-based Outlier Detection (LbOD), whose novelty lies in addressing space partitioning, the objective function, and cluster optimization simultaneously. We built a prototype to evaluate how well the proposed framework and algorithm discover outliers on multiple benchmark high-dimensional datasets. Our empirical study shows that LbOD outperforms many existing outlier detection methods.
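
The idea of clustering and flagging outliers in the same loop can be illustrated with a minimal sketch. This is not the LbOD algorithm, whose objective function and problem transformation are not given in the abstract. It is a generic trimmed K-Means variant, and the function name `trimmed_kmeans`, the parameter `n_outliers`, and the first-k-points initialization are all assumptions made for illustration. In each round, the points farthest from their nearest centroid are set aside as outliers, and centroids are recomputed from the remaining points only, which keeps clusters compact.

```python
import math


def trimmed_kmeans(points, k, n_outliers, n_iter=20):
    """Hypothetical sketch: K-Means that excludes the n_outliers farthest points
    from each centroid update. Not the paper's LbOD algorithm."""
    # Assumption for illustration: use the first k points as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    outliers = set()
    for _ in range(n_iter):
        # Assignment step: nearest centroid and its distance for every point.
        dists = []
        for i, p in enumerate(points):
            d, j = min((math.dist(p, c), j) for j, c in enumerate(centroids))
            labels[i] = j
            dists.append(d)
        # Outlier step: flag the n_outliers points farthest from their centroid.
        ranked = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
        outliers = set(ranked[:n_outliers])
        # Update step: recompute each centroid from its non-outlier members only.
        for j in range(k):
            members = [points[i] for i in range(len(points))
                       if labels[i] == j and i not in outliers]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, sorted(outliers), centroids
```

Calling `trimmed_kmeans(points, k=2, n_outliers=1)` on a small list of coordinate tuples returns the cluster label of each point, the indices of the flagged outliers, and the final centroids. Fixing the number of outliers in advance is a simplification chosen here for brevity.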

Date

2024

Page Number

192-212