Efficient Outlier Detection in High-Dimensional Data Using Unsupervised Machine Learning
Girish Reddy Ginni, Department of CSE, GITAM University, Gandhi Nagar, Rushikonda, Visakhapatnam, Andhra Pradesh, India. girishloshankar@gmail.com, ORCID: 0009-0005-5242-8839
Dr. Srinivasa L. Chakravarthy, Department of CSE, GITAM University, Gandhi Nagar, Rushikonda, Visakhapatnam, Andhra Pradesh, India. chakri.ls@gmail.com, ORCID: 0000-0001-9141-4863
Outlier detection is a fundamental task in data mining and machine learning (ML). Outlier detection and clustering are closely related: removing outliers tends to yield tighter clusters, and well-formed clusters make outliers easier to identify. Most existing work nevertheless treats the two as separate problems, leaving this relationship largely unexplored. Exploiting it offers a dual benefit, since cluster quality can be improved while outliers are being detected. We propose an unsupervised ML framework for efficiently detecting outliers in high-dimensional datasets. The framework defines an objective function that enhances cluster compactness, which in turn improves outlier detection. By improving the clustering process through problem transformation and an enhanced K-Means, it provides an integrated approach that achieves high-quality clustering and outlier identification simultaneously. We introduce an algorithm called Learning-based Outlier Detection (LbOD), whose novelty lies in handling space partitioning, the objective function, and cluster optimization simultaneously. We built a prototype to evaluate how well the proposed framework and algorithm discover outliers on multiple high-dimensional benchmark datasets. Our empirical study shows that LbOD outperforms many existing outlier detection methods.
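The general idea of clustering and flagging outliers in a single loop can be illustrated with a minimal sketch. This is not the LbOD algorithm, whose objective function and problem transformation are not specified in the abstract. It is a simple heuristic in the spirit of k-means--, in which each iteration sets aside the points farthest from their nearest center before recomputing the centers. The function names, the `n_outliers` budget, and the explicit `init` centers are all assumptions made for illustration.

```python
import numpy as np


def _assign(X, centers, n_outliers):
    """Assign points to nearest centers and flag the farthest ones (illustrative helper)."""
    # Squared Euclidean distance from every point to every center, shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    nearest = d2.min(axis=1)
    # Flag the n_outliers points with the largest distance to their nearest center.
    is_outlier = np.zeros(len(X), dtype=bool)
    is_outlier[np.argsort(nearest)[len(X) - n_outliers:]] = True
    return labels, nearest, is_outlier


def kmeans_with_outliers(X, init, n_outliers, n_iter=100):
    """Hypothetical sketch: K-Means that excludes flagged outliers from center updates.

    X          : (n, d) data matrix
    init       : (k, d) initial centers (assumed to be supplied by the caller)
    n_outliers : assumed budget of points to treat as outliers
    """
    centers = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        labels, _, is_outlier = _assign(X, centers, n_outliers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[(labels == j) & ~is_outlier]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels, nearest, is_outlier = _assign(X, centers, n_outliers)
    # Compactness: sum of squared distances of inliers to their assigned centers.
    compactness = float(nearest[~is_outlier].sum())
    return labels, is_outlier, centers, compactness
```

Because flagged points are left out of the mean update, they cannot drag centers away from the dense regions, so the compactness term is computed over inliers only. The result depends on the initial centers and on the chosen outlier budget, both of which a full method would need to handle more carefully.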