Research on Arbitrarily Shaped Clustering


Arbitrarily shaped clustering is an important research topic in data mining. Most existing approaches cannot simultaneously guarantee good clustering quality and scalability on large, high-dimensional datasets. To meet the clustering requirements of such datasets, some researchers have proposed extracting the backbone of each cluster and then clustering only the backbone. This reduces running time to some extent; however, the saving depends on the ratio of the representatives forming the backbone to the whole dataset, and this ratio varies from dataset to dataset. In this paper, we propose an efficient clustering method based on representative sampling and boundary similarity. It consists of three steps: first, representatives are sampled so that they are distributed evenly and continuously over the data; second, the position of each representative is adjusted iteratively so that it moves closer to its k nearest neighbors; finally, agglomerative clustering is performed on the representatives based on boundary similarity. We conduct extensive experiments on synthetic and real-world datasets and compare our method with other methods. The experimental results demonstrate the effectiveness and efficiency of our method.
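To make the three-step pipeline concrete, here is a minimal sketch. It is not the method proposed in this paper, and every concrete choice in it is an assumption. Greedy radius-based selection stands in for representative sampling. Each representative moves a fraction `rate` of the way toward the mean of its `k` nearest data points for `steps` rounds. Single-linkage merging under a distance threshold `link` replaces boundary similarity, which the abstract does not define. The names `radius`, `k`, `link`, `steps`, `rate` and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def sample_representatives(data: List[Point], radius: float) -> List[Point]:
    # Assumed sampling rule: keep a point only if no representative chosen
    # so far lies within `radius`, so every point ends up near some representative.
    reps: List[Point] = []
    for p in data:
        if all(math.dist(p, r) > radius for r in reps):
            reps.append(p)
    return reps


def adjust_representatives(reps: List[Point], data: List[Point], k: int,
                           steps: int = 5, rate: float = 0.5) -> List[Point]:
    # Assumed update rule: move each representative part of the way toward
    # the mean of its k nearest data points, repeated for `steps` rounds.
    for _ in range(steps):
        new_reps = []
        for r in reps:
            nn = sorted(data, key=lambda p: math.dist(p, r))[:k]
            mean = tuple(sum(c) / len(nn) for c in zip(*nn))
            new_reps.append(tuple(ri + rate * (mi - ri) for ri, mi in zip(r, mean)))
        reps = new_reps
    return reps


def merge_representatives(reps: List[Point], link: float) -> List[int]:
    # Stand-in for boundary similarity: single-linkage agglomeration cut at
    # distance `link`, computed as connected components with union-find.
    parent = list(range(len(reps)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if math.dist(reps[i], reps[j]) <= link:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(reps))]


def cluster(data: List[Point], radius: float, k: int, link: float) -> List[int]:
    reps = adjust_representatives(sample_representatives(data, radius), data, k)
    rep_labels = merge_representatives(reps, link)
    # Each data point inherits the cluster label of its nearest representative.
    labels = []
    for p in data:
        nearest = min(range(len(reps)), key=lambda i: math.dist(p, reps[i]))
        labels.append(rep_labels[nearest])
    return labels


# Usage: two well-separated 4x4 grids of points should form two clusters.
blob_a = [(x * 0.5, y * 0.5) for x in range(4) for y in range(4)]
blob_b = [(10 + x * 0.5, 10 + y * 0.5) for x in range(4) for y in range(4)]
labels = cluster(blob_a + blob_b, radius=0.6, k=4, link=3.0)
```

Only the representatives, not all data points, take part in the quadratic merging step. This is where the speed-up in representative-based schemes comes from, under the assumption that the number of representatives is much smaller than the dataset.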