Institute of Computing Technology, Chinese Academy IR
Grid-based DBSCAN: Indexing and inference | |
Boonchoo, Thapana1,2; Ao, Xiang1,2; Liu, Yang1,2; Zhao, Weizhong3,4; Zhuang, Fuzhen1,2; He, Qing1,2 | |
2019-06-01 | |
发表期刊 | PATTERN RECOGNITION |
ISSN | 0031-3203 |
卷号 | 90页码:271-284 |
摘要 | DBSCAN is one of clustering algorithms which can report arbitrarily-shaped clusters and noises without requiring the number of clusters as a parameter (unlike the other clustering algorithms, k-means, for example). Because the running time of DBSCAN has quadratic order of growth, i.e. O(n(2)), research studies on improving its performance have been received a considerable amount of attention for decades. Grid-based DBSCAN is a well-developed algorithm whose complexity is improved to O(nlog n) in 2D space, while requiring Omega(n(4/3)) to solve when dimension >= 3. However, we find that Grid-based DBSCAN suffers from two problems: neighbour explosion and redundancies in merging, which make the algorithms infeasible in high dimensional space. In this paper we first propose a novel algorithm called GDCF which utilizes bitmap indexing to support efficient neighbour grid queries. Second, based on the concept of union-find algorithm we devise a forest-like structure, called cluster forest, to alleviate the redundancies in the merging. Moreover, we find that running the cluster forest in different orders can lead to a different number of merging operations needed to perform in the merging step. We propose to perform the merging step in a uniform random order to optimize the number of merging operations. However, for high-density database, a bottleneck could be occurred, we further propose a low-density-first order to alleviate this bottleneck. The experiments resulted on both real-world and synthetic datasets demonstrate that the proposed algorithm outperforms the state-of-the-art exact/approximate DBSCAN and suggests a good scalability. (C) 2019 Elsevier Ltd. All rights reserved. |
关键词 | Density-based clustering Grid-based DBSCAN Union-find algorithm |
DOI | 10.1016/j.patcog.2019.01.034 |
收录类别 | SCI |
语种 | 英语 |
资助项目 | National Key Research and Development Program of China[2017YFB1002104] ; National Natural Science Foundation of China[U1811461] ; National Natural Science Foundation of China[61602438] ; National Natural Science Foundation of China[91846113] ; National Natural Science Foundation of China[61573335] ; CCF-Tencent Rhino-Bird Young Faculty Open Research Fund[RAGR20180111] ; Ant Financial through the Ant Financial Science Funds for Security Research |
WOS研究方向 | Computer Science ; Engineering |
WOS类目 | Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic |
WOS记录号 | WOS:000463130400023 |
出版者 | ELSEVIER SCI LTD |
引用统计 | |
文献类型 | 期刊论文 |
条目标识符 | http://119.78.100.204/handle/2XEOYT63/4150 |
专题 | 中国科学院计算技术研究所期刊论文_英文 |
通讯作者 | Ao, Xiang |
作者单位 | 1.Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China 2.Univ Chinese Acad Sci, Beijing 100049, Peoples R China 3.Cent China Normal Univ, Sch Comp, Wuhan, Hubei, Peoples R China 4.Cent China Normal Univ, Hubei Key Lab Artificial Intelligence & Smart Lea, Wuhan, Hubei, Peoples R China |
推荐引用方式 GB/T 7714 | Boonchoo, Thapana,Ao, Xiang,Liu, Yang,et al. Grid-based DBSCAN: Indexing and inference[J]. PATTERN RECOGNITION,2019,90:271-284. |
APA | Boonchoo, Thapana,Ao, Xiang,Liu, Yang,Zhao, Weizhong,Zhuang, Fuzhen,&He, Qing.(2019).Grid-based DBSCAN: Indexing and inference.PATTERN RECOGNITION,90,271-284. |
MLA | Boonchoo, Thapana,et al."Grid-based DBSCAN: Indexing and inference".PATTERN RECOGNITION 90(2019):271-284. |
条目包含的文件 | 条目无相关文件。 |
除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。
修改评论