Institute of Computing Technology, Chinese Academy IR
SF-Sketch: A Two-Stage Sketch for Data Streams | |
Liu, Lingtong1,2; Shen, Yulong1,2; Yan, Yibo3; Yang, Tong3; Shahzad, Muhammad4; Cui, Bin3; Xie, Gaogang5 | |
2020-10-01 | |
发表期刊 | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS |
ISSN | 1045-9219 |
卷号 | 31期号:10页码:2263-2276 |
摘要 | Sketches are probabilistic data structures designed for recording frequencies of items in a multi-set. They are widely used in various fields, especially for gathering Internet statistics from distributed data streams in network measurements. In a distributed streaming application with high data rates, a sketch in each monitoring node "fills up" very quickly and then its content is transferred to a remote collector responsible for answering queries. Thus, the size of the contents transferred must be kept as small as possible while meeting the desired accuracy requirement. To obtain significantly higher accuracy while keeping the same update and query speed as the best prior sketches, in this article, we propose a new sketch - the Slim-Fat (SF) sketch. The key idea behind the SF-sketch is to maintain two separate sketches: a larger sketch, the Fat-subsketch, and a smaller sketch, the Slim-subsketch. The Fat-subsketch is used for updating and periodically producing the Slim-subsketch, which is then transferred to the remote collector for answering queries quickly and accurately. We also present the error bound as well as an accurate model of the correct rate of the SF-sketch, and verify their correctness through experiments. We implemented and extensively evaluated the SF-sketch along with several prior sketches. Our results show that when the size of our Slim-subsketch and of the widely used Count-Min (CM) sketch are kept the same, our SF-sketch outperforms the CM-sketch by up to 33.1 times in terms of accuracy (when the ratio of the sizes of the Fat-subsketch and the Slim-subsketch is 16:1). We have made all source codes publicly available at Github ["Source code of SF sketches," [Online]. Available: https://github.com/paper2017/SF-sketch]. |
关键词 | Distributed databases Monitoring Bars Frequency measurement Registers Fats Hash functions Network measurements sketch distributed monitoring multiset frequent items |
DOI | 10.1109/TPDS.2020.2987609 |
收录类别 | SCI |
语种 | 英语 |
资助项目 | National Key Research and Development Program of China[2018YFE0207600] ; National Key Research and Development Program of China[2018YFB2100403] ; NSFC[61672061] ; NSFC[U1736216] ; National Science Foundation of USA[CNS-1616317] ; National Science Foundation of USA[CNS-1616273] |
WOS研究方向 | Computer Science ; Engineering |
WOS类目 | Computer Science, Theory & Methods ; Engineering, Electrical & Electronic |
WOS记录号 | WOS:000536291500003 |
出版者 | IEEE COMPUTER SOC |
引用统计 | |
文献类型 | 期刊论文 |
条目标识符 | http://119.78.100.204/handle/2XEOYT63/15297 |
专题 | 中国科学院计算技术研究所期刊论文_英文 |
通讯作者 | Shen, Yulong; Yang, Tong |
作者单位 | 1.Xidian Univ, Shaanxi Key Lab Network & Syst Secur, Xian 710071, Peoples R China 2.Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China 3.Peking Univ, Dept Comp & Sci, Beijing 100871, Peoples R China 4.North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA 5.Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China |
推荐引用方式 GB/T 7714 | Liu, Lingtong,Shen, Yulong,Yan, Yibo,et al. SF-Sketch: A Two-Stage Sketch for Data Streams[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,2020,31(10):2263-2276. |
APA | Liu, Lingtong.,Shen, Yulong.,Yan, Yibo.,Yang, Tong.,Shahzad, Muhammad.,...&Xie, Gaogang.(2020).SF-Sketch: A Two-Stage Sketch for Data Streams.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,31(10),2263-2276. |
MLA | Liu, Lingtong,et al."SF-Sketch: A Two-Stage Sketch for Data Streams".IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 31.10(2020):2263-2276. |
条目包含的文件 | 条目无相关文件。 |
除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。
修改评论