Institute of Computing Technology, Chinese Academy IR
PR-Sketch: Monitoring Per-key Aggregation of Streaming Data with Nearly Full Accuracy | |
Sheng, Siyuan1; Huang, Qun2; Wang, Sa1; Bao, Yungang1 | |
2021-06-01 | |
发表期刊 | PROCEEDINGS OF THE VLDB ENDOWMENT |
ISSN | 2150-8097 |
卷号 | 14期号:10页码:1783-1796 |
摘要 | Computing per-key aggregation is indispensable in streaming data analysis formulated as two phases, an update phase and a recovery phase. As the size and speed of data streams rise, accurate per-key information is useful in many applications like anomaly detection, attack prevention, and online diagnosis. Even though many algorithms have been proposed for per-key aggregation in stream processing, their accuracy guarantees only cover a small portion of keys. In this paper, we aim to achieve nearly full accuracy with limited resource usage. We follow the line of sketch-based techniques. We observe that existing methods suffer from high errors for most keys. The reason is that they track keys by complicated mechanism in the update phase and simply calculate per-key aggregation from some specific counter in the recovery phase. Therefore, we present PR-Sketch, a novel sketching design to address the two limitations. PR-Sketch builds linear equations between counter values and per-key aggregations to improve accuracy, and records keys in the recovery phase to reduce resource usage in the update phase. We also provide an extension called fast PR-Sketch to improve processing rate further. We derive space complexity, time complexity, and guaranteed error probability for both PR-Sketch and fast PR-Sketch. We conduct trace-driven experiments under 100K keys and 1M items to compare our algorithms with multiple state-of-the-art methods. Results demonstrate the resource efficiency and nearly full accuracy of our algorithms. |
DOI | 10.14778/3467861.3467868 |
收录类别 | SCI |
语种 | 英语 |
资助项目 | National Key R&D Program of China[2019YFB1802600] ; National Natural Science Foundation of China[61802365] |
WOS研究方向 | Computer Science |
WOS类目 | Computer Science, Information Systems ; Computer Science, Theory & Methods |
WOS记录号 | WOS:000742888800008 |
出版者 | ASSOC COMPUTING MACHINERY |
引用统计 | |
文献类型 | 期刊论文 |
条目标识符 | http://119.78.100.204/handle/2XEOYT63/18983 |
专题 | 中国科学院计算技术研究所期刊论文_英文 |
通讯作者 | Sheng, Siyuan |
作者单位 | 1.Univ Chinese Acad Sci, SKL Comp Architecture, ICT, CAS, Beijing, Peoples R China 2.Peking Univ, Beijing, Peoples R China |
推荐引用方式 GB/T 7714 | Sheng, Siyuan,Huang, Qun,Wang, Sa,et al. PR-Sketch: Monitoring Per-key Aggregation of Streaming Data with Nearly Full Accuracy[J]. PROCEEDINGS OF THE VLDB ENDOWMENT,2021,14(10):1783-1796. |
APA | Sheng, Siyuan,Huang, Qun,Wang, Sa,&Bao, Yungang.(2021).PR-Sketch: Monitoring Per-key Aggregation of Streaming Data with Nearly Full Accuracy.PROCEEDINGS OF THE VLDB ENDOWMENT,14(10),1783-1796. |
MLA | Sheng, Siyuan,et al."PR-Sketch: Monitoring Per-key Aggregation of Streaming Data with Nearly Full Accuracy".PROCEEDINGS OF THE VLDB ENDOWMENT 14.10(2021):1783-1796. |
条目包含的文件 | 条目无相关文件。 |
除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。
修改评论