Institute of Computing Technology, Chinese Academy of Sciences IR
SAHA: A String Adaptive Hash Table for Analytical Databases
Zheng, Tianqi (1, 2); Zhang, Zhibin (1); Cheng, Xueqi (1, 2)
2020-03-01
Journal | APPLIED SCIENCES-BASEL |
Volume | 10; Issue: 6; Pages: 18 |
Abstract | Hash tables are a fundamental data structure for analytical database workloads such as aggregation, joins, set filtering and record deduplication. Their performance varies drastically depending on the kind of data being processed and on how many inserts, lookups and deletes are performed. In this paper, we address two common use cases of hash tables: aggregating and joining over arbitrary string data. We designed a new hash table, SAHA, which is tightly integrated with modern analytical databases and optimized for string data, with the following advantages: (1) it inlines short strings and saves hash values only for long strings; (2) it uses special memory loading techniques to perform fast dispatching and hash computation; and (3) it uses vectorized processing to batch hashing operations. Our evaluation shows that SAHA outperforms state-of-the-art hash tables, including Google's SwissTable and Facebook's F14Table, by one to five times in analytical workloads. SAHA has been merged into the ClickHouse database and shows promising results in production. |
Keywords | hash table; analytical database; string data |
DOI | 10.3390/app10061915 |
Indexed In | SCI |
Language | English |
Funding Project | Strategic Priority Research Program of the Chinese Academy of Sciences [XDA19020400] |
WOS Research Areas | Chemistry; Engineering; Materials Science; Physics |
WOS Categories | Chemistry, Multidisciplinary; Engineering, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied |
WOS Accession Number | WOS:000529252800016 |
Publisher | MDPI |
Document Type | Journal Article |
Item Identifier | http://119.78.100.204/handle/2XEOYT63/15030 |
Collection | Institute of Computing Technology, CAS: Journal Articles (English) |
Corresponding Author | Zheng, Tianqi |
Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, CAS Key Lab Network Data Sci & Technol, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China |
Recommended Citation (GB/T 7714) | Zheng, Tianqi, Zhang, Zhibin, Cheng, Xueqi. SAHA: A String Adaptive Hash Table for Analytical Databases[J]. APPLIED SCIENCES-BASEL, 2020, 10(6): 18. |
APA | Zheng, Tianqi, Zhang, Zhibin, & Cheng, Xueqi. (2020). SAHA: A String Adaptive Hash Table for Analytical Databases. APPLIED SCIENCES-BASEL, 10(6), 18. |
MLA | Zheng, Tianqi, et al. "SAHA: A String Adaptive Hash Table for Analytical Databases". APPLIED SCIENCES-BASEL 10.6 (2020): 18. |
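Files in This Item | No files are associated with this item. |
Unless otherwise noted, all content in this system is protected by copyright, and all rights are reserved.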
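The abstract's first technique, inlining short strings while saving hash values only for long strings, can be illustrated with a toy string counter of the kind used for aggregation. This is a minimal Python sketch under stated assumptions, not SAHA's implementation: the size classes (8, 16 and 24 bytes), the class names `LongKeyTable` and `StringAdaptiveCounter`, the rule that routes keys ending in a zero byte to the long-key table, and the use of Python's built-in `hash` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical size classes (bytes) for keys packed inline as integers.
INLINE_WIDTHS = (8, 16, 24)


class LongKeyTable:
    """Open-addressing table for long keys that stores each key's hash."""

    def __init__(self, capacity: int = 8) -> None:
        # Each slot is None or a (saved_hash, key, count) triple.
        self.slots: List[Optional[Tuple[int, bytes, int]]] = [None] * capacity
        self.size = 0

    def _probe(self, h: int, key: bytes) -> int:
        # Linear probing; the cheap saved-hash check runs before the byte compare.
        mask = len(self.slots) - 1
        i = h & mask
        while True:
            slot = self.slots[i]
            if slot is None or (slot[0] == h and slot[1] == key):
                return i
            i = (i + 1) & mask

    def _grow(self) -> None:
        old = self.slots
        self.slots = [None] * (2 * len(old))
        for slot in old:
            if slot is not None:
                # Reuse the saved hash, so key bytes are not rehashed on resize.
                self.slots[self._probe(slot[0], slot[1])] = slot

    def add(self, key: bytes, delta: int) -> None:
        if 2 * (self.size + 1) > len(self.slots):  # keep load factor <= 1/2
            self._grow()
        h = hash(key)
        i = self._probe(h, key)
        slot = self.slots[i]
        if slot is None:
            self.slots[i] = (h, key, delta)
            self.size += 1
        else:
            self.slots[i] = (h, key, slot[2] + delta)

    def get(self, key: bytes) -> Optional[int]:
        slot = self.slots[self._probe(hash(key), key)]
        return None if slot is None else slot[2]


class StringAdaptiveCounter:
    """Counts byte-string keys, dispatching each key to a sub-table by length."""

    def __init__(self) -> None:
        # Short keys are stored inline as fixed-width integers (no saved hash).
        self.inline: Dict[int, Dict[int, int]] = {w: {} for w in INLINE_WIDTHS}
        self.long = LongKeyTable()

    @staticmethod
    def _width(key: bytes) -> Optional[int]:
        # A trailing zero byte would be ambiguous after zero padding,
        # so such keys go to the long-key table instead.
        if key and key[-1] == 0:
            return None
        return next((w for w in INLINE_WIDTHS if len(key) <= w), None)

    @staticmethod
    def _pack(key: bytes, width: int) -> int:
        # Zero-pad to the size class and read the bytes as one integer.
        return int.from_bytes(key.ljust(width, b"\0"), "little")

    def add(self, key: bytes, delta: int = 1) -> None:
        w = self._width(key)
        if w is None:
            self.long.add(key, delta)
        else:
            table = self.inline[w]
            packed = self._pack(key, w)
            table[packed] = table.get(packed, 0) + delta

    def get(self, key: bytes) -> int:
        w = self._width(key)
        if w is None:
            found = self.long.get(key)
        else:
            found = self.inline[w].get(self._pack(key, w))
        return 0 if found is None else found
```

For example, adding `b"apple"` twice and then calling `get(b"apple")` yields 2 from the 8-byte inline sub-table, while a 30-byte key is stored in `LongKeyTable` together with its hash. Python dictionaries stand in here for the fixed-width integer hash tables a C++ engine would use, so the sketch shows only the dispatch and hash-saving structure, not the memory-loading or vectorized-hashing optimizations.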