Performance Evaluation and Optimization of Multi-Dimensional Indexes in Hive
Liu, Yue [1,2]; Guo, Shuai [1,2]; Hu, Songlin [3]; Rabl, Tilmann [4]; Jacobsen, Hans-Arno [4]; Li, Jintao [1]; Wang, Jiye [5]
2018-09-01
Journal: IEEE TRANSACTIONS ON SERVICES COMPUTING
ISSN: 1939-1374
Volume: 11; Issue: 5; Pages: 835-849
Abstract: Apache Hive is widely used by many companies for big data processing on large-scale clusters. It provides a declarative query language called HiveQL. How efficiently query-irrelevant data in HDFS is filtered out closely affects query processing performance. This is especially true for multi-dimensional, highly selective queries that involve few columns, because such queries provide sufficient information to reduce the number of bytes read. Indexing (Compact Index, Aggregate Index, Bitmap Index, DGFIndex, and the index in ORC file) and columnar storage (RCFile, ORC file, and Parquet) are powerful techniques for achieving this. However, choosing a suitable index and columnar storage format based on data and query features is not trivial. In this paper, we compare the data filtering performance of the above indexes with different columnar storage formats by conducting comprehensive experiments using uniform and skewed TPC-H data sets and various multi-dimensional queries, and we suggest best practices for improving multi-dimensional queries in Hive under different conditions.
Keywords: Hadoop; Hive; multi-dimensional index; performance evaluation
DOI: 10.1109/TSC.2016.2594778
Indexed by: SCI
Language: English
Funding: National Natural Science Foundation of China [61070027]
WOS research area: Computer Science
WOS categories: Computer Science, Information Systems; Computer Science, Software Engineering
WOS accession number: WOS:000446719400010
Publisher: IEEE COMPUTER SOC
Citation statistics
Times cited: 4 [WOS]
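Document type: Journal article
Identifier: http://119.78.100.204/handle/2XEOYT63/4843
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding author: Liu, Yue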
Author affiliations:
1. Chinese Acad Sci, Inst Comp Technol, Beijing 100049, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
3. Chinese Acad Sci, Inst Informat Engn, Beijing 100049, Peoples R China
4. Univ Toronto, Middleware Syst Res Grp, Toronto, ON M5S, Canada
5. State Grid Corp China, Dept Informat Technol, Beijing 100049, Peoples R China
Recommended citation
GB/T 7714: Liu, Yue, Guo, Shuai, Hu, Songlin, et al. Performance Evaluation and Optimization of Multi-Dimensional Indexes in Hive[J]. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2018, 11(5): 835-849.
APA: Liu, Yue., Guo, Shuai., Hu, Songlin., Rabl, Tilmann., Jacobsen, Hans-Arno., ... & Wang, Jiye. (2018). Performance Evaluation and Optimization of Multi-Dimensional Indexes in Hive. IEEE TRANSACTIONS ON SERVICES COMPUTING, 11(5), 835-849.
MLA: Liu, Yue, et al. "Performance Evaluation and Optimization of Multi-Dimensional Indexes in Hive". IEEE TRANSACTIONS ON SERVICES COMPUTING 11.5 (2018): 835-849.