EPAS: A Sampling Based Similarity Identification Algorithm for the Cloud
Zhou, Yongtao [1]; Deng, Yuhui [1,2]; Xie, Junjie [1]; Yang, Laurence T. [3]
2018-07-01
Journal: IEEE TRANSACTIONS ON CLOUD COMPUTING
ISSN: 2168-7161
Volume: 6, Issue: 3, Pages: 720-733
Abstract: The explosive growth of data brings new challenges to data storage and management in cloud environments. These data usually have to be processed in a timely fashion in the cloud, so any increase in latency may cause massive losses for enterprises. Similarity detection plays a very important role in data management, and typical algorithms such as Shingle, Simhash, Traits, and the Traditional Sampling Algorithm (TSA) are extensively used. The Shingle, Simhash, and Traits algorithms read the entire source file to calculate the corresponding similarity characteristic value, thus requiring many CPU cycles and much memory space and incurring tremendous disk accesses. In addition, the overhead increases with the volume of the data set and results in long delays. Instead of reading the entire file, TSA samples some data blocks and calculates their fingerprints as the similarity characteristic value. The overhead of TSA is fixed and negligible. However, a slight modification of a source file shifts the bit positions of the file content, so slight modifications inevitably cause similarity identification to fail. This paper proposes an Enhanced Position-Aware Sampling algorithm (EPAS) that identifies file similarity for the cloud by applying a modulo operation to the file length. EPAS concurrently samples data blocks from the head and the tail of the modulated file to avoid the position shift incurred by modifications. Meanwhile, an improved metric is proposed to measure the similarity between different files and bring the possible detection probability close to the actual probability. Furthermore, this paper describes a query algorithm to reduce the time overhead of similarity detection. Our experimental results demonstrate that EPAS significantly outperforms existing well-known algorithms in terms of time overhead, CPU usage, and memory occupation. Moreover, EPAS achieves a better tradeoff between precision and recall than other similarity detection algorithms. Therefore, it is an effective approach to similarity identification for the cloud.
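The abstract contrasts sampling at positions spread across the whole file with EPAS sampling from the head and the tail. The Python sketch below illustrates only that position-shift intuition, under stated assumptions: `BLOCK_SIZE`, `N_SAMPLES`, `uniform_sample`, `head_tail_sample`, and `similarity` are hypothetical names; SHA-1 block fingerprints are an assumed choice; the evenly spaced sampler is a simplified TSA-like stand-in, not TSA itself; and the modulo step on file length, the paper's improved similarity metric, and its query algorithm are all omitted.

```python
import hashlib

# Hypothetical parameters chosen for illustration; not values taken from the paper.
BLOCK_SIZE = 64  # bytes per sampled block
N_SAMPLES = 4    # blocks taken from each end (head/tail) of the file


def fingerprint(block: bytes) -> str:
    """Fingerprint one sampled block (SHA-1 is an assumed choice)."""
    return hashlib.sha1(block).hexdigest()


def uniform_sample(data: bytes) -> list[str]:
    """TSA-like stand-in: 2 * N_SAMPLES blocks at evenly spaced offsets across the file."""
    n = 2 * N_SAMPLES
    # The spacing depends on the file length, so any insertion or deletion moves most offsets.
    step = max((len(data) - BLOCK_SIZE) // (n - 1), 0)
    return [fingerprint(data[i * step:i * step + BLOCK_SIZE]) for i in range(n)]


def head_tail_sample(data: bytes) -> list[str]:
    """Head/tail sampling sketch: N_SAMPLES blocks from the start, N_SAMPLES ending at EOF."""
    # Head blocks are measured from offset 0, so edits after the head region leave them unchanged.
    head = [data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(N_SAMPLES)]
    end = len(data)
    tail = []
    for i in range(N_SAMPLES):
        # Tail blocks are measured back from EOF, so edits before the tail region leave them unchanged.
        start = max(end - (i + 1) * BLOCK_SIZE, 0)
        stop = max(end - i * BLOCK_SIZE, 0)
        tail.append(data[start:stop])
    return [fingerprint(b) for b in head + tail]


def similarity(a: list[str], b: list[str]) -> float:
    """Hypothetical metric: fraction of sample positions whose fingerprints match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # 4096 bytes of deterministic pseudo-random content, then a 10-byte insertion mid-file.
    original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))
    modified = original[:2048] + b"0123456789" + original[2048:]
    print(similarity(head_tail_sample(original), head_tail_sample(modified)))  # 1.0
    print(similarity(uniform_sample(original), uniform_sample(modified)))      # 0.125
```

With a 10-byte insertion at offset 2048 of a 4096-byte input, all eight head/tail fingerprints still match, while the evenly spaced sampler keeps only the block at offset 0, because the changed file length moves every other sampling offset. An edit near the start or the end of a file would instead disturb the head or tail samples.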
Keywords: Similarity detection; sampling; shingle; position-aware; cloud
DOI: 10.1109/TCC.2016.2527646
Indexed in: SCI
Language: English
Funding: National Science Foundation (NSF) of China [61572232]; National Science Foundation (NSF) of China [61272073]; Key Program of NSF of Guangdong Province [S2013020012865]; Fundamental Research Funds for the Central Universities; Open Research Fund of Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences [CARCH201401]; Science and Technology Planning Project of Guangdong Province [2013B090200021]
WoS Research Area: Computer Science
WoS Categories: Computer Science, Information Systems; Computer Science, Software Engineering; Computer Science, Theory & Methods
WoS Accession Number: WOS:000443894000010
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Times Cited (WoS): 2
Document Type: Journal Article
Item Identifier: http://119.78.100.204/handle/2XEOYT63/4954
Collection: Journal Articles of the Institute of Computing Technology, Chinese Academy of Sciences (English)
Corresponding Author: Zhou, Yongtao
Affiliations:
1. Jinan Univ, Dept Comp Sci, Guangzhou 510632, Guangdong, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100080, Peoples R China
3. St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada
Recommended Citation
GB/T 7714: Zhou, Yongtao, Deng, Yuhui, Xie, Junjie, et al. EPAS: A Sampling Based Similarity Identification Algorithm for the Cloud[J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2018, 6(3): 720-733.
APA: Zhou, Y., Deng, Y., Xie, J., & Yang, L. T. (2018). EPAS: A sampling based similarity identification algorithm for the cloud. IEEE Transactions on Cloud Computing, 6(3), 720-733.
MLA: Zhou, Yongtao, et al. "EPAS: A Sampling Based Similarity Identification Algorithm for the Cloud." IEEE Transactions on Cloud Computing, vol. 6, no. 3, 2018, pp. 720-733.
Files in This Item
There are no files associated with this item.

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.