Institute of Computing Technology, Chinese Academy IR
Parallel Incremental Frequent Itemset Mining for Large Data | |
Song, Yu-Geng [1,2]; Cui, Hui-Min [1,2]; Feng, Xiao-Bing [1] | |
2017-03-01 | |
Journal | JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY |
ISSN | 1000-9000 |
Volume | 32, Issue: 2, Pages: 368-385 |
Abstract | Frequent itemset mining (FIM) is a popular data mining task used in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). Many FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Incremental FIM algorithms have also been proposed to handle incremental database updates. However, most of these incremental algorithms have low parallelism, which makes them inefficient on huge databases. This paper presents two parallel incremental FIM algorithms, IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass and reuses them in the incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncBuildingPFP preserves the CanTrees built in the original pass and adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP achieves a significant speedup over PFP (Parallel FPGrowth) and over a sequential incremental algorithm (CanTree) for most incremental input databases, and that IncBuildingPFP achieves such a speedup in the remaining cases. |
Keywords | incremental ; parallel ; FPGrowth ; data mining ; frequent itemset mining ; MapReduce |
DOI | 10.1007/s11390-017-1726-y |
Indexed By | SCI |
Language | English |
Funding Project | National High Technology Research and Development 863 Program of China[2015AA011505] ; National High Technology Research and Development 863 Program of China[2015AA015306] ; National High Technology Research and Development 863 Program of China[2012AA010902] ; National Natural Science Foundation of China[61202055] ; National Natural Science Foundation of China[61221062] ; National Natural Science Foundation of China[61521092] ; National Natural Science Foundation of China[61303053] ; National Natural Science Foundation of China[61432016] ; National Natural Science Foundation of China[61402445] ; National Natural Science Foundation of China[61672492] ; National Key Research and Development Program of China[2016YFB1000402] |
WOS Research Area | Computer Science |
WOS Category | Computer Science, Hardware & Architecture ; Computer Science, Software Engineering |
WOS Accession Number | WOS:000397835500014 |
Publisher | SCIENCE PRESS |
Document Type | Journal article |
Item Identifier | http://119.78.100.204/handle/2XEOYT63/7412 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Song, Yu-Geng |
Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China ; 2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China |
Recommended Citation (GB/T 7714) | Song, Yu-Geng, Cui, Hui-Min, Feng, Xiao-Bing. Parallel Incremental Frequent Itemset Mining for Large Data[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32(2): 368-385. |
APA | Song, Yu-Geng, Cui, Hui-Min, & Feng, Xiao-Bing. (2017). Parallel Incremental Frequent Itemset Mining for Large Data. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 32(2), 368-385. |
MLA | Song, Yu-Geng, et al. "Parallel Incremental Frequent Itemset Mining for Large Data". JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 32.2 (2017): 368-385. |
Files in This Item | There are no files associated with this item. |
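
The abstract states that IncBuildingPFP keeps the CanTrees from the original pass and adds new transactions to them in later passes. The sketch below illustrates only that general idea on a single machine. It is not the paper's implementation and involves no MapReduce. It assumes a CanTree is a prefix tree whose paths follow a fixed canonical item order (lexicographic here), so that new transactions can be inserted without restructuring. The names `Node`, `CanTree`, `insert`, and `support` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One tree node: an item plus the number of transactions passing through it."""
    item: Optional[str] = None
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


class CanTree:
    """Prefix tree over transactions, with items kept in lexicographic order.

    Assumption: the canonical order does not depend on item frequency, so
    inserting new transactions never requires reordering existing paths.
    """

    def __init__(self) -> None:
        self.root = Node()
        self.n_transactions = 0

    def insert(self, transaction) -> None:
        """Add one transaction; works the same in the original and incremental passes."""
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
        self.n_transactions += 1

    def support(self, itemset) -> int:
        """Count the stored transactions that contain every item of `itemset`."""
        target = sorted(set(itemset))
        if not target:
            return self.n_transactions

        def walk(node: Node, i: int) -> int:
            total = 0
            for item, child in node.children.items():
                if item == target[i]:
                    # Matched the i-th wanted item; finish or look for the next one.
                    total += child.count if i + 1 == len(target) else walk(child, i + 1)
                elif item < target[i]:
                    # The wanted item may still appear deeper on this path.
                    total += walk(child, i)
                # item > target[i]: paths are sorted, so target[i] cannot follow.
            return total

        return walk(self.root, 0)


if __name__ == "__main__":
    tree = CanTree()
    for t in [["a", "b", "c"], ["a", "c"], ["b", "d"]]:  # original pass
        tree.insert(t)
    print(tree.support(["a", "c"]))  # 2

    for t in [["a", "c", "d"], ["c"]]:  # incremental pass reuses the same tree
        tree.insert(t)
    print(tree.support(["a", "c"]))  # 3
```

Because the item order is fixed, the incremental pass touches only the paths of the new transactions. A parallel version would additionally need to partition items or transactions across workers, which this sketch does not attempt.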