Institute of Computing Technology, Chinese Academy IR
Title | Multi-scaling sampling: An adaptive sampling method for discovering approximate association rules |
Authors | Jia, CY; Gao, XP |
Publication date | 2005-05-01 |
Journal | JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY |
ISSN | 1000-9000 |
Volume | 20; Issue: 3; Pages: 309-318 |
Abstract | One obstacle to efficient association rule mining is the explosive growth of data sets, since scanning a large database is costly or even impossible, especially when multiple passes are required. A popular way to improve the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. However, how to effectively define and efficiently estimate the error in the algorithm's outcome, and how to determine the sample size needed, have remained open problems. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate sample error. Then a new adaptive, on-line, fast sampling strategy, multi-scaling sampling, inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, is presented for quickly obtaining acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and empirical study show that the sampling strategy achieves a very good speed-accuracy trade-off. |
Keywords | data mining ; association rule ; frequent itemset ; sample error ; multi-scaling sampling |
Indexed by | SCI |
Language | English |
WOS research area | Computer Science |
WOS category | Computer Science, Hardware & Architecture ; Computer Science, Software Engineering |
WOS accession number | WOS:000229292300003 |
Publisher | SCIENCE PRESS |
Document type | Journal article |
Item identifier | http://119.78.100.204/handle/2XEOYT63/9997 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Jia, CY |
Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100080, Peoples R China; 2. Chinese Acad Sci, Grad Sch, Beijing 100039, Peoples R China; 3. Xiangtan Univ, Informat Engn Coll, Xiangtan 411105, Peoples R China |
Recommended citation (GB/T 7714) | Jia, CY, Gao, XP. Multi-scaling sampling: An adaptive sampling method for discovering approximate association rules[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2005, 20(3): 309-318. |
APA | Jia, CY, & Gao, XP. (2005). Multi-scaling sampling: An adaptive sampling method for discovering approximate association rules. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 20(3), 309-318. |
MLA | Jia, CY, et al. "Multi-scaling sampling: An adaptive sampling method for discovering approximate association rules." JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 20.3 (2005): 309-318. |
Files in this item | There are no files associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.