Innovating web page classification through reducing noise
Li, XL; Shi, ZZ
2002
Journal: JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
ISSN: 1000-9000
Volume: 17, Issue: 1, Pages: 9-17
Abstract: This paper presents a new method that eliminates noise in Web page classification. It first describes the presentation of a Web page based on HTML tags. Then through a novel distance formula, it eliminates the noise in similarity measure. After carefully analyzing Web pages, we design an algorithm that can distinguish related hyperlinks from noisy ones. We can utilize non-noisy hyperlinks to improve the performance of Web page classification (the CAWN algorithm). For any page, we can classify it through the text and category of neighbor pages related to the page. The experimental results show that our approach improved classification accuracy.
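The idea of classifying a page from the text and categories of its related neighbor pages can be illustrated with a minimal Python sketch. It is not the paper's CAWN algorithm, and it does not use the paper's distance formula, neither of which is given in this record. It assumes plain bag-of-words cosine similarity as a stand-in for the similarity measure, a hypothetical `threshold` parameter for deciding which hyperlinks count as noisy, and a similarity-weighted vote over the remaining neighbors. All function names are illustrative.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed stand-in for the paper's page representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_neighbors(page_text, neighbors, threshold=0.2):
    """Pick a category for a page from its linked neighbors.

    neighbors: list of (neighbor_text, neighbor_category) pairs.
    threshold: hypothetical cutoff; neighbors less similar than this
               are treated as noisy hyperlinks and ignored.
    Returns the category with the highest similarity-weighted vote,
    or None if every neighbor was filtered out.
    """
    page_vec = term_vector(page_text)
    votes = Counter()
    for text, category in neighbors:
        similarity = cosine(page_vec, term_vector(text))
        if similarity >= threshold:  # keep only related (non-noisy) links
            votes[category] += similarity
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    neighbors = [
        ("machine learning lecture notes", "education"),
        ("buy cheap tickets online", "ads"),
        ("learning course syllabus", "education"),
    ]
    print(classify_by_neighbors("machine learning course notes", neighbors))
```

In this sketch the advertising neighbor shares no terms with the page, so it falls below the threshold and contributes no vote. A real system would need a tuned threshold and a representation that weights text by HTML tag, as the abstract describes.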
Keywords: web page classification; similarity measure; classification algorithm without noise
Indexed by: SCI
Language: English
WOS Research Area: Computer Science
WOS Category: Computer Science, Hardware & Architecture; Computer Science, Software Engineering
WOS Accession Number: WOS:000173631200002
Publisher: SCIENCE CHINA PRESS
Citation statistics
Times cited: 5 [WOS]
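Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/13542
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding author: Li, XL
Author affiliations:
1. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100080, Peoples R China
2. Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore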
Recommended citation
GB/T 7714
Li, XL, Shi, ZZ. Innovating web page classification through reducing noise[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17(1): 9-17.
APA Li, XL, & Shi, ZZ. (2002). Innovating web page classification through reducing noise. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 17(1), 9-17.
MLA Li, XL, et al. "Innovating web page classification through reducing noise". JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 17.1 (2002): 9-17.