Rich-text document styling restoration via reinforcement learning
Li, Hongwei (1, 2); Hu, Yingpeng (1, 2); Cao, Yixuan (1, 2); Zhou, Ganbin (3); Luo, Ping (1, 2)
2021-08-01
Journal: FRONTIERS OF COMPUTER SCIENCE
ISSN: 2095-2228
Volume: 15; Issue: 4; Pages: 11
Abstract: Richly formatted documents, such as financial disclosures, scientific articles, and government regulations, are widespread on the Web. However, since most of these documents are intended only for public reading, their styling information is usually missing, which makes them unsuitable, or even burdensome, to display and edit across different formats and platforms. In this study, we formulate document styling restoration as an optimization problem: identify the styling settings of the document elements (e.g., lines, table cells, text) such that rendering the document with these settings places each element at (nearly) exactly the same position as in the original document. Since each styling setting is a decision, the problem can be cast as a multi-step decision-making task over all document elements and solved with reinforcement learning. Specifically, Monte-Carlo Tree Search (MCTS) is used to explore different styling settings, and the policy function is learned under the supervision of delayed rewards. As a case study, we restore the styling information of tables, where the structural and functional data in documents are usually presented. Experiments show that our best reinforcement learning method restores the styling of 87.65% of the tables, a 25.75% absolute improvement over the greedy method. We also discuss the tradeoff between inference time and restoration success rate, and argue that although the reinforcement learning methods cannot be used in real-time scenarios, they are suitable for offline tasks with high quality requirements. Finally, this model has been applied in a PDF parser to support cross-format display.
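The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical: it assumes a toy table row whose only styling decision per cell is a width chosen from a hypothetical candidate set (`CANDIDATE_WIDTHS`), a hypothetical renderer `render` that places each cell's right edge at the running sum of widths, and a reward equal to the fraction of edges that land on the original positions, computed only once every decision has been made (the delayed reward). It uses plain UCT with uniform random rollouts in place of the learned policy function, and, as a simplification chosen here, returns the best complete assignment found during the search.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical discrete styling options (cell widths in points); not from the paper.
CANDIDATE_WIDTHS = [40, 60, 80, 100]


def render(widths: List[int]) -> List[int]:
    """Hypothetical toy renderer: each cell's right edge is the running sum of widths."""
    edges, x = [], 0
    for w in widths:
        x += w
        edges.append(x)
    return edges


def reward(widths: List[int], target_edges: List[int], tol: float = 1.0) -> float:
    """Delayed reward: share of cells whose rendered edge matches the original edge."""
    edges = render(widths)
    hits = sum(abs(e - t) <= tol for e, t in zip(edges, target_edges))
    return hits / len(target_edges)


@dataclass
class Node:
    settings: Tuple[int, ...]  # styling decisions made so far, one per element
    parent: Optional["Node"] = None
    children: Dict[int, "Node"] = field(default_factory=dict)
    visits: int = 0
    value: float = 0.0  # sum of rewards backed up through this node


def uct_select(node: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score (mean reward + exploration bonus)."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(target_edges: List[int], iterations: int = 2000, seed: int = 0):
    """Search styling settings for each element; returns (best settings, best reward)."""
    rng = random.Random(seed)
    n = len(target_edges)
    root = Node(settings=())
    best_settings, best_reward = None, -1.0
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while len(node.settings) < n and len(node.children) == len(CANDIDATE_WIDTHS):
            node = uct_select(node)
        # 2. Expansion: try one untried styling option for the next element.
        if len(node.settings) < n:
            w = rng.choice([w for w in CANDIDATE_WIDTHS if w not in node.children])
            child = Node(settings=node.settings + (w,), parent=node)
            node.children[w] = child
            node = child
        # 3. Rollout: fill the remaining decisions uniformly at random
        #    (a stand-in for the learned policy function).
        rollout = list(node.settings)
        while len(rollout) < n:
            rollout.append(rng.choice(CANDIDATE_WIDTHS))
        r = reward(rollout, target_edges)  # known only after all decisions are made
        if r > best_reward:
            best_settings, best_reward = tuple(rollout), r
        # 4. Backpropagation: credit the delayed reward to every node on the path.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best_settings, best_reward
```

For example, `mcts(render([60, 100, 40]))` searches the 64 possible width assignments for a row whose original edges lie at 60, 160 and 200. Because the reward is only observed for complete assignments, the tree statistics are what let earlier decisions be revisited, at the cost of many render-and-score evaluations per table, which mirrors the inference-time versus success-rate tradeoff discussed in the abstract.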
Keywords: styling restoration; Monte-Carlo tree search; reinforcement learning; richly formatted documents; tables
DOI: 10.1007/s11704-020-9322-7
Indexed in: SCI
Language: English
Funding: National Key Research and Development Program of China [2017YFB1002104]; National Natural Science Foundation of China [U1811461]; Innovation Program of Institute of Computing Technology, CAS
WOS research area: Computer Science
WOS categories: Computer Science, Information Systems; Computer Science, Software Engineering; Computer Science, Theory & Methods
WOS accession number: WOS:000729649100001
Publisher: HIGHER EDUCATION PRESS
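Times cited (WOS): 3
Document type: Journal article
Item identifier: http://119.78.100.204/handle/2XEOYT63/18085
Collection: Journal Articles of the Institute of Computing Technology, Chinese Academy of Sciences (English)
Corresponding author: Luo, Ping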
Affiliations:
1. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
3. Tencent, WeChat Search Applicat Dept, Search Prod Ctr, Beijing 100080, Peoples R China
Recommended citation
GB/T 7714:
Li, Hongwei,Hu, Yingpeng,Cao, Yixuan,et al. Rich-text document styling restoration via reinforcement learning[J]. FRONTIERS OF COMPUTER SCIENCE,2021,15(4):11.
APA: Li, H., Hu, Y., Cao, Y., Zhou, G., & Luo, P. (2021). Rich-text document styling restoration via reinforcement learning. Frontiers of Computer Science, 15(4), 11.
MLA: Li, Hongwei, et al. "Rich-text document styling restoration via reinforcement learning." Frontiers of Computer Science 15.4 (2021): 11.
Files in this item: none.

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.