Institute of Computing Technology, Chinese Academy of Sciences — Institutional Repository (IR)
A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production
Cui, Zhenchao1; Chen, Ziang1; Li, Zhaoxin2; Wang, Zhaoqi2
2022-12-01
Journal | SENSORS
Volume | 22
Issue | 24
Pages | 15
Abstract | As a typical sequence-to-sequence task, sign language production (SLP) aims to automatically translate spoken language sentences into the corresponding sign language sequences. Existing SLP methods fall into two categories: autoregressive and non-autoregressive. Autoregressive methods suffer from high latency and error accumulation caused by the long-term dependence of the current output on previous poses, while non-autoregressive methods suffer from repetition and omission during parallel decoding. To remedy these issues, we propose a novel method named Pyramid Semi-Autoregressive Transformer with Rich Semantics (PSAT-RS). In PSAT-RS, we first introduce a pyramid semi-autoregressive mechanism that divides the target sequence into groups in a coarse-to-fine manner, preserving the autoregressive property globally while generating target frames in parallel locally. Meanwhile, a relaxed masked attention mechanism is adopted so that the decoder not only captures the pose sequences in previous groups but also attends to the current group. Finally, considering the importance of spatial-temporal information, we also design a Rich Semantics embedding (RS) module that encodes sequential information along the temporal dimension together with spatial displacement into the same high-dimensional space. This significantly improves the coordination of joint motion, making the generated sign language videos more natural. Experiments on the RWTH-PHOENIX-Weather-2014T and CSL datasets show that the proposed PSAT-RS is competitive with state-of-the-art autoregressive and non-autoregressive SLP models, achieving a better trade-off between speed and accuracy.
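The group-wise decoding with relaxed masking described in the abstract can be illustrated with a small sketch (an illustration of the general idea only, not the authors' implementation; the function name, group size, and sequence length are all hypothetical choices): each frame is allowed to attend to every frame in its own group and in earlier groups, whereas a standard causal mask would restrict attention to strictly earlier frames.

```python
import numpy as np

def relaxed_group_mask(seq_len: int, group_size: int) -> np.ndarray:
    """Boolean attention mask for semi-autoregressive group decoding.

    mask[i, j] is True when frame i may attend to frame j, i.e. when
    frame j lies in the same group as frame i or in an earlier group.
    A standard causal mask would instead require j <= i frame-by-frame.
    """
    groups = np.arange(seq_len) // group_size   # group index of each frame
    return groups[:, None] >= groups[None, :]   # own group and earlier groups

# Example: 6 frames decoded in groups of 3.
mask = relaxed_group_mask(6, 3)
# Frame 0 may already attend to frames 1 and 2 (same group)...
assert mask[0, 2]
# ...but not to frame 3, which belongs to the next group.
assert not mask[0, 3]
```

Because whole groups are emitted in parallel, autoregressive dependence is kept only between groups, which is where the speed/accuracy trade-off described above comes from.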
Keywords | human pose generation; sign language production; semi-autoregressive transformer; deep learning
DOI | 10.3390/s22249606 |
Indexed By | SCI
Language | English
Funding Project | National Key Research and Development Program of China ; Post-graduate's Innovation Fund Project of Hebei University ; National Natural Science Foundation of China ; Scientific Research Foundation for Talented Scholars of Hebei University ; Scientific Research Foundation of Colleges and Universities in Hebei Province ; [2020YFC1523302] ; [HBU2022ss014] ; [62172392] ; [521100221081] ; [QN2022107]
WOS Research Area | Chemistry ; Engineering ; Instruments & Instrumentation
WOS Subject | Chemistry, Analytical ; Engineering, Electrical & Electronic ; Instruments & Instrumentation
WOS ID | WOS:000902932900001
Publisher | MDPI
Document Type | Journal article
Identifier | http://119.78.100.204/handle/2XEOYT63/20180
Collection | Journal papers of the Institute of Computing Technology, Chinese Academy of Sciences
Corresponding Author | Li, Zhaoxin
Affiliation | 1. Hebei Univ, Hebei Machine Vis Engn Res Ctr, Sch Cyber Secur & Comp, Baoding 071002, Peoples R China 2. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Recommended Citation (GB/T 7714) | Cui, Zhenchao, Chen, Ziang, Li, Zhaoxin, et al. A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production[J]. SENSORS, 2022, 22(24): 15.
APA | Cui, Zhenchao, Chen, Ziang, Li, Zhaoxin, & Wang, Zhaoqi. (2022). A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production. SENSORS, 22(24), 15.
MLA | Cui, Zhenchao, et al. "A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production". SENSORS 22.24 (2022): 15.
Files in This Item | There are no files associated with this item.