CSpace

Browse/Search results: 3 items in total, showing items 1-3

Focus and Align: Learning Tube Tokens for Video-Language Pre-Training (journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, Volume: 25, Pages: 8036-8050
Authors:  Zhu, Yongqing;  Li, Xiangyang;  Zheng, Mao;  Yang, Jiahao;  Wang, Zihan;  Guo, Xiaoqian;  Chai, Zifeng;  Yuan, Yuchen;  Jiang, Shuqiang
Views/Downloads: 2/0  |  Submitted: 2024/05/20
Electron tubes  Semantics  Visualization  Feature extraction  Task analysis  Transformers  Detectors  Local alignment mechanism  semantic centers  tube tokens  video-language pre-training  
Know More Say Less: Image Captioning Based on Scene Graphs (journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2019, Volume: 21, Issue: 8, Pages: 2117-2130
Authors:  Li, Xiangyang;  Jiang, Shuqiang
Views/Downloads: 76/0  |  Submitted: 2019/12/10
Image captioning  scene graph  relationship  long short-term network  attention mechanism  vision-language  
Bundled Object Context for Referring Expressions (journal article)
IEEE TRANSACTIONS ON MULTIMEDIA, 2018, Volume: 20, Issue: 10, Pages: 2749-2760
Authors:  Li, Xiangyang;  Jiang, Shuqiang
Views/Downloads: 53/0  |  Submitted: 2019/12/10
Bundled object context  referring expression  LSTM  vision-language