Institute of Computing Technology, Chinese Academy IR
Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection | |
Lin, Liwei1,2; Wang, Xiangdong1; Liu, Hong1; Qian, Yueliang1 | |
2020 | |
Journal | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING |
ISSN | 2329-9290 |
Volume | 28; Pages: 1466-1478 |
Abstract | In this article, a special decision surface for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED are proposed. We approach SED as a multiple instance learning (MIL) problem and utilize a neural network framework with a pooling module to solve it. MIL approaches generally fall into two kinds: instance-level approaches and embedding-level approaches. We present a method of generating instance-level probabilities for embedding-level approaches, which tend to perform better than instance-level approaches in terms of bag-level classification but, in their current form, cannot provide instance-level probabilities. Moreover, we propose a specialized decision surface (SDS) for embedding-level attention pooling. We analyze and explain, from the perspective of the high-level feature space, why an embedding-level attention module with SDS is better than other typical pooling modules. To address the unbalanced dataset and the co-occurrence of multiple categories in the polyphonic event detection task, we propose a DF to reduce interference among categories; it optimizes the high-level feature space by disentangling it based on class-wise identifiable information, yielding multiple different subspaces. Experiments on the DCASE 2018 Task 4 dataset show that the proposed SDS and DF significantly improve the detection performance of the embedding-level MIL approach with an attention pooling module and outperform the first-place system in the challenge by 6.6 percentage points. |
Keywords | Sound event detection (SED); machine learning; weakly-supervised learning; attention pooling |
DOI | 10.1109/TASLP.2020.2989575 |
Indexed by | SCI |
Language | English |
Funding | Beijing Natural Science Foundation[4172058] |
WOS research area | Acoustics ; Engineering |
WOS category | Acoustics ; Engineering, Electrical & Electronic |
WOS accession number | WOS:000538078300003 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
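Document type | Journal article |
Item identifier | http://119.78.100.204/handle/2XEOYT63/15263 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Wang, Xiangdong |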
Author affiliations | 1. Chinese Acad Sci, Beijing Key Lab Mobile Comp & Pervas Device, Inst Comp Technol, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Lin, Liwei,Wang, Xiangdong,Liu, Hong,et al. Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,2020,28:1466-1478. |
APA | Lin, Liwei,Wang, Xiangdong,Liu, Hong,&Qian, Yueliang.(2020).Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection.IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,28,1466-1478. |
MLA | Lin, Liwei,et al."Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection".IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 28(2020):1466-1478. |
Files in this item | There are no files associated with this item. |
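
The abstract describes embedding-level MIL with an attention pooling module, where frame-level features (the instances) are pooled into one clip-level embedding (the bag) before classification. The following is a minimal sketch of generic embedding-level attention pooling for a single event class, not the paper's SDS or DF. The function `attention_pool`, the parameters `attn_w`, `clf_w`, `clf_b`, and the toy data are all hypothetical, and deriving frame-level probabilities by reusing the bag-level classifier is an illustrative assumption rather than the authors' method.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(frames, attn_w, clf_w, clf_b):
    """Generic embedding-level attention pooling for one event class.

    frames       : list of T frame-level feature vectors (instances of one bag)
    attn_w       : hypothetical attention weight vector
    clf_w, clf_b : hypothetical bag-level classifier parameters
    """
    # Attention weights over the frames; they sum to 1.
    weights = softmax([dot(attn_w, h) for h in frames])
    # Bag embedding: attention-weighted average of the frame features.
    dim = len(frames[0])
    bag = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    # Clip-level (bag-level) probability computed from the pooled embedding.
    bag_prob = sigmoid(dot(clf_w, bag) + clf_b)
    # Assumption for illustration only (not SDS): obtain frame-level
    # probabilities by applying the bag-level classifier to each frame.
    frame_probs = [sigmoid(dot(clf_w, h) + clf_b) for h in frames]
    return weights, bag_prob, frame_probs


# Toy input: 5 frames with 4-dimensional features.
random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
attn_w = [0.5, -0.2, 0.1, 0.3]
clf_w = [1.0, 0.0, -1.0, 0.5]
weights, bag_prob, frame_probs = attention_pool(frames, attn_w, clf_w, 0.0)
print("attention weights:", [round(w, 3) for w in weights])
print("clip-level probability:", round(bag_prob, 3))
print("frame-level probabilities:", [round(p, 3) for p in frame_probs])
```

In this sketch only the clip-level probability would be trained against weak (clip-level) labels; the frame-level probabilities are a by-product used for localizing events in time.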