CSpace

Browse/Search results: 77 items in total, showing items 1-10

OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs (Journal article)
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2025, Volume: 22, Issue: 2, Pages: 27
Authors: Wang, Xueying; Li, Shigang; Qian, Hao; Luo, Fan; Hao, Zhaoyang; Wu, Tong; Xu, Ruiyuan; Cui, Huimin; Feng, Xiaobing; Li, Guangli
Views/Downloads: 4/0 | Submitted: 2025/12/03
Deep learning systems  convolutional neural networks  operator fusion  
SRSparse: Generating Codes for High-Performance Sparse Matrix-Vector Semiring Computations (Journal article)
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2025, Volume: 22, Issue: 2, Pages: 26
Authors: Du, Zhen; Li, Ying; Sun, Ninghui; Cui, Huimin; Feng, Xiaobing; Li, Jiajia
Views/Downloads: 4/0 | Submitted: 2025/12/03
High performance computing  sparse matrix computation  auto-tuning  code generator  semiring computation  
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs (Journal article)
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2024, Volume: 21, Issue: 1, Pages: 26
Authors: Wang, Xueying; Li, Guangli; Jia, Zhen; Feng, Xiaobing; Wang, Yida
Views/Downloads: 61/0 | Submitted: 2024/05/20
Deep learning  winograd convolution  low-precision computation  
OpenCL-accelerated first-principles calculations of all-electron quantum perturbations on HPC resources (vol 11, 1156891, 2023) (Journal article)
FRONTIERS IN CHEMISTRY, 2023, Volume: 11, Pages: 1
Authors: Wu, Zhikun; Shang, Honghui; Wu, Yangjun; Zhang, Zhongcheng; Liu, Ying; Zhang, Yuyang; Ouyang, Yucheng; Cui, Huimin; Feng, Xiaobing
Views/Downloads: 47/0 | Submitted: 2023/12/04
OpenCL  DFPT  GPU  optimization  heterogeneous  
OpenCL-accelerated first-principles calculations of all-electron quantum perturbations on HPC resources (Journal article)
FRONTIERS IN CHEMISTRY, 2023, Volume: 11, Pages: 15
Authors: Wu, Zhikun; Shang, Honghui; Wu, Yangjun; Zhang, Zhongcheng; Liu, Ying; Zhang, Yuyang; Ouyang, Yucheng; Cui, Huimin; Feng, Xiaobing
Views/Downloads: 43/0 | Submitted: 2023/12/04
OpenCL  DFPT  GPU  optimization  heterogeneous  
An Application-oblivious Memory Scheduling System for DNN Accelerators (Journal article)
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2022, Volume: 19, Issue: 4, Pages: 26
Authors: Li, Jiansong; Wang, Xueying; Chen, Xiaobing; Li, Guangli; Dong, Xiao; Zhao, Peng; Yu, Xianzhi; Yang, Yongxin; Cao, Wei; Liu, Lei; Feng, Xiaobing
Views/Downloads: 75/0 | Submitted: 2023/07/12
Deep learning  memory scheduling  runtime system  DNN accelerators  
Scaling Poisson Solvers on Many Cores via MMEwald (Journal article)
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, Volume: 33, Issue: 8, Pages: 1888-1901
Authors: Wu, Mingchuan; Wu, Yangjun; Shang, Honghui; Liu, Ying; Cui, Huimin; Li, Fang; Duan, Xiaohui; Zhang, Yunquan; Feng, Xiaobing
Views/Downloads: 73/0 | Submitted: 2022/06/21
Optimization  Bandwidth  Supercomputers  Electric potential  Boundary conditions  Electrostatics  Silicon  Poisson solver  architecture-specific optimizations  many-core processor  
Optimizing deep neural networks on intelligent edge accelerators via flexible-rate filter pruning (Journal article)
JOURNAL OF SYSTEMS ARCHITECTURE, 2022, Volume: 124, Pages: 11
Authors: Li, Guangli; Ma, Xiu; Wang, Xueying; Yue, Hengshan; Li, Jiansong; Liu, Lei; Feng, Xiaobing; Xue, Jingling
Views/Downloads: 62/0 | Submitted: 2022/12/07
Edge intelligence  Deep learning  Neural network compression  
Compiler-assisted Operator Template Library for DNN Accelerators (Journal article)
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2021, Pages: 18
Authors: Li, Jiansong; Cao, Wei; Dong, Xiao; Li, Guangli; Wang, Xueying; Zhao, Peng; Liu, Lei; Feng, Xiaobing
Views/Downloads: 98/0 | Submitted: 2021/12/01
DNN Accelerators  Template Library  Address Space Management  
Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices (Journal article)
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2020, Volume: 39, Issue: 11, Pages: 3614-3626
Authors: Li, Guangli; Ma, Xiu; Wang, Xueying; Liu, Lei; Xue, Jingling; Feng, Xiaobing
Views/Downloads: 114/0 | Submitted: 2021/12/01
Deep learning system  edge intelligence  model compression and acceleration  neural networks