Publications

Journal Papers

T1. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks

Conference Papers

C20. OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C19. Nesting forward automatic differentiation for memory-efficient deep neural network training

C18. ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C17. SQuant: On-the-fly data-free quantization via diagonal Hessian approximation

C16. Dual-side sparse tensor core

C15. Boosting mobile CNN inference through semantic memory

C14. SCYLLA: QoE-aware continuous mobile vision with FPGA-based dynamic deep neural network reconfiguration

C13. LadaBERT: Lightweight adaptation of BERT through hybrid model compression

C12. Live video analytics with FPGA-based smart cameras

C11. Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity

C10. Balanced sparsity for efficient DNN inference on GPU

C9. SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization

C8. Best-effort FPGA programming: A few steps can go a long way

C7. Using data compression for optimizing FPGA-based convolutional neural network accelerators

C6. Energy-efficient CNN implementation on a deeply pipelined FPGA cluster

C5. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks

C4. Optimizing FPGA-based accelerator design for deep convolutional neural networks

C3. An efficient design and implementation of LSM-tree based key-value store on open-channel SSD

C2. Memory partitioning for multidimensional arrays in high-level synthesis

C1. Automatic multidimensional memory partitioning for FPGA-based accelerators