Publications

Journal Papers

T1. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks

Conference Papers

C20. OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C19. Nesting forward automatic differentiation for memory-efficient deep neural network training

C18. ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C17. SQuant: On-the-fly data-free quantization via diagonal Hessian approximation

C16. Dual-side sparse tensor core

C15. Boosting mobile CNN inference through semantic memory

C14. SCYLLA: QoE-aware continuous mobile vision with FPGA-based dynamic deep neural network reconfiguration

C13. LadaBERT: Lightweight adaptation of BERT through hybrid model compression

C12. Live video analytics with FPGA-based smart cameras

C11. Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity

C10. Balanced sparsity for efficient DNN inference on GPU

C9. SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization

C8. Best-effort FPGA programming: A few steps can go a long way

C7. Using data compression for optimizing FPGA-based convolutional neural network accelerators

C6. Energy-efficient CNN implementation on a deeply pipelined FPGA cluster

C5. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks

C4. Optimizing FPGA-based accelerator design for deep convolutional neural networks

C3. An efficient design and implementation of LSM-tree based key-value store on open-channel SSD

C2. Memory partitioning for multidimensional arrays in high-level synthesis

C1. Automatic multidimensional memory partitioning for FPGA-based accelerators