📝 Publications

📚 System Algorithm Co-design

AAAI 2025

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
Chao Zeng*, Songwei Liu*, Yusheng Xie*, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei†

Project

  • 🚀 ABQ-LLM breaks quantization limits: Run LLMs at ANY bit-width you want, with REAL speedup!
  • Academic Impact: It introduces hardware-aware dynamic quantization, enabling latency-optimal bit allocation across transformer layers without retraining.
  • Industry Impact: It achieves a 1.6x inference speedup and a 2.7x memory compression ratio over the industry's state-of-the-art SmoothQuant framework, with kernel performance that significantly surpasses CUTLASS-accelerated operators.
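The bit-allocation idea behind arbitrary-bit quantization can be illustrated with a minimal per-group symmetric quantizer, where each layer (or group) may receive a different bit-width. This is a generic sketch with placeholder names, not ABQ-LLM's actual kernels:

```python
import numpy as np

def quantize_group(w, bits):
    """Symmetric uniform quantization of a weight group to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for signed 4-bit
    amax = np.abs(w).max()
    scale = amax / qmax if amax > 0 else 1.0   # one scale per group
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Different layers can be assigned different bit-widths (2, 4, 8, ...);
# reconstruction error shrinks as the bit budget grows.
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
for bits in (2, 4, 8):
    q, scale = quantize_group(w, bits)
    err = np.abs(w - dequantize(q, scale)).mean()
```

A real arbitrary-bit kernel packs these integer codes into bit-planes so any width maps onto fixed-width hardware instructions; the snippet only shows the numerical quantize/dequantize round trip.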
ACL XX

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng*, Songwei Liu*, Shu Yang, Fangmin Chen †, Xing Mei, Lean Fu

  • 🚀 GQSA explores a group sparsity pattern beyond the conventional 2:4 sparsity, achieving a better trade-off between accuracy and speed through a combination of algorithm-level optimizations and a customized software engine.

  • GQSA offers several advantages over the 2:4 sparsity technique: a flexible, adjustable sparsity rate, a higher weight compression rate, and broader compatibility with various quantization methods.
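The contrast with 2:4 sparsity can be sketched as follows: instead of zeroing exactly two of every four weights, whole contiguous groups are pruned by magnitude at any target ratio. This is a minimal magnitude-based illustration with hypothetical names, not GQSA's actual scoring or engine:

```python
import numpy as np

def group_prune(w, group_size, sparsity):
    """Zero out whole contiguous weight groups with the smallest L2 norms.

    Unlike fixed 2:4 sparsity, `sparsity` can be any ratio, and surviving
    groups stay contiguous, which suits group-wise quantization kernels.
    """
    assert w.size % group_size == 0
    groups = w.reshape(-1, group_size)
    norms = np.linalg.norm(groups, axis=1)
    k = int(round(len(norms) * sparsity))   # number of groups to drop
    mask = np.ones(len(norms), dtype=bool)
    mask[np.argsort(norms)[:k]] = False     # drop the weakest groups
    return (groups * mask[:, None]).reshape(w.shape), mask

# Example: prune half of the 16-element groups of a small weight matrix.
w = np.random.default_rng(1).standard_normal((8, 32)).astype(np.float32)
pruned, mask = group_prune(w, group_size=16, sparsity=0.5)
```

Because pruning happens at group granularity, only one mask bit per group needs storing, which is where the higher compression rate over element-wise 2:4 masks comes from.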

📚 Model Compression

CVPRW 2022

Residual Local Feature Network for Efficient Super-Resolution
Fangyuan Kong*, Mingxi Li*, Songwei Liu*, Ding Liu, Jingwen He, Yang Bai, Fangmin Chen, Lean Fu

Project

  • 🚀 RLFN achieves state-of-the-art (SOTA) performance in lightweight super-resolution through innovative architectural design, advanced training strategies, and efficient model compression techniques!
  • Academic Impact: It has emerged as a foundational baseline in the Efficient Super-Resolution (ESR) domain, driving advancements across the field.
  • Industry Impact: It serves as the official baseline model for the NTIRE 2023 challenge and is deployed across multiple ByteDance product lines.
Arxiv 2024

Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models
Chenqian Yan*, Songwei Liu*, Hongjian Liu*, Xurui Peng, Xiaojian Wang, Fangmin Chen, Lean Fu, Xing Mei

Project | Hugging Face

  • 🚀 Hybrid-SD has launched the industry’s most performant lightweight models: VAE, SD1.5, and SDXL. Now available for deployment!