📝 Publications
📚 System Algorithm Co-design

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
Chao Zeng*, Songwei Liu*, Yusheng Xie*, Hong Liu, Xiaojian Wang,
Miao Wei, Shu Yang, Fangmin Chen, Xing Mei†
Project
- 🚀 ABQ-LLM breaks quantization limits: Run LLMs at ANY bit-width you want, with REAL speedup!
- Academic Impact: It introduces hardware-aware dynamic quantization, enabling latency-optimal bit allocation across transformer layers without retraining.
- Industry Impact: It achieves a 1.6x inference speedup and 2.7x memory compression ratio compared to the industry’s state-of-the-art SmoothQuant framework, with its kernel performance significantly surpassing CUTLASS-accelerated operators.
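The core idea behind arbitrary-bit weight quantization can be illustrated with a minimal sketch (hypothetical helper name; assumes symmetric per-row quantization — the actual ABQ-LLM kernels pack bits and run on GPU, which this sketch does not attempt):

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int):
    """Symmetric per-row quantization of a weight matrix to an arbitrary bit-width.

    Returns integer codes and per-row scales so that w ≈ codes * scales.
    (Illustrative only; ABQ-LLM's real kernels operate on bit-packed GPU tensors.)
    """
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for signed 4-bit
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    codes = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return codes, scales

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
for bits in (2, 4, 8):                               # same code path for any bit-width
    codes, scales = quantize_weights(w, bits)
    err = np.abs(w - codes * scales).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")   # error shrinks as bits grow
```

The same routine serves every bit-width, which is the point of "arbitrary-bit": the bit-width is a runtime parameter rather than a fixed kernel variant.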

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng*, Songwei Liu*, Shu Yang, Fangmin Chen†, Xing Mei, Lean Fu
- 🚀 GQSA explores a group sparsity pattern beyond the conventional 2:4 sparsity, achieving a better trade-off between accuracy and speed through a combination of algorithm-level optimizations and a customized software engine.
- GQSA offers several advantages over the 2:4 sparsity technique: a flexible and adjustable sparsity rate, a higher weight compression rate, and enhanced compatibility with various quantization methods.
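The contrast with 2:4 sparsity can be sketched in a few lines (hypothetical helper name; assumes magnitude-based pruning of contiguous weight groups, which is one common way to realize such a pattern — GQSA additionally combines it with quantization and a custom inference engine):

```python
import numpy as np

def group_prune(w: np.ndarray, group_size: int, sparsity: float) -> np.ndarray:
    """Zero out whole contiguous groups of weights by L2 magnitude.

    Unlike fixed 2:4 sparsity (exactly 2 zeros in every 4 weights), the
    sparsity rate here is a free parameter, so accuracy and speed can be
    traded off.  (Illustrative sketch only.)
    """
    flat = w.reshape(-1, group_size)                 # each row is one weight group
    norms = np.linalg.norm(flat, axis=1)
    n_prune = int(sparsity * len(norms))
    prune_idx = np.argsort(norms)[:n_prune]          # drop the weakest groups
    flat[prune_idx] = 0.0                            # zeros whole groups in place
    return flat.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
for rate in (0.25, 0.5, 0.75):                       # adjustable, unlike 2:4
    pruned = group_prune(w.copy(), group_size=16, sparsity=rate)
    print(f"sparsity {rate:.2f}: zero fraction {np.mean(pruned == 0):.2f}")
```

Because zeros come in contiguous groups, the nonzero groups can be stored densely and skipped wholesale at inference time, which is what makes the pattern hardware-friendly.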
- arXiv 2023, SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity, Songwei Liu*, Haitao Xu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, et al.
📚 Model Compression

Residual Local Feature Network for Efficient Super-Resolution
Fangyuan Kong*, Mingxi Li*, Songwei Liu*, Ding Liu, Jingwen He, Yang Bai, Fangmin Chen, Lean Fu
Project
- 🚀 RLFN achieves state-of-the-art (SOTA) performance in lightweight super-resolution through innovative architectural design, advanced training strategies, and efficient model compression techniques!
- Academic Impact: It has emerged as a foundational baseline in the Efficient Super-Resolution (ESR) domain, driving advancements across the field.
- Industry Impact: It serves as the official baseline model for the NTIRE 2023 competition and is deployed across multiple product lines at ByteDance.

Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models
Chenqian Yan*, Songwei Liu*, Hongjian Liu*, Xurui Peng, Xiaojian Wang, Fangmin Chen, Lean Fu, Xing Mei.
Project
- 🚀 Hybrid-SD has launched the industry’s most performant lightweight models: VAE, SD1.5, and SDXL. Now available for deployment!
- arXiv 2024, FoldGPT: Simple and Effective Large Language Model Compression Scheme, Songwei Liu*, Chao Zeng*, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen†.
- ACM MM 2023, Unfolding Once Is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution, Y. Liu, H. Dong, B. Liang, Songwei Liu, Q. Dong, K. Chen, F. Chen, L. Fu, F. Wang.
- arXiv 2021, MixSearch: Searching for Domain Generalized Medical Image Segmentation Architectures, L. Liu, Z. Wen, Songwei Liu, H.-Y. Zhou, H. Zhu, W. Xie, L. Shen, K. Ma, Y. Zheng.
- ICCA 2019, Binary Convolutional Neural Network with High Accuracy and Compression Rate, Songwei Liu, Hongwei Zhu.