📝 Publications

📚 System Algorithm Co-design

AAAI 2025

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
Chao Zeng*, Songwei Liu*, Yusheng Xie*, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei†

Project

  • 🚀 ABQ-LLM breaks quantization limits: Run LLMs at ANY bit-width you want, with REAL speedup!
  • Academic Impact: It introduces hardware-aware dynamic quantization, enabling latency-optimal bit allocation across transformer layers without retraining.
  • Industry Impact: It achieves a 1.6x inference speedup and a 2.7x memory compression ratio over the industry's state-of-the-art SmoothQuant framework, with kernel performance that significantly surpasses CUTLASS-accelerated operators.
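The bit-allocation idea behind arbitrary-bit quantization can be illustrated with a minimal per-group symmetric quantizer, where each layer (or group) may receive a different bit-width. This is a generic sketch with placeholder names, not ABQ-LLM's actual kernels:

```python
import numpy as np

def quantize_group(w, bits):
    """Symmetric uniform quantization of a weight group to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for signed 4-bit
    amax = np.abs(w).max()
    scale = amax / qmax if amax > 0 else 1.0   # one scale per group
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Different layers can be assigned different bit-widths (2, 4, 8, ...);
# reconstruction error shrinks as the bit budget grows.
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
for bits in (2, 4, 8):
    q, scale = quantize_group(w, bits)
    err = np.abs(w - dequantize(q, scale)).mean()
```

A real arbitrary-bit kernel packs these integer codes into bit-planes so any width maps onto fixed-width hardware instructions; the snippet only shows the numerical quantize/dequantize round trip.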
ACL XX

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng*, Songwei Liu*, Shu Yang, Fangmin Chen †, Xing Mei, Lean Fu

  • 🚀 GQSA explores a group sparsity pattern beyond the conventional 2:4 sparsity, achieving a better trade-off between accuracy and speed through a combination of algorithm-level optimizations and a customized software engine.

  • GQSA offers several advantages over the 2:4 sparsity technique: a flexible, adjustable sparsity rate, a higher weight compression rate, and broader compatibility with various quantization methods.
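The contrast with 2:4 sparsity can be sketched as follows: instead of zeroing exactly two of every four weights, whole contiguous groups are pruned by magnitude at any target ratio. This is a minimal magnitude-based illustration with hypothetical names, not GQSA's actual scoring or engine:

```python
import numpy as np

def group_prune(w, group_size, sparsity):
    """Zero out whole contiguous weight groups with the smallest L2 norms.

    Unlike fixed 2:4 sparsity, `sparsity` can be any ratio, and surviving
    groups stay contiguous, which suits group-wise quantization kernels.
    """
    assert w.size % group_size == 0
    groups = w.reshape(-1, group_size)
    norms = np.linalg.norm(groups, axis=1)
    k = int(round(len(norms) * sparsity))   # number of groups to drop
    mask = np.ones(len(norms), dtype=bool)
    mask[np.argsort(norms)[:k]] = False     # drop the weakest groups
    return (groups * mask[:, None]).reshape(w.shape), mask

# Example: prune half of the 16-element groups of a small weight matrix.
w = np.random.default_rng(1).standard_normal((8, 32)).astype(np.float32)
pruned, mask = group_prune(w, group_size=16, sparsity=0.5)
```

Because pruning happens at group granularity, only one mask bit per group needs storing, which is where the higher compression rate over element-wise 2:4 masks comes from.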

📚 Model Compression

CVPRW 2022

Residual Local Feature Network for Efficient Super-Resolution
Fangyuan Kong*, Mingxi Li*, Songwei Liu*, Ding Liu, Jingwen He, Yang Bai, Fangmin Chen, Lean Fu

Project

  • 🚀 RLFN achieves state-of-the-art (SOTA) performance in lightweight super-resolution through innovative architectural design, advanced training strategies, and efficient model compression techniques!
  • Academic Impact: It has emerged as a foundational baseline in the Efficient Super-Resolution (ESR) domain, driving advancements across the field.
  • Industry Impact: It serves as the official baseline model for the NTIRE 2023 challenge and is deployed across multiple ByteDance product lines.
Arxiv 2024

Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models
Chenqian Yan*, Songwei Liu*, Hongjian Liu*, Xurui Peng, Xiaojian Wang, Fangmin Chen, Lean Fu, Xing Mei

Project | Hugging Face

  • 🚀 Hybrid-SD has launched the industry’s most performant lightweight models: VAE, SD1.5, and SDXL. Now available for deployment!