As a High-Performance Computing (HPC) expert, I am part of the ByteNN team at ByteDance
in China, focusing on full-stack optimization of deep learning algorithms, including model optimization and GPU-based inference acceleration.
I received my bachelor's degree from the School of Optical and Electronic Information, Huazhong University of Science & Technology (华中科技大学光电学院), and my master's degree from the ISEE, Zhejiang University (浙江大学信电学院). My research interests include AutoML, High-Performance Computing (HPC), and system–algorithm co-design. I have published 7+ papers at top international AI conferences such as AAAI, ACM MM, and CVPR.
I now lead model optimization efforts on the ByteNN team, driving collaborative optimization between the inference engine and model architecture to reduce cloud inference costs for LLMs and diffusion models, while advancing the deployment of AIGC models on edge devices. My current research interests include:
- Lightweight and efficient backbone models, including compact architectures such as small-scale VAEs, SDXL, and LLMs.
- Quantization-based inference acceleration tailored to diverse hardware platforms and computational tasks.
- Sparsity-driven inference acceleration through systematic model compression and sparse computation optimization.
- Cache reuse, distributed parallelism, and communication optimization.
If you are interested in any form of academic collaboration, please feel free to email me at liusongwei.zju@bytedance.com. We are hiring interns!