🎖 Projects and Skills
- Programming Languages: Python, C++, CUDA C/PTX
- Technical Expertise: low-level kernel optimization
  - Deep optimization of core operators using MMA/WMMA instructions and PTX assembly-level tuning
  - Quantized GEMM implementations for both compute-bound and memory-bound scenarios
  - Proficient with CUDA acceleration libraries and tools: CUTLASS, TensorRT, FasterTransformer, and Triton
- Key Project Experience:
  - Responsible for inference optimization of the Doubao multimodal model
  - Architecture design and CUDA backend implementation of the ByteNN-LLM inference engine
  - Core developer of the Lighten inference engine