🎖 Projects and Skills

  • Programming Languages: Python, C++, CUDA C/PTX
  • Technical Expertise: Low-level GPU kernel optimization
    • Deep optimization of core operators using MMA/WMMA instructions and PTX assembly-level tuning
    • Quantized GEMM implementations for both compute-bound and memory-bound scenarios
    • Proficient with CUDA acceleration libraries: CUTLASS, TensorRT, FasterTransformer, and Triton
  • Key Project Experience:
    • Responsible for inference optimization of the Doubao multi-modal model;
    • Architecture design and CUDA backend implementation for the ByteNN-LLM inference engine;
    • Core developer of the Lighten inference engine.