Zhiyuan Li
AI Infra Engineer at Moonshot AI.
I focus on the efficient implementation and optimization of Linear Attention. I am honored to have contributed to the development of Kimi Linear and Kimi Delta Attention (KDA), and I have learned a lot from the excellent colleagues on the team.
Research Interests
- Linear Attention: Exploring sub-quadratic sequence modeling methods for more efficient long-sequence processing
- Efficient Inference Optimization: CUDA kernel optimization, memory bandwidth optimization, Tensor Core acceleration
- Model Architectures: RWKV-6/7, Gated DeltaNet, and other novel attention mechanisms
Open Source Contributions
- Contributed to the flash-linear-attention community project - Efficient implementations of state-of-the-art linear attention models
Articles
- Learning KDA from Scratch - Part 1 - Understanding KDA parallelization from an Infra perspective (Chinese)
About This Site
This site collects my learning notes, technical articles, and some early-stage thoughts on AI Infra. I’m still learning, so please feel free to point out any mistakes. I look forward to exchanging ideas with you.
Contact:
- GitHub: @zhiyuan1i
- Zhihu: @lizhiyuan
- Email: lizhiyuan@moonshot.cn