<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Zhiyuan Li</title>
    <link>https://zhiyuan1i.github.io/en/</link>
    <description>Recent content on Zhiyuan Li</description>
    <image>
      <title>Zhiyuan Li</title>
      <url>https://zhiyuan1i.github.io/images/papermod-cover.png</url>
      <link>https://zhiyuan1i.github.io/images/papermod-cover.png</link>
    </image>
    <generator>Hugo -- 0.155.3</generator>
    <language>en-US</language>
    <lastBuildDate>Sat, 21 Feb 2026 10:44:23 +0000</lastBuildDate>
    <atom:link href="https://zhiyuan1i.github.io/en/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The Mathematics of DPLR (Diagonal Plus Low Rank): Parallel Computing with Explicit Transition Matrices</title>
      <link>https://zhiyuan1i.github.io/en/posts/dplr-mathematics/</link>
      <pubDate>Sat, 21 Feb 2026 10:44:23 +0000</pubDate>
      <guid>https://zhiyuan1i.github.io/en/posts/dplr-mathematics/</guid>
      <description>A deep dive into the chunk-wise parallel algorithm for DPLR, understanding the WY representation of explicit diagonal-plus-low-rank transition matrices, and exploring the unified framework with KDA/IPLR</description>
    </item>
    <item>
      <title>KDA (Kimi Delta Attention): From Matrix Multiplication to Affine Transformation</title>
      <link>https://zhiyuan1i.github.io/en/posts/kda-mathematics/</link>
      <pubDate>Tue, 17 Feb 2026 03:00:00 +0000</pubDate>
      <guid>https://zhiyuan1i.github.io/en/posts/kda-mathematics/</guid>
      <description>A deep dive into the chunk-wise parallel algorithm of KDA, establishing the theoretical framework of Affine transformations from basic matrix multiplication lemmas</description>
    </item>
    <item>
      <title>About Me</title>
      <link>https://zhiyuan1i.github.io/en/about/</link>
      <pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://zhiyuan1i.github.io/en/about/</guid>
      <description>&lt;h2 id=&#34;zhiyuan-li&#34;&gt;Zhiyuan Li&lt;/h2&gt;
&lt;p&gt;AI Infra Engineer at &lt;a href=&#34;https://www.moonshot.cn/&#34;&gt;Moonshot AI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Focusing on the efficient implementation and optimization of &lt;strong&gt;Linear Attention&lt;/strong&gt;. Honored to have contributed to the development of &lt;a href=&#34;https://github.com/MoonshotAI/Kimi-Linear&#34;&gt;Kimi Linear&lt;/a&gt; and &lt;strong&gt;Kimi Delta Attention (KDA)&lt;/strong&gt;, and to have learned a great deal from my excellent colleagues on the team.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id=&#34;-research-interests&#34;&gt;🔬 Research Interests&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Linear Attention&lt;/strong&gt;: Exploring sub-quadratic sequence modeling methods for more efficient long-sequence processing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Efficient Inference Optimization&lt;/strong&gt;: CUDA kernel optimization, memory bandwidth optimization, Tensor Core acceleration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Architectures&lt;/strong&gt;: RWKV-6/7, Gated DeltaNet, and other novel attention mechanisms&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;-open-source-contributions&#34;&gt;🚀 Open Source Contributions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Contributor to the &lt;a href=&#34;https://github.com/fla-org/flash-linear-attention&#34;&gt;&lt;strong&gt;flash-linear-attention&lt;/strong&gt;&lt;/a&gt; community project, which provides efficient implementations of state-of-the-art linear attention models&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;-articles&#34;&gt;📝 Articles&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://zhuanlan.zhihu.com/p/1989809041849988324&#34;&gt;Learning KDA from Scratch - Part 1&lt;/a&gt; - Understanding KDA parallelization from an Infra perspective (Chinese)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3 id=&#34;-about-this-site&#34;&gt;💬 About This Site&lt;/h3&gt;
&lt;p&gt;This site documents my learning notes, technical articles, and some early-stage thoughts on the AI Infra field. I&amp;rsquo;m still learning, so please feel free to point out any mistakes. I look forward to exchanging ideas with you.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Tech Stack</title>
      <link>https://zhiyuan1i.github.io/en/posts/tech-stack/</link>
      <pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://zhiyuan1i.github.io/en/posts/tech-stack/</guid>
      <description>Introduction to the tech stack used to build this site</description>
    </item>
  </channel>
</rss>
