Section 01
kernel-set: A High-Performance CUDA Kernel Library with a Unified C ABI for LLM Inference and Training (Main Thread)
Core Overview
kernel-set is a high-performance CUDA kernel library for LLM inference and training, featuring:
- Unified C ABI: Exposes 78 core LLM operators behind a single C interface, hiding the differences between the underlying kernel implementations.
- Multi-language Support: Provides bindings for Python, Rust, Go, and TypeScript, so the same kernels can be called from any of these languages.
- Automatic Kernel Selection: A dispatcher picks the best available kernel for each call based on GPU architecture, data type, and operator type.
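The dispatch idea in the last bullet can be illustrated with a small lookup table. The sketch below is not kernel-set's actual dispatcher or API. The names `register` and `select`, the operator and kernel names, and the rule "prefer the variant with the highest required compute capability that the device supports" are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical kernel handle: a callable that returns the kernel's name.
KernelFn = Callable[[], str]

# Hypothetical registry: (operator, dtype) -> list of (min_sm, kernel).
_REGISTRY: Dict[Tuple[str, str], List[Tuple[int, KernelFn]]] = {}


def register(op: str, dtype: str, min_sm: int, fn: KernelFn) -> None:
    """Register a kernel variant that needs compute capability >= min_sm."""
    _REGISTRY.setdefault((op, dtype), []).append((min_sm, fn))


def select(op: str, dtype: str, sm: int) -> KernelFn:
    """Return the variant with the highest min_sm that the device satisfies."""
    candidates = [(m, f) for m, f in _REGISTRY.get((op, dtype), []) if m <= sm]
    if not candidates:
        raise LookupError(f"no kernel for op={op} dtype={dtype} sm={sm}")
    return max(candidates, key=lambda c: c[0])[1]


# Illustrative registrations; these kernel names are made up.
register("rmsnorm", "fp16", 70, lambda: "rmsnorm_fp16_generic")
register("rmsnorm", "fp16", 90, lambda: "rmsnorm_fp16_sm90")
```

With these registrations, `select("rmsnorm", "fp16", 80)` falls back to the generic variant, while a device reporting compute capability 90 gets the specialized one. A real dispatcher would likely also weigh problem shape or benchmark results, which this sketch leaves out.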
Source Info
- Author/Maintainer: cklxx
- Platform: GitHub
- Original Link: https://github.com/cklxx/kernel-set
- Update Time: 2026-06-05