CUDA Inference Engine: High-Performance GPT-2 Inference with C/C++ & CUDA
This project implements a GPT-2 inference engine in pure C/C++ and CUDA, inspired by Andrej Karpathy's llm.c. It targets the performance bottlenecks of LLM inference in production through low-level optimization. The result is useful both educationally, as a clear Transformer implementation free of framework abstractions, and practically, offering high throughput, low latency, and minimal dependencies for edge deployment.