LeanLLM: A Concise and Correct Layer-wise Reference Implementation for Gemma4
LeanLLM is a lightweight inference engine for Google's Gemma4 model. Its core feature is a layer-wise loading strategy that keeps memory usage very low by materializing only a small portion of the model's weights at a time, working around the compatibility issues that mainstream inference engines (such as vLLM and llama.cpp) encounter when adapting to Gemma4's new architecture. It aims to be a 'concise and correct' reference implementation that balances educational value with practicality.
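The layer-wise idea can be sketched as follows. This is a minimal illustration, not LeanLLM's actual code: the function names (`load_layer`, `layerwise_forward`), the layer shapes, and the use of a synthesized random matrix in place of real on-disk weights are all assumptions made to keep the example self-contained. The point it demonstrates is that only one layer's weights need to be resident in memory at any moment: load a layer, apply it to the activations, release it, then move to the next.

```python
import numpy as np

def load_layer(layer_idx: int) -> np.ndarray:
    # Hypothetical loader: a real engine would read this layer's weights
    # from a checkpoint shard on disk. Here we synthesize a small, seeded
    # random projection so the sketch runs on its own.
    rng = np.random.default_rng(layer_idx)
    return (rng.standard_normal((8, 8)) * 0.1).astype(np.float32)

def layerwise_forward(x: np.ndarray, num_layers: int) -> np.ndarray:
    # Peak weight memory is one layer, not the whole model:
    # load -> apply -> discard, for each layer in turn.
    for i in range(num_layers):
        w = load_layer(i)       # bring layer i into memory
        x = np.tanh(x @ w)      # stand-in for the layer's real computation
        del w                   # release it before loading the next layer
    return x

hidden = np.ones(8, dtype=np.float32)
out = layerwise_forward(hidden, num_layers=4)
print(out.shape)  # (8,)
```

The trade-off is that weights are re-read from disk on every forward pass, so this strategy exchanges throughput for a much smaller memory footprint.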