Section 01
Ternative: A New Lightweight Inference Engine Option for Ternary-Weight LLMs (Introduction)
Ternative is an inference engine built specifically for ternary-weight large language models (LLMs), whose weights take only the values -1, 0, and +1. It supports loading LoRA adapters at runtime and aims for efficient inference with very low resource consumption, which has earned it the nickname "the llama.cpp for BitNet models". It fills a gap in the ternary-weight model ecosystem, which has so far lacked mature inference engines, and offers a new option for resource-constrained scenarios such as edge computing.
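To show why ternary weights are cheap to run and what "runtime LoRA loading" means, here is a minimal sketch in plain Python. It is not Ternative's code or API. The function names (`ternary_matvec`, `lora_delta`, `forward`), the single per-tensor `scale`, and the list-of-lists layout are hypothetical choices made for illustration. Because every weight is -1, 0, or +1, the matrix-vector product needs only additions and subtractions. A LoRA adapter stays as two separate low-rank matrices A and B, so it can be applied or swapped at inference time without rewriting the ternary base weights.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
# Hypothetical adapter layout: (A, B, alpha, rank).
Adapter = Tuple[Matrix, Matrix, float, int]


def ternary_matvec(weights: Sequence[Sequence[int]], x: Sequence[float],
                   scale: float) -> List[float]:
    """Compute scale * (W @ x) for a W whose entries are all -1, 0, or +1.

    No weight multiplications are needed: +1 adds the activation,
    -1 subtracts it, and 0 skips it. The single per-tensor scale is
    an assumption of this sketch.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            elif w != 0:
                raise ValueError(f"non-ternary weight: {w}")
        out.append(scale * acc)
    return out


def lora_delta(a: Matrix, b: Matrix, x: Sequence[float],
               alpha: float, rank: int) -> List[float]:
    """Standard LoRA update (alpha / rank) * B @ (A @ x), kept separate from W."""
    ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in a]    # A: rank x d_in
    bax = [sum(bij * v for bij, v in zip(row, ax)) for row in b]    # B: d_out x rank
    return [(alpha / rank) * v for v in bax]


def forward(weights: Sequence[Sequence[int]], scale: float, x: Sequence[float],
            adapter: Optional[Adapter] = None) -> List[float]:
    """Ternary base projection, plus an optional LoRA adapter chosen at call time."""
    y = ternary_matvec(weights, x, scale)
    if adapter is not None:
        a, b, alpha, rank = adapter
        y = [yi + di for yi, di in zip(y, lora_delta(a, b, x, alpha, rank))]
    return y


if __name__ == "__main__":
    W = [[1, 0, -1], [0, 1, 1]]
    x = [2.0, 3.0, 5.0]
    print(forward(W, 0.5, x))  # base model only
    adapter = ([[1.0, 0.0, 0.0]], [[1.0], [0.0]], 2.0, 1)
    print(forward(W, 0.5, x, adapter))  # same W, adapter attached at runtime
```

With W = [[1, 0, -1], [0, 1, 1]], x = [2, 3, 5], and scale 0.5, the base output is [-1.5, 4.0]; attaching the example rank-1 adapter gives [2.5, 4.0] while W is left untouched. Production engines typically pack ternary weights into a few bits each and vectorize these loops, so this sketch shows only the arithmetic, not performance.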