Section 01
LLM-Para Framework Overview: A Performance Analysis Tool for LLM Inference on Heterogeneous Multi-Level Memory
LLM-Para is a multi-metric, first-order Roofline analysis framework for the performance analysis of large language model (LLM) inference on heterogeneous multi-level memory architectures. It supports modern LLM architectures such as GQA, MoE, and MLA; covers 24 hardware platforms; and provides multi-objective design space exploration, letting users trade off performance, energy consumption, total cost of ownership (TCO), and carbon footprint.
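To make the first-order Roofline idea concrete, the sketch below models attainable throughput as the minimum of peak compute and bandwidth-limited performance for a GEMM. The function names and the hardware numbers are illustrative assumptions, not LLM-Para's actual API:

```python
def arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for an m x n x k GEMM (FP16), assuming each
    operand matrix is moved to/from memory exactly once."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

def attainable_flops(ai, peak_flops, mem_bw):
    """First-order Roofline: throughput is capped by whichever is
    lower, peak compute or bandwidth times arithmetic intensity."""
    return min(peak_flops, ai * mem_bw)

# Illustrative FP16 numbers for a generic accelerator (assumed values).
PEAK = 312e12  # FLOP/s
BW = 2.0e12    # B/s

# Decode-phase GEMV (batch 1): low intensity, so memory-bound.
ai_decode = arithmetic_intensity(1, 4096, 4096)
# Prefill GEMM (long sequence): high intensity, so compute-bound.
ai_prefill = arithmetic_intensity(4096, 4096, 4096)
```

A first-order model like this is what lets the framework flag, per operator and per memory level, whether a workload sits under the bandwidth roof or the compute roof.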