Section 01
llm-benchmark: A Guide to the Personal LLM Evaluation Framework
llm-benchmark is an open-source, personal LLM evaluation suite that compares Ollama local models with API models such as Anthropic Claude and OpenAI GPT. It covers test tasks across multiple dimensions, including programming, reasoning, knowledge Q&A, output format compliance, and response speed. The project emphasizes personalization (custom datasets, test scenarios, and hardware environments) so that users can decide which model actually fits their own workloads, and it is designed as an extensible performance evaluation tool.
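To make the local-versus-API comparison concrete, the snippet below is a minimal, illustrative sketch rather than part of llm-benchmark itself: it sends a single prompt to a locally running Ollama model (via Ollama's HTTP API on port 11434) and to an OpenAI-compatible API model, and times both responses. The model names (llama3, gpt-4o-mini), the example prompt, and the OPENAI_API_KEY environment variable are assumptions chosen for illustration; the framework automates this kind of side-by-side measurement across its task dimensions.

```python
# Illustrative sketch only: time one prompt against a local Ollama model and an
# OpenAI API model. Model names, prompt, and env var are assumptions, not
# llm-benchmark's own configuration.
import os
import time
import requests

PROMPT = "Write a Python function that reverses a linked list."

def ask_ollama(model: str, prompt: str) -> tuple[str, float]:
    """Query a locally running Ollama server and return (answer, seconds)."""
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"], time.perf_counter() - start

def ask_openai(model: str, prompt: str) -> tuple[str, float]:
    """Query the OpenAI chat completions API and return (answer, seconds)."""
    start = time.perf_counter()
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"], time.perf_counter() - start

if __name__ == "__main__":
    for name, fn, model in [("ollama", ask_ollama, "llama3"),
                            ("openai", ask_openai, "gpt-4o-mini")]:
        answer, seconds = fn(model, PROMPT)
        print(f"{name}/{model}: {seconds:.1f}s, {len(answer)} chars")
```

Running a sketch like this once per task category (programming, reasoning, Q&A, format compliance) is essentially what the suite systematizes, with your own datasets and hardware standing in for the hard-coded prompt above.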