Section 01
AI Evaluation App: A Data-Driven Framework for LLM Selection
This project presents a production-level LLM evaluation framework that uses NBA datasets to support data-driven model selection. It replaces subjective "vibes-based testing" with RAG pipelines, LLM-as-a-Judge scoring, and multi-dimensional KPIs, and applies them to compare local models (Qwen and Gemma, served via Ollama) with cloud models (Gemini, via the Google API) in sports analysis scenarios.
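To make the idea of multi-dimensional KPIs concrete, here is a minimal sketch of how per-answer judge scores might be folded into one ranking per model. It is not the project's actual implementation: the `JudgedAnswer` fields, the `WEIGHTS` values, the `MAX_LATENCY_S` cutoff, and the function names are all assumptions made for illustration, and the judge scores are assumed to arrive already normalized to the range 0 to 1.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JudgedAnswer:
    """One model response after an LLM judge has scored it (hypothetical schema)."""
    model: str        # model label, e.g. "qwen", "gemma", "gemini"
    accuracy: float   # judge score in [0, 1] (assumed dimension)
    relevance: float  # judge score in [0, 1] (assumed dimension)
    latency_s: float  # wall-clock seconds taken to produce the response


# Assumed weights and latency cutoff; tune these to the use case.
WEIGHTS = {"accuracy": 0.5, "relevance": 0.3, "speed": 0.2}
MAX_LATENCY_S = 10.0


def speed_score(latency_s: float) -> float:
    """Map latency to [0, 1]: instant is 1.0, MAX_LATENCY_S or slower is 0.0."""
    return max(0.0, 1.0 - latency_s / MAX_LATENCY_S)


def composite(answer: JudgedAnswer) -> float:
    """Weighted sum of the three KPI dimensions for a single answer."""
    return (
        WEIGHTS["accuracy"] * answer.accuracy
        + WEIGHTS["relevance"] * answer.relevance
        + WEIGHTS["speed"] * speed_score(answer.latency_s)
    )


def rank_models(answers: list[JudgedAnswer]) -> list[tuple[str, float]]:
    """Average the composite score per model and sort best first."""
    by_model: dict[str, list[float]] = {}
    for answer in answers:
        by_model.setdefault(answer.model, []).append(composite(answer))
    averaged = {model: mean(scores) for model, scores in by_model.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_models` on a list of judged answers returns `(model, score)` pairs ordered from best to worst. Keeping the weights in one dictionary makes the trade-off between answer quality and speed explicit and easy to revisit, which is the point of moving away from impression-based comparison.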