Section 01
[Introduction] LLM Optimization Strategies Under Compute Budget Constraints: A Study of the Trade-off Between Fine-tuning and Inference-Time Scaling
This project studies optimization strategies for small language models under a fixed compute budget. The central question is whether that budget is better spent on one-time fine-tuning or on inference-time scaling (e.g., self-consistency decoding, where multiple sampled reasoning chains vote on a final answer). Through experiments on the GSM8K mathematical reasoning benchmark that combine LoRA fine-tuning, synthetic data generation, and ensembled inference strategies, the project maps the cost-accuracy Pareto frontier and aims to provide quantitative decision support for deploying models in cost-sensitive scenarios.
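To make the two levers concrete, the sketch below illustrates, under stated assumptions, (a) self-consistency decoding as majority voting over k sampled chains of thought and (b) extracting the cost-accuracy Pareto frontier from measured configurations. This is a minimal illustration, not the project's actual code: `generate` is assumed to be any callable that returns one sampled completion, the `####` answer delimiter follows the GSM8K convention, and the example numbers are placeholders rather than experimental results.

```python
from collections import Counter


def extract_answer(completion: str) -> str:
    """Parse the final answer after the GSM8K-style '####' delimiter."""
    return completion.split("####")[-1].strip()


def self_consistency_answer(generate, question: str, k: int = 8) -> str:
    """Sample k chain-of-thought completions and return the majority answer.

    `generate(question)` is a hypothetical stand-in for one sampled
    model completion; inference cost grows linearly with k.
    """
    answers = [extract_answer(generate(question)) for _ in range(k)]
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return most_common_answer


def pareto_frontier(points):
    """Given (cost, accuracy) pairs, keep only non-dominated configurations:
    those for which no other point is both cheaper and more accurate."""
    frontier, best_acc = [], float("-inf")
    # Sort by cost ascending; among equal costs, higher accuracy first,
    # so ties are resolved in favor of the more accurate configuration.
    for cost, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            frontier.append((cost, acc))
            best_acc = acc
    return frontier


# Example: each configuration is (total compute cost, GSM8K accuracy);
# the numbers are illustrative placeholders, not measured results.
configs = [(1.0, 0.31), (2.0, 0.35), (2.0, 0.42), (4.0, 0.41), (8.0, 0.48)]
print(pareto_frontier(configs))  # -> [(1.0, 0.31), (2.0, 0.42), (8.0, 0.48)]
```

In this framing, a fine-tuned model and a base model with k-sample self-consistency are just two configurations plotted on the same cost-accuracy plane, and the frontier identifies which allocation of the budget dominates at each cost level.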