Section 01
[Introduction] Pathology Benchmarks Underestimate General-Purpose LLMs: Input Configuration Optimization Overturns Conventional Assumptions
The core argument of this article: the apparent underperformance of general-purpose LLMs on pathology tasks stems not from insufficient model capability but from suboptimal input configuration. Optimizing design choices such as tile size and magnification (for example, large tiles at low magnification, processed jointly) raised GPT-5's accuracy on cancer classification from 15.1% to 39.5%, which calls into question the assumed necessity of specialized models.