Zing Forum


AgriChain: An Expert-Verified Reasoning Dataset for Interpretable Agricultural Vision-Language Models

This article introduces the AgriChain dataset, which contains approximately 11,000 expert-curated plant leaf images. Each image is accompanied by a disease label, confidence score, and expert-verified chain-of-thought reasoning. The AgriChain-VL3B model fine-tuned on this dataset outperforms strong baselines like Gemini and GPT-4o in plant disease diagnosis.

Tags: AgriChain · Agricultural Vision-Language Models · Plant Disease Diagnosis · Chain-of-Thought Reasoning · Interpretable AI · Expert Verification · Sustainable Agriculture · Domain Specialization
Published 2026-04-09 13:13 · Recent activity 2026-04-10 10:23 · Estimated read 5 min

Section 01

AgriChain Dataset: An Expert-Verified Reasoning Resource for Interpretable Agricultural Vision-Language Models

This article introduces the AgriChain dataset, which includes approximately 11,000 expert-curated plant leaf images. Each image is equipped with a disease label, confidence score, and expert-verified chain-of-thought reasoning. The AgriChain-VL3B model fine-tuned on this dataset outperforms strong baselines such as Gemini and GPT-4o in plant disease diagnosis, providing critical support for interpretable agricultural AI.


Section 02

Core Challenges Facing Agricultural AI: Accuracy and Interpretability

Globally, 20%-40% of crop yields are lost to pests and diseases each year, and professional pathologists are scarce. General-purpose Vision-Language Models (VLMs) have two major shortcomings in agricultural applications: (1) they lack specialized training for agricultural scenarios, making subtle disease features difficult to identify; and (2) their black-box predictions lack interpretability, making it hard to earn farmers' trust.


Section 03

AgriChain Dataset Construction and Model Fine-Tuning Methods

The AgriChain dataset contains 11,000 leaf images, each annotated with a disease label, a calibrated confidence score, and expert-verified chain-of-thought reasoning. Annotations are produced through human-machine collaboration: GPT-4o generates reasoning drafts, which agricultural engineers then review and revise to ensure professional consistency. The AgriChain-VL3B model, fine-tuned from Qwen2.5-VL-3B, performs disease classification and reasoning generation simultaneously through multi-task learning, improving both accuracy and interpretability.
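The annotation workflow described above can be sketched as a simple record type plus a verification step. Note that the field names and review function below are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass

# Hypothetical record schema for the human-machine annotation pipeline:
# GPT-4o produces a reasoning draft, and an agricultural engineer revises
# it before the record is marked verified. All names here are assumptions.
@dataclass
class AgriChainRecord:
    image_path: str
    disease_label: str
    confidence: float          # calibrated confidence score in [0, 1]
    reasoning_draft: str       # chain-of-thought draft generated by GPT-4o
    reasoning_final: str = ""  # expert-revised chain of thought
    expert_verified: bool = False

def expert_review(record: AgriChainRecord, revised: str) -> AgriChainRecord:
    """Simulate the expert verification step: the engineer supplies a
    revised chain of thought and the record is marked verified."""
    record.reasoning_final = revised
    record.expert_verified = True
    return record

rec = AgriChainRecord(
    image_path="leaves/tomato_0001.jpg",
    disease_label="early_blight",
    confidence=0.92,
    reasoning_draft="Concentric brown rings on lower leaves suggest early blight.",
)
rec = expert_review(
    rec,
    "Concentric necrotic rings with chlorotic halos on older leaves "
    "are characteristic of early blight.",
)
print(rec.expert_verified)  # True
```

Only records that pass this review step would enter the final dataset, which is how the pipeline scales expert knowledge without requiring experts to write every annotation from scratch.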


Section 04

Experimental Results: AgriChain-VL3B Outperforms General-Purpose Large Models and Has High Interpretability

On the test set, AgriChain-VL3B achieved a Top-1 accuracy of 73.1%, a macro-averaged F1 score of 0.466, and a weighted F1 score of 0.655, significantly outperforming general-purpose models such as Gemini 1.5 Flash, Gemini 2.5 Pro, and GPT-4o Mini. The reasoning explanations it generates align closely with expert reasoning and consistently cite key visual cues, offering both credibility and educational value.
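The gap between the macro F1 (0.466) and the weighted F1 (0.655) is typical of an imbalanced label set: macro averaging treats every class equally, while weighted averaging scales each class's F1 by its support. A minimal sketch of both metrics on toy data (not AgriChain's actual predictions) makes the difference concrete:

```python
from collections import Counter

def per_class_f1(y_true, y_pred, cls):
    """F1 score for a single class, computed from TP/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def macro_and_weighted_f1(y_true, y_pred):
    classes = sorted(set(y_true))
    support = Counter(y_true)
    f1s = {c: per_class_f1(y_true, y_pred, c) for c in classes}
    macro = sum(f1s.values()) / len(classes)                          # equal class weight
    weighted = sum(f1s[c] * support[c] for c in classes) / len(y_true)  # support-weighted
    return macro, weighted

# Toy imbalanced example: the common class is predicted well, the rare
# class poorly, so the weighted F1 exceeds the macro F1.
y_true = ["healthy"] * 8 + ["rust"] * 2
y_pred = ["healthy"] * 8 + ["healthy", "rust"]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))  # 0.804 0.886
```

Under this reading, AgriChain-VL3B's lower macro F1 suggests that rare disease classes remain harder, even as overall weighted performance is strong.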


Section 05

Technical Contributions and Significance for Sustainable Agriculture

Technical contributions: (1) one of the first large-scale agricultural VL datasets with expert chain-of-thought annotations; (2) a human-machine collaborative annotation process that enables scalable domain knowledge acquisition; (3) a demonstration that specialized fine-tuning can outperform general-purpose large models. Significance for sustainable agriculture: reducing pesticide overuse, promoting the dissemination of agricultural knowledge, lowering barriers to AI adoption, and advancing technology inclusivity.


Section 06

Current Limitations and Future Research Directions

Limitations: The dataset focuses mainly on leaves, with limited coverage of other crop parts, and it targets specific regional climates, so cross-regional generalization remains to be verified. Future directions: expand the dataset to more crop parts and crop types; integrate multi-modal data (images + sensors + meteorology); and develop mobile applications that put the tool directly in farmers' hands.