# PestLife: A Lifecycle-Aware Evaluation Benchmark for Multimodal Large Models in Rice Pest Management

> A team from South China Agricultural University has released the PestLife benchmark, which for the first time integrates pest growth stage recognition into the evaluation of multimodal large models. Using a three-level hierarchical framework, the team systematically evaluated 39 state-of-the-art (SOTA) models and found that stage recognition is a significant bottleneck in current agricultural AI.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-14T08:54:18.000Z
- Last activity: 2026-05-14T08:58:59.101Z
- Popularity: 141.9
- Keywords: multimodal large models, agricultural AI, rice pests, lifecycle awareness, benchmark evaluation, computer vision, smart agriculture, pest recognition
- Page link: https://www.zingnex.cn/en/forum/thread/pestlife
- Canonical: https://www.zingnex.cn/forum/thread/pestlife
- Markdown source: floors_fallback

---

## PestLife Benchmark Guide: The First Multimodal Evaluation Focused on Lifecycle Awareness of Rice Pests

PestLife is the first benchmark to make pest growth stage recognition part of how multimodal large models are evaluated. Its three-level hierarchical framework was used to assess 39 SOTA models, and the results point to stage recognition as a significant bottleneck in current agricultural AI. The benchmark aims to close the gap between model evaluation and the needs of real agricultural applications, in support of precise pest management in smart agriculture.

## Research Background: Pain Points in Rice Pest Control and Limitations of Existing Evaluations

Rice cultivation worldwide faces severe pest threats. Precise control requires identifying both the pest species and its growth stage, because the damage caused and the appropriate control strategy vary greatly from stage to stage. Today this identification relies largely on expert experience, which is inefficient and hard to scale to the grassroots level. Multimodal large models offer a path toward agricultural intelligence, but existing evaluations reduce pest recognition to a single classification task and ignore lifecycle awareness. The result is a disconnect between how models perform in the lab and how they perform in the field.

## Core Innovations of the PestLife Benchmark and Dataset Construction

PestLife's core innovations are three capability dimensions, Species Recognition (S), Stage Recognition (T), and Knowledge Application (K), together with a three-level hierarchical evaluation framework: Level 1 tests a single capability, Level 2 tests a combination of two capabilities, and Level 3 tests end-to-end integration.

The dataset was built rigorously. The team collected 1,195 images of 35 rice pest species from multiple channels, then applied clustering-based redundancy removal and expert verification. From these images they generated 12,305 question-answer pairs using a hierarchical design, multi-stage filtering, and expert validation. A crowdsourcing mechanism was also set up so the dataset can keep growing.
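The relationship between the three capability dimensions and the three levels can be sketched in a few lines of Python. This is only an illustration: the function name `build_levels` is hypothetical, and the sketch assumes that Level 2 covers every pair of capabilities and that Level 3 combines all three, which the summary above does not spell out.

```python
from itertools import combinations

# Capability codes and names as given in the text.
CAPABILITIES = {
    "S": "Species Recognition",
    "T": "Stage Recognition",
    "K": "Knowledge Application",
}


def build_levels(codes=("S", "T", "K")):
    """Map each level number to the capability combinations it covers.

    Assumption (not confirmed by the source): level n tests every
    n-sized combination of the capability codes.
    """
    return {
        level: ["".join(combo) for combo in combinations(codes, level)]
        for level in range(1, len(codes) + 1)
    }


if __name__ == "__main__":
    for level, combos in build_levels().items():
        print(f"Level {level}: {combos}")
```

Under these assumptions, Level 1 yields the single codes `S`, `T`, `K`, Level 2 yields the pairs `ST`, `SK`, `TK`, and Level 3 yields `STK`.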

## Experimental Findings: Stage Recognition is a Bottleneck, and General Models Lack Adaptability

Zero-shot evaluation of 39 SOTA models (32 multimodal and 7 text-only) produced three main findings:

1. Stage recognition (T) is a significant bottleneck: even models with excellent species recognition show low accuracy when judging growth stage.
2. General-purpose vision-language models do not necessarily lead on fine-grained agricultural tasks.
3. Error accumulation in end-to-end reasoning is a prominent problem.

Control experiments further confirmed that withholding stage information significantly reduces the accuracy of pest control recommendations.
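To make the bottleneck and error-accumulation findings concrete, here is a minimal sketch of per-capability accuracy scoring plus a simple chained-accuracy estimate. The function names, the record format, and the numbers in the usage note are all hypothetical and are not results from PestLife. The independence assumption is a simplification introduced here, not something the source states.

```python
from collections import defaultdict
from math import prod


def accuracy_by_task(records):
    """Compute accuracy per capability code.

    records: iterable of (task_code, is_correct) pairs, for example
    ("T", False) for a wrongly answered stage-recognition question.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for task, is_correct in records:
        totals[task] += 1
        hits[task] += int(is_correct)
    return {task: hits[task] / totals[task] for task in totals}


def chained_accuracy_if_independent(step_accuracies):
    """Estimate end-to-end accuracy when every step must succeed.

    Assumption: errors at each step are independent, so the chance that
    all steps are right is the product of the per-step accuracies.
    """
    return prod(step_accuracies)
```

With made-up per-step accuracies of 0.9 for species, 0.6 for stage, and 0.8 for knowledge application, `chained_accuracy_if_independent([0.9, 0.6, 0.8])` comes to roughly 0.43. In this toy example the weakest step, stage recognition, pulls the end-to-end figure well below any single-step accuracy, which is the kind of compounding the third finding describes.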

## Technical Value and Application Prospects of PestLife

By incorporating lifecycle awareness for the first time, PestLife fills an evaluation gap in the agricultural vertical domain. Its results can help assess the capability boundaries of agricultural AI products, and its three-level framework can be transferred to disease and pest scenarios for other crops. The team has made the dataset and code public and encourages the community to expand and improve them.

## Conclusion: Paradigm Shift in Agricultural AI Evaluation

PestLife marks a shift in agricultural AI evaluation from "species recognition" to "lifecycle awareness". Current multimodal models still have room to improve on fine-grained agricultural visual tasks. Benchmarks that stay close to real scenarios can serve as a bridge between academia and industry and support the digital transformation of agriculture.
