Zing Forum

Reading

AgriSense: An Intelligent Agricultural Disease Detection System Integrating CNN, RAG, and Agentic Workflow

This article introduces how the AgriSense project combines three generative AI paradigms—Convolutional Neural Networks (CNN), Retrieval-Augmented Generation (RAG), and Agentic Workflow—to build an intelligent system for agricultural disease diagnosis and treatment recommendation.

农业AI病害检测RAGAgentic WorkflowResNet50大语言模型智能农业计算机视觉
Published 2026-04-10 13:11Recent activity 2026-04-10 13:23Estimated read 16 min
AgriSense: An Intelligent Agricultural Disease Detection System Integrating CNN, RAG, and Agentic Workflow
1

Section 01

Introduction to AgriSense: An Intelligent Agricultural Disease Detection System Integrating Multiple AI Paradigms

AgriSense is an intelligent agricultural disease detection system that integrates three generative AI paradigms: Convolutional Neural Networks (CNN), Retrieval-Augmented Generation (RAG), and Agentic Workflow. It aims to address the problems of low efficiency and limited coverage in traditional manual diagnosis, providing an end-to-end intelligent solution from disease identification to treatment recommendations.

2

Section 02

Project Background and Research Motivation

Project Background and Research Motivation

Early identification and precise prevention of agricultural diseases are key links in ensuring food security. Traditional manual diagnosis methods rely on expert experience, which have problems such as low identification efficiency and limited coverage. With the development of deep learning and large language model technologies, it has become possible to build an intelligent agricultural disease diagnosis system by combining computer vision and natural language processing technologies.

The AgriSense project was born in this context; it is a course research project that aims to explore how to organically integrate three cutting-edge generative AI paradigms—CNN-based visual recognition, RAG, and Agentic Workflow—to provide an end-to-end intelligent solution for crop disease identification and treatment recommendations.

3

Section 03

System Architecture Design: Three-Layer Modular Collaboration

System Architecture Design

AgriSense adopts a modular three-layer architecture design, where each layer is responsible for different functions and collaborates through clear interfaces:

Visual Recognition Layer: ResNet50-based CNN Model

The system uses a ResNet50 model fine-tuned on the PlantVillage dataset for plant disease classification. PlantVillage is a public dataset containing tens of thousands of plant leaf images, covering common disease types of various crops. After training on a large amount of labeled data, the model can accurately identify lesion features on leaves and output disease category predictions.

As a classic deep residual network, ResNet50 performs stably in image classification tasks; its residual connection design effectively solves the gradient vanishing problem of deep networks, allowing the model to maintain high accuracy even in fine-grained classification tasks like agricultural images.

Knowledge Retrieval Layer: TF-IDF-Driven RAG Architecture

Pure visual recognition can only tell users "what disease this is", but farmers need to know "how to treat it" more. To this end, the system introduces the Retrieval-Augmented Generation (RAG) architecture, combining the generative ability of large language models with domain knowledge bases.

The knowledge base uses Markdown and plain text formats to store professional content related to agricultural disease management, including:

  • Detailed disease descriptions (symptoms, causal factors, susceptible crops)
  • Treatment plans (chemical and organic control methods)
  • Pesticide use guidelines (dosage, application timing, precautions)
  • Crop cultivation management recommendations

The system uses the TF-IDF algorithm to vectorize and index knowledge base documents. When a user queries, it retrieves the top-k most relevant text fragments by calculating the similarity between the query and document blocks. These fragments are injected as context information into the subsequent large language model generation process, ensuring that the output treatment recommendations are evidence-based and effectively reducing the risk of model hallucinations.

Intelligent Decision Layer: Plan-Draft-Reflect Three-Stage Workflow

This is the most innovative design of AgriSense. Instead of being satisfied with a simple single retrieval-generation process, the system introduces the Agentic Workflow mode, gradually optimizing output quality through three stages: Plan→Draft→Reflect:

Plan Stage: The agent first analyzes the user's query and decomposes it into a structured diagnosis strategy. For example, for a question like "Why are my tomato leaves turning yellow and curling?", the system will plan a diagnosis path: "Identify symptom features → Match possible causes → Recommend verification methods → Provide preliminary suggestions".

Draft Stage: Based on the strategy determined in the Plan stage and the relevant knowledge fragments retrieved by RAG, a preliminary consultation response is generated. The response will automatically cite relevant paragraphs from the knowledge base to enhance credibility.

Reflect Stage: The agent self-reviews the output from the Draft stage, checking for factual errors, logical loopholes, or missing key information. If problems are found, a revision mechanism is triggered to regenerate a more accurate response. Although this self-reflection process adds about 1-2 seconds of delay, it can significantly improve the factual accuracy of the answers.

4

Section 04

Technical Implementation Details: Flexible Models and User-Friendly Interface

Technical Implementation Details

Multi-LLM Backend Support

The system design supports flexible model switching:

  • OpenAI API Mode: Uses GPT-4o or GPT-4o-mini, suitable for online environments, providing the strongest generation quality
  • MockLLM Mode: Offline demonstration mode, can run without an API key, suitable for course presentations and network-free environments

This design allows the system to not only exert the strongest performance in production environments but also work normally in teaching and resource-constrained scenarios.

Streaming Interaction Interface

The system builds a concise Web interface based on the Streamlit framework, supporting:

  • Image upload and real-time disease identification
  • Natural language conversational consultation
  • Streaming display of the generation process
  • Transparent display of citation sources

The interface design fully considers the usage habits of farmer users, striving to be simple and intuitive, and lowering the technical threshold.

5

Section 05

Experimental Design and Evaluation Methods: Verifying the Contribution of Each Component

Experimental Design and Evaluation Methods

The project recommends conducting comparative experiments from three dimensions to verify the contribution of each component:

Experiment 1: Baseline LLM (Without Retrieval)

Directly use large language models to answer agricultural questions without injecting any external knowledge. The main observation indicators are hallucination rate and factual correctness.

Experiment 2: RAG-Enhanced (Single Generation)

Introduce TF-IDF retrieval, inject relevant knowledge fragments as context into prompts. Observe the improvement in answer relevance and faithfulness.

Experiment 3: Complete Agentic RAG (Three-Stage Workflow)

Enable the complete Plan-Draft-Reflect process. In addition to the aforementioned indicators, it is also necessary to measure the delay of each request and the improvement in overall factual accuracy.

Recommended evaluation indicators to record include:

  • Answer quality score (manual 1-5 points or automated faithfulness score)
  • Citation correctness (whether the answer accurately cites the retrieved knowledge fragments)
  • Single request delay (seconds)
  • Hallucination rate (percentage of answers containing unvalidated claims)
6

Section 06

Knowledge Base Construction Best Practices: Key to Improving System Performance

Knowledge Base Construction Best Practices

The project documentation specially emphasizes the key impact of knowledge base quality on system performance:

Content Coverage: The knowledge base should cover as comprehensively as possible the common disease types, symptom descriptions, pathogenic mechanisms, and prevention methods of target crops. The more professional and comprehensive the content, the lower the system's hallucination rate.

Retrieval Parameter Tuning: It is recommended to set top_k between 3 and 5. Too small a value may lead to incomplete information coverage, while too large a value will introduce noise and exceed the model's token limit.

Structured Storage: Use Markdown format to store knowledge, and use title levels to help the system better understand the document structure and improve retrieval accuracy.

7

Section 07

Application Scenarios and Social Value: Empowering Agricultural Practitioners and Education

Application Scenarios and Social Value

The design goal of AgriSense is to provide agricultural practitioners with a reliable intelligent consultation assistant:

For small farmers: Lower the threshold for accessing professional agricultural knowledge, enabling them to obtain timely and accurate disease diagnosis and treatment recommendations even without on-site expert guidance.

For agricultural technology extension personnel: As an auxiliary tool, it helps quickly identify diseases, query prevention and control plans, and improve service efficiency.

For agricultural education: As a teaching case, it demonstrates how to integrate multiple AI technologies to solve practical agricultural problems and cultivate students' cross-technology integration capabilities.

8

Section 08

Technical Insights and Outlook: Cross-Paradigm Integration for Vertical Domain Applications

Technical Insights and Outlook

The AgriSense project demonstrates an important trend in the application of AI technology in vertical domains: a single technology is often difficult to solve complex practical problems, and multiple paradigms need to be organically combined. Computer vision is responsible for "seeing", RAG for "knowing", and Agentic Workflow for "thinking"; only through the collaboration of the three can truly useful intelligent services be provided.

This architectural design idea is not only applicable to agricultural disease detection but can also be extended to other fields that require multi-modal input and knowledge-intensive reasoning, such as medical diagnosis, industrial quality inspection, and legal consultation. In the future, with the further development of multi-modal large models and tool usage capabilities, the intelligence level of such systems will continue to improve.