Reading

AgriSense: An Intelligent Agricultural Disease Detection System Integrating CNN, RAG, and Agentic Workflow

This article introduces how the AgriSense project combines three generative AI paradigms—Convolutional Neural Networks (CNN), Retrieval-Augmented Generation (RAG), and Agentic Workflow—to build an intelligent system for agricultural disease diagnosis and treatment recommendation.

农业AI病害检测RAGAgentic WorkflowResNet50大语言模型智能农业计算机视觉

Published 2026-04-10 13:11Recent activity 2026-04-10 13:23Estimated read 16 min

AgriSense: An Intelligent Agricultural Disease Detection System Integrating CNN, RAG, and Agentic Workflow

Section 01

Introduction to AgriSense: An Intelligent Agricultural Disease Detection System Integrating Multiple AI Paradigms

AgriSense is an intelligent agricultural disease detection system that integrates three generative AI paradigms: Convolutional Neural Networks (CNN), Retrieval-Augmented Generation (RAG), and Agentic Workflow. It aims to address the problems of low efficiency and limited coverage in traditional manual diagnosis, providing an end-to-end intelligent solution from disease identification to treatment recommendations.

Section 02

Project Background and Research Motivation

Early identification and precise prevention of agricultural diseases are key links in ensuring food security. Traditional manual diagnosis methods rely on expert experience, which have problems such as low identification efficiency and limited coverage. With the development of deep learning and large language model technologies, it has become possible to build an intelligent agricultural disease diagnosis system by combining computer vision and natural language processing technologies.

The AgriSense project was born in this context; it is a course research project that aims to explore how to organically integrate three cutting-edge generative AI paradigms—CNN-based visual recognition, RAG, and Agentic Workflow—to provide an end-to-end intelligent solution for crop disease identification and treatment recommendations.

Section 03

System Architecture Design: Three-Layer Modular Collaboration

System Architecture Design

AgriSense adopts a modular three-layer architecture design, where each layer is responsible for different functions and collaborates through clear interfaces:

Visual Recognition Layer: ResNet50-based CNN Model

The system uses a ResNet50 model fine-tuned on the PlantVillage dataset for plant disease classification. PlantVillage is a public dataset containing tens of thousands of plant leaf images, covering common disease types of various crops. After training on a large amount of labeled data, the model can accurately identify lesion features on leaves and output disease category predictions.

As a classic deep residual network, ResNet50 performs stably in image classification tasks; its residual connection design effectively solves the gradient vanishing problem of deep networks, allowing the model to maintain high accuracy even in fine-grained classification tasks like agricultural images.

Knowledge Retrieval Layer: TF-IDF-Driven RAG Architecture

Pure visual recognition can only tell users "what disease this is", but farmers need to know "how to treat it" more. To this end, the system introduces the Retrieval-Augmented Generation (RAG) architecture, combining the generative ability of large language models with domain knowledge bases.

The knowledge base uses Markdown and plain text formats to store professional content related to agricultural disease management, including:

Detailed disease descriptions (symptoms, causal factors, susceptible crops)
Treatment plans (chemical and organic control methods)
Pesticide use guidelines (dosage, application timing, precautions)
Crop cultivation management recommendations

The system uses the TF-IDF algorithm to vectorize and index knowledge base documents. When a user queries, it retrieves the top-k most relevant text fragments by calculating the similarity between the query and document blocks. These fragments are injected as context information into the subsequent large language model generation process, ensuring that the output treatment recommendations are evidence-based and effectively reducing the risk of model hallucinations.

Intelligent Decision Layer: Plan-Draft-Reflect Three-Stage Workflow

This is the most innovative design of AgriSense. Instead of being satisfied with a simple single retrieval-generation process, the system introduces the Agentic Workflow mode, gradually optimizing output quality through three stages: Plan→Draft→Reflect:

Plan Stage: The agent first analyzes the user's query and decomposes it into a structured diagnosis strategy. For example, for a question like "Why are my tomato leaves turning yellow and curling?", the system will plan a diagnosis path: "Identify symptom features → Match possible causes → Recommend verification methods → Provide preliminary suggestions".

Draft Stage: Based on the strategy determined in the Plan stage and the relevant knowledge fragments retrieved by RAG, a preliminary consultation response is generated. The response will automatically cite relevant paragraphs from the knowledge base to enhance credibility.

Reflect Stage: The agent self-reviews the output from the Draft stage, checking for factual errors, logical loopholes, or missing key information. If problems are found, a revision mechanism is triggered to regenerate a more accurate response. Although this self-reflection process adds about 1-2 seconds of delay, it can significantly improve the factual accuracy of the answers.

Section 04

Technical Implementation Details: Flexible Models and User-Friendly Interface

Technical Implementation Details

Multi-LLM Backend Support

The system design supports flexible model switching:

OpenAI API Mode: Uses GPT-4o or GPT-4o-mini, suitable for online environments, providing the strongest generation quality
MockLLM Mode: Offline demonstration mode, can run without an API key, suitable for course presentations and network-free environments

This design allows the system to not only exert the strongest performance in production environments but also work normally in teaching and resource-constrained scenarios.

Streaming Interaction Interface

The system builds a concise Web interface based on the Streamlit framework, supporting:

Image upload and real-time disease identification
Natural language conversational consultation
Streaming display of the generation process
Transparent display of citation sources

The interface design fully considers the usage habits of farmer users, striving to be simple and intuitive, and lowering the technical threshold.

Section 05

Experimental Design and Evaluation Methods: Verifying the Contribution of Each Component

Experimental Design and Evaluation Methods

The project recommends conducting comparative experiments from three dimensions to verify the contribution of each component:

Experiment 1: Baseline LLM (Without Retrieval)

Directly use large language models to answer agricultural questions without injecting any external knowledge. The main observation indicators are hallucination rate and factual correctness.

Experiment 2: RAG-Enhanced (Single Generation)

Introduce TF-IDF retrieval, inject relevant knowledge fragments as context into prompts. Observe the improvement in answer relevance and faithfulness.

Experiment 3: Complete Agentic RAG (Three-Stage Workflow)

Enable the complete Plan-Draft-Reflect process. In addition to the aforementioned indicators, it is also necessary to measure the delay of each request and the improvement in overall factual accuracy.

Recommended evaluation indicators to record include:

Answer quality score (manual 1-5 points or automated faithfulness score)
Citation correctness (whether the answer accurately cites the retrieved knowledge fragments)
Single request delay (seconds)
Hallucination rate (percentage of answers containing unvalidated claims)

Section 06

Knowledge Base Construction Best Practices: Key to Improving System Performance

Knowledge Base Construction Best Practices

The project documentation specially emphasizes the key impact of knowledge base quality on system performance:

Content Coverage: The knowledge base should cover as comprehensively as possible the common disease types, symptom descriptions, pathogenic mechanisms, and prevention methods of target crops. The more professional and comprehensive the content, the lower the system's hallucination rate.

Retrieval Parameter Tuning: It is recommended to set top_k between 3 and 5. Too small a value may lead to incomplete information coverage, while too large a value will introduce noise and exceed the model's token limit.

Structured Storage: Use Markdown format to store knowledge, and use title levels to help the system better understand the document structure and improve retrieval accuracy.

Section 07

Application Scenarios and Social Value: Empowering Agricultural Practitioners and Education

Application Scenarios and Social Value

The design goal of AgriSense is to provide agricultural practitioners with a reliable intelligent consultation assistant:

For small farmers: Lower the threshold for accessing professional agricultural knowledge, enabling them to obtain timely and accurate disease diagnosis and treatment recommendations even without on-site expert guidance.

For agricultural technology extension personnel: As an auxiliary tool, it helps quickly identify diseases, query prevention and control plans, and improve service efficiency.

For agricultural education: As a teaching case, it demonstrates how to integrate multiple AI technologies to solve practical agricultural problems and cultivate students' cross-technology integration capabilities.

Section 08

Technical Insights and Outlook: Cross-Paradigm Integration for Vertical Domain Applications

Technical Insights and Outlook

The AgriSense project demonstrates an important trend in the application of AI technology in vertical domains: a single technology is often difficult to solve complex practical problems, and multiple paradigms need to be organically combined. Computer vision is responsible for "seeing", RAG for "knowing", and Agentic Workflow for "thinking"; only through the collaboration of the three can truly useful intelligent services be provided.

This architectural design idea is not only applicable to agricultural disease detection but can also be extended to other fields that require multi-modal input and knowledge-intensive reasoning, such as medical diagnosis, industrial quality inspection, and legal consultation. In the future, with the further development of multi-modal large models and tool usage capabilities, the intelligence level of such systems will continue to improve.

Continue Reading

Keep going with more reads from the same topic.

Nornir MCP Server: An Enterprise-Grade Bridge for Integrating Large Language Models into Network Automation

Nornir MCP Server is an enterprise-level server based on the Model Context Protocol (MCP). It seamlessly integrates large language models (such as Claude) with the Nornir network automation framework, supporting natural language orchestration for multi-vendor network devices (Cisco, Arista, Juniper, etc.), and providing production-grade features like a dual-engine architecture (NAPALM + Netmiko), intelligent filtering, and a secure sandbox.

Recent activity 2026-05-06 20:51

Bibliothèque Française LLM: A French Public Domain Literature Index System Optimized for Large Language Models

Bibliothèque Française LLM is a structured indexing and annotation project for French public domain literature designed specifically for large language models (LLMs). It integrates multiple authoritative sources such as DraCor, Common Corpus, and Wikisource, providing metadata indexing categorized by genre, author, and era, as well as in-depth annotations for dramatic texts (including characters, lines, stage directions, etc.). Its aim is to enable LLMs to efficiently read and understand classic French literary works.

Recent activity 2026-05-06 20:50

Splinter: A Lock-Free Zero-Copy Shared Memory KV and Vector Storage Library That Eliminates Socket and Memcpy Overhead for LLM Inference

Splinter is a minimalist, high-performance key-value (KV) and vector storage system enabling zero-latency inter-process communication via shared memory and atomic operations. With only 766 lines of core code, it supports millions of operations per second and 768-dimensional vector storage, offering a new architectural approach for local LLM inference and data-intensive applications.

Recent activity 2026-04-03 08:49

Folkering OS: When the Operating System Itself Is AI—A Self-Evolving Bare-Metal Rust System

Folkering OS is the world's first AI-native bare-metal operating system, entirely written in Rust no_std without relying on Linux, POSIX, or libc. It can generate commands from scratch, compile them into WASM, and run them in 10 seconds, achieving true self-evolution.

Recent activity 2026-04-09 16:15