Zing Forum


Thinking Agents: A Goal-Oriented Multi-Agent System Based on Graph Networks and Active Inference

An agent platform integrating RAG, graph neural networks, and active inference, enabling goal-oriented autonomous planning and experience reuse through decision graphs

Multi-Agent Systems · RAG · Graph Neural Networks · Active Inference · Goal-Oriented · LLM Judging · Decision Graph · Experience Reuse · React · Flask
Published 2026-03-30 13:46 · Recent activity 2026-03-30 13:56 · Estimated read 9 min

Section 01

Thinking Agents Project Overview: Core Innovations of a Goal-Oriented Multi-Agent System


Developed by Marcus Anderson, this project integrates Retrieval-Augmented Generation (RAG), graph neural networks, and active inference theory to build a goal-oriented multi-agent platform. The core innovation lies in the decision graph mechanism, which records agents' decision paths to form reusable knowledge assets. It addresses the limitations of traditional LLM agents—lack of goal orientation, planning ability, and experience accumulation—enabling autonomous planning and cross-task experience reuse.

Project link: https://github.com/maracman/thinking-agents


Section 02

Dilemmas of LLM Agents: Challenges from Dialogue to Action


Traditional LLM-driven agents excel at dialogue but have fundamental limitations:

  • Lack of true goal orientation, making it hard to complete specific tasks;
  • Lack of systematic planning ability, easily getting lost in complex problems;
  • Unable to accumulate experience from failures, repeating mistakes.

The core issue is the absence of effective mechanisms to manage goals, evaluate progress, and adjust strategies. In contrast, humans naturally decompose tasks, set subgoals, and backtrack to adjust—this is a key challenge for AI to achieve autonomy.


Section 03

Innovative Architecture: Hierarchical Intelligence and Decision Loop


The system adopts a layered intelligence design with core mechanisms including:

  1. Goal-oriented agent loop: The cognitive core, following the cycle: check decision graph → generate subgoals → execute actions → LLM evaluates progress;
  2. LLM evaluation mechanism: A meta-cognitive component that scores actions (1-7 points) via a lightweight LLM, triggering Go/NoGo decisions to avoid self-assessment bias;
  3. Graph intelligence engine: The foundation for memory and learning, encoding Go/NoGo decisions as graph edges (with weights reflecting attempt counts) and enabling experience reuse through semantic embeddings and shortest-path algorithms.
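
The loop and the Go/NoGo gating described above can be sketched roughly as follows. Everything here is illustrative: `llm_score` is a deterministic stub standing in for the lightweight judge LLM, and `GO_THRESHOLD` is a hypothetical cutoff on the 1-7 scale (the project's actual threshold is not stated in this summary).

```python
GO_THRESHOLD = 4  # hypothetical cutoff on the 1-7 judge scale

def llm_score(action: str) -> int:
    """Stand-in for the lightweight judge LLM; returns a 1-7 score.

    Deterministic stub so the sketch is runnable without an API key.
    """
    return min(7, 1 + len(action) % 7)

def agent_loop(goal: str, subgoals: list[str]) -> list[tuple[str, str]]:
    """One pass of the loop: subgoal -> execute action -> evaluate -> Go/NoGo."""
    decisions = []
    for sub in subgoals:
        action = f"attempt: {sub}"        # execute action (stubbed)
        score = llm_score(action)         # meta-cognitive evaluation
        verdict = "Go" if score >= GO_THRESHOLD else "NoGo"
        decisions.append((sub, verdict))  # recorded as a decision-graph edge
    return decisions
```

In the real system each `(subgoal, verdict)` pair would be written back to the decision graph before the next iteration consults it.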

Section 04

Decision Graph: Experience Reuse and Optimal Path Search


The decision graph is the system's most distinctive design. Unlike traditional RAG that retrieves document fragments, it retrieves successful decision paths:

  • Semantic node embedding: Uses the all-MiniLM-L6-v2 model to embed node labels into a semantic space, searching for relevant historical nodes via cosine similarity;
  • Weighted path search: Uses NetworkX's shortest-path algorithm, combining historical attempt counts (persistence_count) with failure penalties (NoGo edge weight ×10) to find optimal paths;
  • Graph fusion and transfer: Supports cross-agent graph import and merging to form a shared knowledge base;
  • Similarity links: Automatically detects semantically similar nodes (cosine similarity >0.8) after merging and establishes low-cost connections.
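
The weighted path search can be sketched with NetworkX as below. Note the `edge_weight` formula is an assumption for illustration: the summary only says weights reflect attempt counts and that NoGo edges carry a ×10 penalty, so here the base cost is taken as the reciprocal of `persistence_count`.

```python
import networkx as nx

NOGO_PENALTY = 10  # failed (NoGo) edges cost 10x, per the weighting above

def edge_weight(persistence_count: int, verdict: str) -> float:
    """Cost falls with repeated successful attempts; NoGo edges are penalized.

    The reciprocal form is a hypothetical choice, not the project's exact formula.
    """
    base = 1.0 / max(1, persistence_count)
    return base * NOGO_PENALTY if verdict == "NoGo" else base

def best_path(edges, source, target):
    """edges: iterable of (u, v, persistence_count, verdict) tuples."""
    G = nx.DiGraph()
    for u, v, count, verdict in edges:
        G.add_edge(u, v, weight=edge_weight(count, verdict))
    # Dijkstra over the weighted graph finds the historically cheapest route.
    return nx.shortest_path(G, source, target, weight="weight")
```

With two candidate routes, one well-trodden and one that previously failed, the search favors the proven path: `best_path` on `("start","A",3,"Go")`, `("A","goal",2,"Go")`, `("start","B",1,"NoGo")`, `("B","goal",5,"Go")` returns the route through `A`.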

Section 05

Cognitive Science Foundation: Application of Active Inference Theory


The system's design is rooted in Karl Friston's active inference theory (agents minimize prediction errors through actions):

  • Predictive goal setting: Proactively predicts paths to achieve goals, driving subgoal generation and path selection;
  • Error-driven adaptation: When LLM scores are below expectations, it is treated as a prediction error, triggering strategy adjustments (NoGo decisions);
  • Free energy minimization: Prioritizes paths with high historical success rates (low-weight edges) to reduce cognitive effort.
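
The error-driven adaptation above reduces to a simple rule: a judge score falling short of the predicted score is a prediction error, and a large enough error flips the decision to NoGo. A minimal sketch, with `tolerance` as a hypothetical slack parameter not specified in the source:

```python
def prediction_error(expected: float, observed: float) -> float:
    """How far the observed judge score fell short of the prediction."""
    return expected - observed

def adapt(expected_score: float, judge_score: float,
          tolerance: float = 1.0) -> str:
    """Treat a below-expectation score as a prediction error triggering NoGo."""
    err = prediction_error(expected_score, judge_score)
    return "NoGo" if err > tolerance else "Go"
```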

Section 06

Technical Implementation Highlights: Full-Stack AI Application Architecture


The system's tech stack reflects modern full-stack AI application features:

  • Multi-provider LLM support: Abstracts interfaces for OpenAI/Anthropic/Cohere/HuggingFace/local GGUF models, supporting automatic fallback and retry;
  • Front-end and back-end separation: Front-end uses React 17 + Webpack 5, back-end uses Flask + Waitress to provide WSGI services;
  • Interactive visualization: Generates HTML graphs via PyVis, embedded in the front-end via iframe to intuitively display decision processes;
  • Persistence management: Session save/load/copy/delete, decision graphs stored in JSON format to ensure knowledge continuity.
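
The multi-provider fallback-and-retry pattern can be sketched as below. This is a generic sketch, not the project's actual abstraction: provider names and the retry count are illustrative, and each provider is modeled as a plain callable rather than a real SDK client.

```python
import time

class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""

def call_with_fallback(providers, prompt, retries=2, delay=0.0):
    """Try each provider in order; retry transient failures before falling back.

    providers: list of (name, callable) pairs; each callable takes a prompt
    and returns a completion string, or raises ProviderError on failure.
    """
    last_err = None
    for name, call in providers:
        for _ in range(retries):
            try:
                return name, call(prompt)
            except ProviderError as e:
                last_err = e
                time.sleep(delay)  # back off before retrying (0 for the demo)
    raise RuntimeError(f"all providers failed: {last_err}")
```

A real implementation would wrap the OpenAI/Anthropic/Cohere/HuggingFace/GGUF clients behind this single callable interface so callers never see which backend answered.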

Section 07

Application Scenarios and Value: From Complex Problem-Solving to Knowledge Inheritance


The system applies to various complex scenarios:

  • Complex problem solving: Avoids repeated mistakes via decision graphs and accumulates experience in solving specific problems;
  • Multi-agent collaboration: Multiple agents explore different paths, merging graphs to form a comprehensive knowledge base;
  • Knowledge inheritance: New agents import experience graphs to quickly gain domain capabilities;
  • Dialogue games and narrative AI: Encodes plot paths to enhance NPC behavior coherence.

Section 08

Limitations and Future Improvement Directions


The current system has the following limitations and improvement directions:

  1. Subjective evaluation mechanism: LLM-based scoring can be inconsistent; objective metrics or multi-judge consensus could be introduced;
  2. Limited graph semantic expression: Sentence embeddings cannot capture complex contexts; more advanced GNN architectures could be adopted;
  3. Slow learning speed: Learning depends on the number of real interactions; simulation environments could enable fast self-play learning.
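
The multi-judge consensus proposed for the first limitation could be as simple as aggregating several independent judge scores before the Go/NoGo decision; the median damps any single judge's bias. A minimal sketch, with `GO_THRESHOLD` again a hypothetical cutoff on the 1-7 scale:

```python
from statistics import median

GO_THRESHOLD = 4  # hypothetical cutoff on the 1-7 judge scale

def consensus_verdict(judge_scores: list[int]) -> str:
    """Aggregate several independent judges; the median resists outliers."""
    return "Go" if median(judge_scores) >= GO_THRESHOLD else "NoGo"
```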