Zing Forum

MCPP: A Constraint-Driven Online Resource Allocation Framework for Agentic Workflows

MCPP (Monte Carlo Portfolio Policy) is a resource allocation system for agentic workflows. It schedules resources under time and budget constraints by combining Monte Carlo portfolio strategies with Bayesian memory evolution guided by active inference.

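Tags: MCPP, agentic workflows, resource allocation, active inference, Bayesian memory, continual learning, Monte Carlo, constrained optimization, LLM, CodeFlow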
Published 2026-06-11 15:45 | Recent activity 2026-06-11 15:53 | Estimated read 7 min

Section 02

Original Author and Source

  • Original Author/Maintainer: Wang Xinglin (WangXinglin)
  • Source Platform: GitHub
  • Original Title: MCPP: On Time, Within Budget: Constraint-Driven Online Resource Allocation for Agentic Workflows
  • Original Link: https://github.com/WangXinglin/MCPP
  • Publication Date: June 11, 2026


Section 03

Research Background and Problem Definition

With the rapid development of Large Language Model (LLM) agents, managing and allocating computing resources efficiently has become a key challenge. Agentic workflows usually involve multi-step chained calls, parallel execution, and conditional branches, and each step may take a different amount of time and incur a different computing cost.

In practical deployment, agent systems often face two core constraints:

  1. Time Constraint (Deadline): Tasks must be completed within the specified time
  2. Budget Constraint: The total cost of task execution cannot exceed the preset upper limit

Traditional resource allocation methods usually adopt static strategies and cannot adjust to real-time execution status. MCPP proposes an online resource allocation method based on active inference and Bayesian memory evolution that aims to maximize the task success rate while satisfying both constraints.
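
The problem described above can be written compactly as a constrained optimization. This is only a sketch: the symbols (policy \pi, completion time T, total cost C, deadline D, budget B) are notation introduced here for illustration and are not taken from the MCPP repository. Because T and C are random in practice, the constraints could be enforced per run or in expectation; the source does not say which.

```latex
\max_{\pi} \; \Pr\left[\text{workflow succeeds} \mid \pi\right]
\quad \text{s.t.} \quad T(\pi) \le D, \qquad C(\pi) \le B
```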



Section 04

Active Inference Framework

Active inference is a theoretical framework from cognitive neuroscience that unifies perception and action under a single objective: minimizing free energy. In MCPP, it guides how agents make decisions in uncertain environments.

The core idea is that agents do not just passively perceive the environment; they also actively seek evidence to verify or revise their internal world models. This "active" property enables the system to:

  • Predict future states and take actions in advance
  • Prioritize high-value tasks when resources are limited
  • Learn from past execution results and update strategies
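
For readers unfamiliar with the term, the free energy minimized in active inference is usually the variational free energy from the general literature, shown below. The notation (hidden states s, observations o, approximate posterior q) is the standard one and is an assumption here; the post does not state which exact objective MCPP uses.

```latex
F[q] = \mathbb{E}_{q(s)}\left[\ln q(s) - \ln p(o, s)\right]
     = D_{\mathrm{KL}}\left(q(s) \,\|\, p(s \mid o)\right) - \ln p(o)
```

Since the KL term is non-negative, minimizing F both pulls q toward the true posterior and tightens a bound on the surprise -\ln p(o).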

Section 05

Bayesian Memory Evolution

MCPP introduces a Bayesian memory evolution mechanism to address forgetting in continual learning. Traditional neural networks are prone to "catastrophic forgetting": learning new tasks degrades performance on tasks learned earlier.

Bayesian memory evolution addresses this problem in three ways:

  • Probabilistic Representation: Represent memory as a probability distribution instead of deterministic weights
  • Bayesian Update: Use Bayes' rule to integrate new experiences while preserving the probability distribution that encodes old knowledge
  • Memory Evolution: Allow the memory structure to evolve over time to adapt to changing execution environments
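
The three ideas above can be illustrated with a very small conjugate model. The sketch below is not MCPP's memory implementation; it assumes a Beta-Bernoulli belief over one model's success rate, and the class name `BetaMemory` and the `decay` parameter are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class BetaMemory:
    """Belief about one model's success probability (illustrative only)."""
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, success: bool, decay: float = 1.0) -> None:
        # decay < 1 shrinks old evidence toward the prior, so the memory
        # can track a changing environment; decay = 1 is the plain Bayes update.
        self.alpha = 1.0 + decay * (self.alpha - 1.0)
        self.beta = 1.0 + decay * (self.beta - 1.0)
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


memory = BetaMemory()
for outcome in [True, True, False, True]:
    memory.update(outcome)
print(round(memory.mean, 3))
```

With `decay = 1.0` every observation counts equally, which keeps old knowledge intact; a smaller value trades some of that stability for faster adaptation.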


Section 06

Core Strategy

The core of MCPP is a portfolio strategy based on Monte Carlo sampling. Rather than selecting a model for each task individually, it constructs a portfolio of models and searches for the best-performing resource allocation scheme through random sampling and evaluation.

The process consists of five steps:

  1. Rollout Collection: Run each task node multiple times and collect statistics such as latency, success rate, and cost
  2. DAG Pool Construction: Convert the sampled results into a pool of Directed Acyclic Graphs (DAGs), where each DAG represents a possible execution plan
  3. Multi-Model Alignment: When multiple models are used, construct an aligned multi-model DAG pool for portfolio experiments
  4. Strategy Evaluation: Run the Monte Carlo portfolio strategy (mc_portfolio_rollout) alongside baseline strategies such as uniform, sequential, and random
  5. Result Merging: Merge the sharded outputs to generate the final experimental results
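
To make the sampling-and-evaluation idea concrete, here is a minimal sketch under strong simplifying assumptions. It is not the repository's mc_portfolio_rollout: the workflow is a plain sequential chain rather than a DAG pool, the per-model statistics in `MODELS` are made up, and the helper names `simulate`, `estimate`, and `best_plan` are hypothetical.

```python
import random
from itertools import product

# Hypothetical per-model statistics: (success rate, mean latency in s, cost in $)
MODELS = {
    "small": (0.60, 1.0, 0.01),
    "large": (0.90, 3.0, 0.10),
}


def simulate(plan, deadline, budget, rng):
    """Simulate one sequential run of `plan` (one model name per step)."""
    elapsed = spent = 0.0
    for name in plan:
        p_success, mean_latency, cost = MODELS[name]
        elapsed += rng.expovariate(1.0 / mean_latency)  # random step latency
        spent += cost
        # The run fails if a constraint is violated or the step itself fails.
        if elapsed > deadline or spent > budget or rng.random() > p_success:
            return False
    return True


def estimate(plan, deadline, budget, n=2000, seed=0):
    """Monte Carlo estimate of the plan's constrained success rate."""
    rng = random.Random(seed)
    return sum(simulate(plan, deadline, budget, rng) for _ in range(n)) / n


def best_plan(steps, deadline, budget):
    """Enumerate every model assignment and keep the best estimate."""
    plans = product(MODELS, repeat=steps)
    return max(plans, key=lambda p: estimate(p, deadline, budget))


print(best_plan(steps=3, deadline=100.0, budget=1.0))
print(best_plan(steps=3, deadline=100.0, budget=0.05))
```

Exhaustive enumeration only works for tiny examples like this one; the point is that a tight budget changes which assignment wins, which is the effect a portfolio strategy has to capture.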

Section 07

Constraint-Driven Resource Allocation

The key innovation of MCPP lies in explicitly integrating the time and budget constraints into the decision-making process:

  • Budget Awareness: Consider the remaining budget in every decision to avoid overspending
  • Deadline Awareness: Schedule time-sensitive tasks first so that the workflow completes on time
  • Online Adaptation: Dynamically adjust resource allocation based on actual execution progress
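
A simple way to picture these three properties is a greedy selector that is called again before every step with the budget and time actually left. This is an illustrative sketch, not MCPP's decision rule; the `CANDIDATES` table and the `choose_model` function are hypothetical.

```python
from typing import Optional

# Hypothetical candidates: name -> (estimated success rate, expected latency in s, cost in $)
CANDIDATES = {
    "small": (0.60, 1.0, 0.01),
    "medium": (0.80, 2.0, 0.04),
    "large": (0.90, 3.0, 0.10),
}


def choose_model(remaining_budget: float, remaining_time: float,
                 steps_left: int) -> Optional[str]:
    """Pick the most reliable model that still leaves room for later steps."""
    cheapest = min(cost for _, _, cost in CANDIDATES.values())
    fastest = min(latency for _, latency, _ in CANDIDATES.values())
    # Reserve enough budget and time to run the cheapest/fastest model
    # on every step that comes after this one.
    reserve_cost = cheapest * (steps_left - 1)
    reserve_time = fastest * (steps_left - 1)
    feasible = [
        name for name, (_, latency, cost) in CANDIDATES.items()
        if cost + reserve_cost <= remaining_budget
        and latency + reserve_time <= remaining_time
    ]
    if not feasible:
        return None  # no choice can satisfy the constraints
    return max(feasible, key=lambda name: CANDIDATES[name][0])


print(choose_model(remaining_budget=1.0, remaining_time=10.0, steps_left=1))
print(choose_model(remaining_budget=0.10, remaining_time=10.0, steps_left=3))
```

Calling the function with updated `remaining_budget` and `remaining_time` after each step is what makes the allocation online: a slow or expensive early step automatically pushes later choices toward cheaper, faster models.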


Section 08

Experimental Benchmarks and Datasets

The MCPP framework was validated on two benchmark datasets: