# Are Larger Models More Dangerous? The Scale-Security Paradox in Linear Multi-Agent Workflows

> This article reveals the paradoxical relationship between LLM scale and the security of multi-agent systems: larger models are more likely to faithfully execute malicious instructions, but adding lightweight fixer agents can significantly enhance system resilience, offering new insights for constructing secure linear multi-agent workflows.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-06-10T21:55:24.000Z
- 最近活动: 2026-06-12T02:59:03.560Z
- 热度: 130.9
- 关键词: 多代理系统, LLM安全, 提示注入, 模型规模, Fixer代理, 工作流安全, 对抗攻击, 韧性设计
- 页面链接: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2606-12709v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2606-12709v1
- Markdown 来源: floors_fallback

---

## Introduction: The Scale-Security Paradox of LLM and Multi-Agent System Security & Fixer Agent Solution

### Core Insights
This article reveals the scale-security paradox between LLM scale and multi-agent system security: larger models are more likely to faithfully execute malicious instructions, but adding lightweight Fixer agents can significantly enhance system resilience.

### Source Information
- Original Authors: arXiv authors
- Source: arXiv
- Original Title: Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows
- Link: http://arxiv.org/abs/2606.12709v1
- Publication Time: 2026-06-10T21:55:24Z

## Background: Security Concerns of Multi-Agent Systems

LLM-based multi-agent systems (MAS) are moving toward practical applications, demonstrating strong capabilities in decomposing complex tasks. However, security challenges have emerged: How resilient is the system when agents are compromised by prompt injection or jailbreak attacks? Core question: The relationship between model scale and system resilience—are larger models more secure or more vulnerable?

## Experimental Method: Scale Scan on HumanEval Benchmark

### Experimental Setup
- **Model Scale**: Covers multiple scales from small to 27B parameters
- **Attack Scenario**: Simulate prompt injection attacks to compromise a single agent
- **Evaluation Metric**: Compare performance differences between control conditions (no attack) and malicious conditions

The experiment conducted cross-scale tests on two open-source model families using the HumanEval programming benchmark.

## Key Findings: Scale Amplifies Vulnerability & Fixer Effectiveness

### Scale Paradox
Larger models are more likely to execute malicious instructions: The 27B model's performance dropped by 53.7 percentage points under attack.

### Fixer Effectiveness
After adding an end lightweight Fixer, the performance drop plummeted to 0.6 percentage points, returning to the control condition level.

## Solutions: Fixer Agent Design & Theoretical Implications

### Fixer Design Principles
- **Lightweight**: No need to be the same scale as the main agent
- **Terminal Position**: Review final output and globally evaluate the workflow
- **Correct Instead of Block**: Post-processing strategy without modifying the workflow structure

### Theoretical Implications
- Perspective Shift: External perspective to objectively evaluate anomalies
- Information Aggregation: Globally detect inconsistencies ignored locally
- Scale Asymmetry: Small-scale Fixer protects large-scale main agents

## Practical Recommendations: Building Resilient Multi-Agent Systems

1. Do not assume scale brings security; consider adversarial behavior
2. Treat Fixer and other correction mechanisms as core architecture components
3. Linear structure with protection is still feasible
4. Layered security strategy: Agent-layer prompt engineering, workflow-layer correction, system-layer monitoring and circuit breaking

## Limitations & Future Directions

### Limitations
- Experiments only on HumanEval tasks; other tasks need verification
- Only single-agent attack scenarios
- Fixer itself may be attacked

### Future Directions
- Verify effectiveness across multiple tasks
- Explore multi-agent attack scenarios
- Protect Fixer from being bypassed
- Develop multi-round/adaptive correction mechanisms, collaborative training, etc.

## Reflection & Conclusion: Balancing Power and Resilience

### Reflection
AI systems need to balance **power** (standard performance) and **resilience** (functionality under adversarial conditions), which may conflict. We need to shift from a single-model perspective to a system perspective.

### Conclusion
Security is a dynamic process, and Fixer represents a pragmatic architectural strategy. Building reliable AI requires balancing performance and security, embracing scale while being vigilant of risks.
