Zing Forum

WebChallenger: Achieving Efficient and Universal Web Agents Through Architectural Innovation

WebChallenger brings open-source models close to the performance of proprietary systems, at significantly lower cost, through the PageMem structured page representation and three cognitive mechanisms.

Tags: Web agents, autonomous navigation, PageMem, open-source models, automation, agent architecture, web page understanding
Published 2026-06-09 12:53 | Recent activity 2026-06-10 09:19 | Estimated read 4 min

Section 01

WebChallenger: Guide to Efficient and Universal Web Agents Driven by Architectural Innovation

By combining PageMem, a structured page representation, with three cognitive mechanisms, WebChallenger lets open-source models perform close to proprietary systems at significantly lower cost. The framework has been open-sourced, providing a reusable technical foundation for building universal Web agents.

Section 02

Practical Challenges for Web Agents and the Missing Cognitive Advantages

Autonomous web navigation is a core challenge for LLM agents. Current systems rely on proprietary models, which makes them expensive to run, and existing architectures lack three cognitive advantages that humans bring to the task:

  1. Selective attention: focusing on the task-relevant regions of a page
  2. Persistent memory: accumulating knowledge of a website's structure
  3. Procedural proficiency: performing common interaction patterns automatically

Section 03

WebChallenger Architecture Design: PageMem and Three Cognitive Mechanisms

PageMem Semantic Representation

PageMem is a structured page representation built from the DOM. Its features:

  • Deterministic generation
  • Semantic partitioning (navigation bar/content area, etc.)
  • Hierarchical summarization
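
The three properties above can be illustrated with a minimal sketch. The source does not describe how PageMem is implemented, so everything here is an assumption made for illustration: the class and function names (`Partition`, `PageMemSketch`, `build_page_mem`), the choice of HTML5 landmark tags as partition boundaries, and the summary format are all hypothetical. The sketch walks the DOM once in document order (deterministic), groups content by landmark (semantic partitioning), and exposes a one-line summary per partition above the full detail (a two-level hierarchy).

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical mapping from HTML5 landmark tags to partition labels.
LANDMARKS = {"nav": "navigation", "main": "content", "header": "header",
             "footer": "footer", "aside": "sidebar"}
INTERACTIVE = {"a", "button", "input", "select", "textarea"}


@dataclass
class Partition:
    label: str
    texts: list = field(default_factory=list)     # visible text snippets
    elements: list = field(default_factory=list)  # interactive tags, DOM order

    def summary(self, max_chars: int = 40) -> str:
        """Top level of the hierarchy: label, element count, leading text."""
        preview = " ".join(self.texts)[:max_chars]
        return f"[{self.label}] {len(self.elements)} interactive | {preview}"


class PageMemSketch(HTMLParser):
    """Walks the markup once, in document order, so output is deterministic."""

    def __init__(self):
        super().__init__()
        self.partitions = []
        self._stack = []  # landmark partitions that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            part = Partition(LANDMARKS[tag])
            self.partitions.append(part)
            self._stack.append((tag, part))
        elif tag in INTERACTIVE and self._stack:
            self._stack[-1][1].elements.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self._stack[-1][1].texts.append(text)


def build_page_mem(html: str) -> list:
    """Return the page's partitions in document order."""
    parser = PageMemSketch()
    parser.feed(html)
    parser.close()
    return parser.partitions


if __name__ == "__main__":
    page = ("<nav><a href='/'>Home</a><a href='/cart'>Cart</a></nav>"
            "<main><h1>Orders</h1><button>Refund</button></main>")
    for part in build_page_mem(page):
        print(part.summary())
```

Because the result depends only on the input markup, the same page always yields the same representation. Content outside any landmark is ignored in this sketch; a real system would need a fallback partition for it.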

Three Cognitive Mechanisms

  1. Divide-and-conquer observation: read the partition summaries first, then extract details from the relevant partitions
  2. Lightweight memory system: build a reusable site map in a single traversal
  3. Composite action flow: encapsulate a multi-step interaction in a single action
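
A toy sketch may make the three mechanisms more concrete. It is not WebChallenger's implementation, which the source does not detail. The page layout (a dict mapping a partition label to a summary and its detail lines), the names `observe`, `SiteMemory`, and `fill_and_submit`, the selectors, and the example URL are all hypothetical.

```python
from typing import Callable, Optional


def observe(page: dict, is_relevant: Callable[[str], bool]) -> list:
    """Divide-and-conquer observation: look at each partition's summary
    first and expand details only for partitions judged relevant."""
    details = []
    for _label, (summary, lines) in page.items():
        if is_relevant(summary):
            details.extend(lines)
    return details


class SiteMemory:
    """Lightweight memory: one pass over a page records where each
    summarized capability lives, so later tasks can jump straight there."""

    def __init__(self):
        self._map = {}  # partition summary -> URL where it was first seen

    def record(self, url: str, page: dict) -> None:
        for _label, (summary, _lines) in page.items():
            self._map.setdefault(summary, url)

    def lookup(self, keyword: str) -> Optional[str]:
        for summary, url in self._map.items():
            if keyword in summary:
                return url
        return None


def fill_and_submit(form: dict, values: dict) -> list:
    """Composite action: one high-level call expands into the primitive
    type/click steps that a step-by-step agent would issue one at a time."""
    steps = [("type", name, values[name])
             for name in form["fields"] if name in values]
    steps.append(("click", form["submit"]))
    return steps


if __name__ == "__main__":
    page = {
        "navigation": ("links: Home, Orders", ["a#home", "a#orders"]),
        "content": ("search form for products", ["input#q", "button#go"]),
    }
    print(observe(page, lambda s: "search" in s))

    memory = SiteMemory()
    memory.record("https://shop.example/", page)
    print(memory.lookup("search"))

    print(fill_and_submit({"fields": ["q"], "submit": "button#go"},
                          {"q": "lamp"}))
```

In this sketch the relevance check is a plain keyword predicate; in an agent that decision would presumably be made by the model itself. The point is only that details for irrelevant partitions never enter the context.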

Section 04

WebChallenger Performance Benchmark Results

Results with open-source models on four established benchmarks:

Benchmark          Score    Description
WebArena           56.3%    Real website tasks
VisualWebArena     48.7%    Visual enhancement tasks
Online-Mind2Web    51.0%    Multi-step tasks
WorkArena          70.9%    Office scenario tasks

Performance is close to that of proprietary systems at lower cost, and the agent generalizes across sites without site-specific adapters.

Section 05

WebChallenger Technical Insights and Value

Key principles:

  1. Architecture over scale: open-source models approach proprietary performance through architectural design rather than model size
  2. Cognitively inspired design: the mechanisms draw on human attention, memory, and procedural proficiency
  3. Reusable generalization: PageMem enables cross-site knowledge reuse, which reduces costs

Section 06

Practical Application Scenarios of WebChallenger

Application scenarios:

  • Automated testing: Verify website functions without scripts
  • Data collection: Automatically extract structured data from multiple websites
  • Office assistance: Complete cross-system repetitive Web operations
  • Accessibility: Automate interactions for visually impaired users

Section 07

WebChallenger Open-Source Contributions and Community Impact

The framework has been open-sourced on GitHub, which helps:

  • The research community explore universal Web agents
  • Industry build practical systems
  • Educators run agent teaching demonstrations