Zing Forum

RSI-DNAX: Experimental Exploration of Bounded Recursive Self-Improving Neural Networks

An experimental framework for studying bounded recursive self-improvement. Through validation-gated evolution of code-level operators, it reports substantial accuracy gains on a subset of ARC-AGI-1, suggesting a feasible path for AI self-improvement in a controlled environment.

Tags: recursive self-improvement, ARC-AGI, neural architecture search, meta-learning, AI safety, benchmark evaluation, code evolution, cognitive architecture, automated reasoning
Published 2026-05-19 06:43 | Recent activity 2026-05-19 06:49 | Estimated read 8 min

Section 01

RSI-DNAX: Guide to Bounded Exploration of Controlled Recursive Self-Improving Neural Networks

RSI-DNAX is an experimental framework for studying bounded recursive self-improvement. It evolves code-level operators behind a validation gate and reports accuracy gains on a subset of ARC-AGI-1, which suggests that self-improvement can be studied in a controlled environment. The project is positioned as a non-AGI research scaffold: it focuses on auditable, bounded improvement cycles so that researchers can observe and debug each step of the improvement process.

Section 02

Background and Project Positioning

Recursive self-improvement (RSI) could in theory lead to exponential growth in capabilities, but keeping it controllable is a practical challenge. RSI-DNAX is neither an AGI nor a proof of a singularity. Its core goal is to build bounded improvement cycles that can be inspected and understood: generate restricted operator programs, validate them on data outside the test set, reject or roll back failed attempts, freeze accepted states, and report the results. It is positioned as a research tool that runs on a CPU and prioritizes exploring the improvement mechanism itself over general intelligence.
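
The cycle described above (propose, validate, reject or accept, freeze, report) can be sketched in a few lines. This is a minimal illustration and not the project's code: `bounded_improvement`, `CycleReport`, and the tuple-based `Program` type are hypothetical names, and the toy `propose` and `validate` functions stand in for whatever RSI-DNAX actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Program = Tuple[int, ...]  # hypothetical stand-in for a restricted operator program


@dataclass
class CycleReport:
    accepted: List[Tuple[Program, float]] = field(default_factory=list)
    rejected: List[Tuple[Program, float]] = field(default_factory=list)


def bounded_improvement(
    initial: Program,
    propose: Callable[[Program, int], Program],
    validate: Callable[[Program], float],
    max_steps: int,
) -> Tuple[Program, CycleReport]:
    """Run at most max_steps propose/validate rounds and return the frozen state."""
    frozen = initial
    frozen_score = validate(frozen)
    report = CycleReport()
    for step in range(max_steps):  # hard bound on the number of cycles
        candidate = propose(frozen, step)
        score = validate(candidate)  # validation data only, never the test set
        if score > frozen_score:
            frozen, frozen_score = candidate, score  # accept and freeze
            report.accepted.append((candidate, score))
        else:
            report.rejected.append((candidate, score))  # roll back: frozen is unchanged
    return frozen, report


# Toy usage: programs are integer tuples, validation rewards closeness to a target.
target = (3, 1, 2)


def validate(p: Program) -> float:
    return -float(sum(abs(a - b) for a, b in zip(p, target)))


def propose(p: Program, step: int) -> Program:
    # Deterministic proposal: increment one position, chosen by the step index.
    return tuple(v + 1 if i == step % len(p) else v for i, v in enumerate(p))


best, report = bounded_improvement((0, 0, 0), propose, validate, max_steps=9)
```

Because a candidate only replaces the frozen state when it strictly improves the validation score, a failed attempt needs no explicit undo step: the frozen state was never modified.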

Section 03

Core Architecture and Method Design

Cognitive Core

The "brain" of the system, responsible for task reasoning, memory management, world model construction, and bounded improvement control, coordinating subsystems to ensure operation within constraints.

Adaptive Operator System

The execution layer for self-improvement, including operators and their genome representations, achieving iterative improvement through generating, validating, and selecting operators.

Candidate Generation and Sandbox

The generator performs deterministic mutation and recombination; the sandbox provides an isolated validation environment to prevent failures from affecting the main system, serving as a safety barrier.
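
As a rough sketch of what deterministic mutation, recombination, and sandboxed validation could look like, the example below uses a seeded local random generator and evaluates candidates on a deep copy of the state. All names (`mutate`, `recombine`, `run_sandboxed`, `GENE_COUNT`) are hypothetical, genomes are assumed to be lists of operator indices, and the in-process copy is only a light stand-in for real isolation such as a separate process.

```python
import copy
import random
from typing import Callable, Dict, List, Optional, Tuple

GENE_COUNT = 8  # hypothetical number of available operators


def mutate(genome: List[int], seed: int) -> List[int]:
    """Replace one gene; the same (genome, seed) pair always gives the same child."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    child = list(genome)
    child[rng.randrange(len(child))] = rng.randrange(GENE_COUNT)
    return child


def recombine(a: List[int], b: List[int], seed: int) -> List[int]:
    """One-point crossover with a seeded cut; both parents need at least 2 genes."""
    rng = random.Random(seed)
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def run_sandboxed(
    candidate: Callable[[Dict], float], state: Dict
) -> Tuple[Optional[float], Optional[str]]:
    """Evaluate on a deep copy so a failing candidate cannot corrupt the caller's state."""
    scratch = copy.deepcopy(state)
    try:
        return candidate(scratch), None
    except Exception as exc:  # the failure is reported, not propagated
        return None, f"{type(exc).__name__}: {exc}"


# Usage: a candidate that damages its input and then crashes.
def bad_candidate(state: Dict) -> float:
    state["memory"].clear()
    raise ValueError("boom")


main_state = {"memory": [1, 2, 3]}
score, error = run_sandboxed(bad_candidate, main_state)
parent = [0, 1, 2, 3]
child_a = mutate(parent, seed=7)
child_b = mutate(parent, seed=7)
```

Here `main_state` is untouched after the crash, and `child_a` equals `child_b` because both calls use the same seed.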

Failure Grammar

Records failed candidates and extracts rules to guide subsequent generation and avoid repeated errors, improving exploration efficiency.
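
One simple way to turn recorded failures into generation rules is to ban operator pairs that recur in failed programs and never appear in a successful one. The sketch below assumes that rule form purely for illustration. The post does not say how RSI-DNAX extracts its rules, and `FailureGrammar`, `bigrams`, and the operator names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(program: Sequence[str]) -> Set[Bigram]:
    """Adjacent operator pairs in a program."""
    return set(zip(program, program[1:]))


class FailureGrammar:
    def __init__(self, min_failures: int = 2) -> None:
        self.min_failures = min_failures
        self.fail_counts: Counter = Counter()
        self.seen_in_success: Set[Bigram] = set()

    def record(self, program: Sequence[str], passed: bool) -> None:
        if passed:
            self.seen_in_success |= bigrams(program)
        else:
            self.fail_counts.update(bigrams(program))

    def rules(self) -> Set[Bigram]:
        """Pairs seen in at least min_failures failed programs and in no success."""
        return {
            pair
            for pair, count in self.fail_counts.items()
            if count >= self.min_failures and pair not in self.seen_in_success
        }

    def allows(self, program: Sequence[str]) -> bool:
        """True if the program contains none of the banned pairs."""
        return not (bigrams(program) & self.rules())


grammar = FailureGrammar()
grammar.record(["rotate", "rotate", "crop"], passed=False)
grammar.record(["fill", "rotate", "rotate"], passed=False)
grammar.record(["rotate", "crop"], passed=True)
```

After these three records, the only extracted rule is the pair `("rotate", "rotate")`, so a generator consulting `allows` would skip any candidate that repeats `rotate` back to back.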

Evaluator Evolution

The evaluator itself undergoes tentative mutations, which are kept only if they pass adversarial checks. This is a form of meta-level evolution that lets the evaluation criteria keep pace with the system as it develops.
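
A hedged sketch of the idea: a mutated evaluator is adopted only if it still ranks a known-good prediction strictly above a known-bad one on every adversarial probe. The probe format and all function names here are assumptions made for illustration, not the project's interface.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Evaluator = Callable[[Grid, Grid], float]  # (prediction, target) -> score in [0, 1]
Probe = Tuple[Grid, Grid, Grid]  # (good prediction, bad prediction, target)


def passes_adversarial_checks(evaluator: Evaluator, probes: List[Probe]) -> bool:
    """The good prediction must strictly outscore the bad one on every probe."""
    return all(evaluator(good, t) > evaluator(bad, t) for good, bad, t in probes)


def try_evaluator_mutation(
    current: Evaluator, mutated: Evaluator, probes: List[Probe]
) -> Evaluator:
    """Tentative mutation: keep the current evaluator unless the mutant passes."""
    return mutated if passes_adversarial_checks(mutated, probes) else current


def cell_match(pred: Grid, target: Grid) -> float:
    cells = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in cells) / len(cells)


def exact_match(pred: Grid, target: Grid) -> float:
    return 1.0 if pred == target else 0.0


def shape_only(pred: Grid, target: Grid) -> float:
    # Degenerate mutant: any prediction with the right shape gets full marks.
    same = len(pred) == len(target) and len(pred[0]) == len(target[0])
    return 1.0 if same else 0.0


probes = [([[1, 2], [3, 4]], [[0, 0], [0, 0]], [[1, 2], [3, 4]])]
kept = try_evaluator_mutation(cell_match, shape_only, probes)
adopted = try_evaluator_mutation(cell_match, exact_match, probes)
```

The `shape_only` mutant is rejected because it cannot tell the good prediction from the bad one, while the stricter `exact_match` mutant is adopted.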

Section 04

Experimental Results on ARC-AGI Benchmark

Results on the ARC-AGI-1 isomorphic subset test (ARC-AGI is a widely used benchmark for abstract reasoning):

  • Full mode (seed42): Cell accuracy increased from 0.668 to 1.0 (+0.332 absolute, about 33 percentage points), and exact grid accuracy from 0 to 1;
  • Fast mode: Exact grid accuracy reached 0.4;
  • Cross-seed expansion: Mean retained cell accuracy rose from 0.875 to 0.931, and mean exact grid accuracy from 0.333 to 0.458.

The reported results are backed by anti-cheating checks (data isolation, deterministic replay, dead code detection, and others).
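
The two metrics in the list above measure different things, which is why cell accuracy can be high while exact grid accuracy stays low. The sketch below uses assumed definitions, since the post does not spell them out: cell accuracy is the per-grid fraction of matching cells averaged over grids, with a shape mismatch scoring 0.0, and exact grid accuracy is the fraction of grids reproduced exactly. The `gain` helper shows the arithmetic behind the full-mode figure.

```python
from typing import List, Tuple

Grid = List[List[int]]


def cell_accuracy(pred: Grid, target: Grid) -> float:
    """Fraction of matching cells; a shape mismatch scores 0.0 (assumed convention)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        return 0.0
    total = sum(len(row) for row in target)
    hits = sum(p == t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return hits / total if total else 0.0


def mean_cell_accuracy(preds: List[Grid], targets: List[Grid]) -> float:
    return sum(cell_accuracy(p, t) for p, t in zip(preds, targets)) / len(targets)


def exact_grid_accuracy(preds: List[Grid], targets: List[Grid]) -> float:
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain), each rounded to 3 decimals."""
    return round(after - before, 3), round((after - before) / before, 3)


targets = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
preds = [[[1, 2], [3, 4]], [[1, 0], [0, 0]]]

cells = mean_cell_accuracy(preds, targets)   # (1.0 + 0.25) / 2 = 0.625
exact = exact_grid_accuracy(preds, targets)  # 1 of 2 grids = 0.5
absolute, relative = gain(0.668, 1.0)        # (0.332, 0.497)
```

Going from 0.668 to 1.0 is a gain of 0.332 in absolute terms and roughly 50% relative to the starting value, so it matters which of the two a percentage refers to.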

Section 05

Code-level and Architecture-level Self-Improvement Mechanisms

Code-level Improvement

Code-level self-improvement is achieved through an operator DSL: the system generates and modifies operator programs and applies the improvement mechanism recursively, improving both the task strategies and the improvement process itself. A HumanEval adapter is used to exercise this capability.
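
To make the idea of an operator DSL concrete, here is a tiny interpreter in which a program is a list of operator names drawn from a fixed vocabulary. The four grid operators and the names `OPERATORS` and `run_program` are illustrative assumptions. The real DSL is not described in the post.

```python
from typing import Callable, Dict, List, Sequence

Grid = List[List[int]]

# Restricted vocabulary: a program can only use names registered here.
OPERATORS: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run_program(program: Sequence[str], grid: Grid) -> Grid:
    """Apply each named operator in order; unknown names are rejected."""
    for name in program:
        if name not in OPERATORS:
            raise ValueError(f"operator not in restricted DSL: {name}")
        grid = OPERATORS[name](grid)
    return grid


# Transpose followed by a horizontal flip is a 90-degree clockwise rotation.
rotated = run_program(["transpose", "flip_h"], [[1, 2], [3, 4]])
# rotated == [[3, 1], [4, 2]]
```

Representing programs as data in a closed vocabulary is what makes them cheap to mutate, compare, and audit: every candidate can be printed and replayed, and nothing outside the registry can run.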

Architecture Evolution

The neural_search module supports deterministic mutation and weight inheritance of architecture genomes; World Model V2 introduces object-centric representation, causal graphs, and counterfactual reasoning, laying the foundation for complex reasoning.
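
The following sketch shows one possible reading of deterministic genome mutation with weight inheritance. It does not use the `neural_search` API, which the post does not describe. It assumes an architecture genome is a list of layer widths, that weights are stored as plain out-by-in matrices, and that a child copies the overlapping block of each parent matrix and zero-fills any new entries. `mutate_widths` and `inherit_weights` are hypothetical names.

```python
import random
from typing import List

Matrix = List[List[float]]  # rows = output units, columns = input units


def mutate_widths(widths: List[int], seed: int, max_width: int = 64) -> List[int]:
    """Change one hidden width by +/-1 within bounds; needs at least one hidden layer."""
    rng = random.Random(seed)
    child = list(widths)
    i = rng.randrange(1, len(child) - 1)  # input and output sizes stay fixed
    child[i] = max(1, min(max_width, child[i] + rng.choice([-1, 1])))
    return child


def inherit_weights(parent: List[Matrix], child_widths: List[int]) -> List[Matrix]:
    """Copy the overlapping block of each parent matrix; zero-fill new entries."""
    child: List[Matrix] = []
    for layer, (n_in, n_out) in enumerate(zip(child_widths, child_widths[1:])):
        src = parent[layer]
        child.append([
            [src[r][c] if r < len(src) and c < len(src[r]) else 0.0 for c in range(n_in)]
            for r in range(n_out)
        ])
    return child


# Parent with widths [2, 2, 1]; the child widens the hidden layer to 3 units.
parent_weights = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
child_weights = inherit_weights(parent_weights, [2, 3, 1])
# child_weights == [[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], [[5.0, 6.0, 0.0]]]
```

With zero-filled new entries, the widened child computes the same function as its parent for a plain feed-forward layer stack, so a mutation starts from the parent's behavior instead of from scratch.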

Section 06

Anti-cheating and Auditability Guarantees

To ensure credible results, multiple mechanisms are implemented:

  • Data segmentation and isolation: Strict training/validation/test splitting to prevent information leakage;
  • Deterministic replay: All experiments are reproducible;
  • Dead code detection: Exclude the impact of unused code paths;
  • Control strategy audit: Check whether improvements follow safety constraints.

These mechanisms provide a reliable foundation for research.
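
Two of the checks listed above, split isolation and deterministic replay, are simple to express as code. The sketch below is an assumption-laden illustration and not the project's audit tooling: task identifiers are assumed to be strings, experiment results are assumed to be JSON-serializable, and the function names are hypothetical.

```python
import hashlib
import json
import random
from typing import Callable, Dict, Iterable, List


def assert_disjoint_splits(splits: Dict[str, Iterable[str]]) -> None:
    """Raise if any task id appears in more than one split."""
    owner: Dict[str, str] = {}
    for name, ids in splits.items():
        for task_id in ids:
            if task_id in owner and owner[task_id] != name:
                raise ValueError(f"task {task_id} is in both {owner[task_id]} and {name}")
            owner[task_id] = name


def fingerprint(result: object) -> str:
    """Stable hash of a JSON-serializable result."""
    return hashlib.sha256(json.dumps(result, sort_keys=True).encode("utf-8")).hexdigest()


def replays_identically(experiment: Callable[[int], object], seed: int, runs: int = 2) -> bool:
    """Run the experiment several times with one seed and compare fingerprints."""
    return len({fingerprint(experiment(seed)) for _ in range(runs)}) == 1


def seeded_experiment(seed: int) -> Dict[str, float]:
    rng = random.Random(seed)  # all randomness flows from the seed
    return {"score": rng.random()}


call_log: List[int] = []


def leaky_experiment(seed: int) -> Dict[str, int]:
    call_log.append(seed)  # hidden state that survives between runs
    return {"score": len(call_log)}


deterministic = replays_identically(seeded_experiment, seed=42)
leaky = replays_identically(leaky_experiment, seed=42)
assert_disjoint_splits({"train": ["t1", "t2"], "validation": ["t3"], "test": ["t4"]})
```

The seeded experiment replays identically, while the one that depends on hidden state between runs does not, which is the kind of discrepancy a replay check is meant to surface.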

Section 07

Limitations and Future Directions

Limitations

  • ARC results are not official leaderboard scores;
  • HumanEval tests do not prove general programming ability;
  • Exact grid accuracy for seed44 remains 0.0, indicating limited gains.

Future Plans

Upgrade interactive residual layers, meta-RSI coordination, and deep architecture while maintaining the principle of bounded auditability.

Section 08

Implications for AI Research

The core lessons from RSI-DNAX:

  1. Boundaries are key: Unconstrained improvement is dangerous and difficult to study;
  2. Auditability first: Each improvement step needs to be inspectable and verifiable;
  3. Learn from failure: The failure grammar mechanism effectively utilizes negative experiences;
  4. Multi-level improvement: Multi-dimensional evolution (operators, architecture, etc.) brings compound effects.

The framework has dual value: it gives safety researchers a platform for studying control mechanisms, and it gives capability researchers a demonstration of improvement paths.