Zing Forum

Multi-Agent Reinforcement Learning Sailing Race Simulator: Research on Tactical Confrontation Based on America's Cup

A project for the AI course at the University of Bologna that uses multi-agent reinforcement learning (MARL) with the PPO algorithm to train two competing boats, simulating sailing physics, wind fields, and competition rules.

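Tags: multi-agent reinforcement learning, MARL, sailing simulation, PPO algorithm, PettingZoo, physics simulation, continuous control, adversarial games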
Published 2026-06-03 22:40 | Recent activity 2026-06-03 22:51 | Estimated read 7 min

Section 01

[Introduction] Multi-Agent Reinforcement Learning Sailing Race Simulator: Research on Tactical Confrontation Based on America's Cup

This project, developed for the AI course at the University of Bologna, is a multi-agent reinforcement learning simulator modeled on America's Cup sailing races. It uses MARL with the PPO algorithm to train two boats against each other while simulating sailing physics, wind fields, and competition rules, with the aim of exploring how autonomous agents handle tactical confrontation in a dynamic environment. The project is open source on GitHub and maintained by francescofuligni.


Section 02

Project Background and Overview

Original author/maintainer: francescofuligni. Source platform: GitHub. Original title: Multi-agent_America_Cup. Release date: June 3, 2026.

The project is a 2D sailing race simulator modeled on the America's Cup format, in which two autonomous sailboats (red and blue) race against each other in a randomly varying wind field. It is framed as a MARL problem, exposes the PettingZoo parallel environment interface, and is trained with the PPO algorithm from Stable-Baselines3.


Section 03

Competition Structure and Sailing Rules

The race is divided into multiple stages:

  1. Pre-start and start-line crossing: Both boats begin below the starting line, line up in the designated area, and then cross the line; a violation results in disqualification.
  2. Upwind leg: The boats sail upwind to the top mark and must learn to tack (zig-zag) to maximize velocity made good (VMG), the component of boat speed directed toward the mark.
  3. Mark rounding: On reaching the top mark, each boat rounds the buoy and changes course.
  4. Downwind leg: The boats return to the finish line; the first to arrive wins.

Rules: The simulator implements collision detection (20 m physical radius, 40 m respect zone) and right-of-way rules (a port-tack boat must give way to a starboard-tack boat; violations result in penalties or disqualification).
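
The VMG and rule checks described above can be illustrated with a small, standard-library sketch. It is not taken from the repository: the function names, the compass-angle convention (degrees clockwise from north, wind given as the direction it blows from), and the choice to compare the 20 m and 40 m thresholds against centre-to-centre distance are all assumptions made for illustration.

```python
import math

PHYSICAL_RADIUS_M = 20.0  # from the text: physical collision radius
RESPECT_ZONE_M = 40.0     # from the text: respect zone


def angle_diff_deg(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def vmg_upwind(speed_kn, heading_deg, wind_from_deg):
    """Velocity made good: boat speed projected onto the upwind axis."""
    true_wind_angle = angle_diff_deg(heading_deg, wind_from_deg)
    return speed_kn * math.cos(math.radians(true_wind_angle))


def tack(heading_deg, wind_from_deg):
    """'starboard' if the wind comes over the right-hand side, else 'port'."""
    # Bearing of the wind source relative to the bow; positive = starboard side.
    relative = angle_diff_deg(wind_from_deg, heading_deg)
    return "starboard" if relative >= 0 else "port"


def proximity(pos_a, pos_b):
    """Classify the separation between two boats (assumed centre-to-centre)."""
    distance = math.hypot(pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    if distance <= PHYSICAL_RADIUS_M:
        return "collision"
    if distance <= RESPECT_ZONE_M:
        return "respect_zone"
    return "clear"


def give_way_boat(tack_red, tack_blue):
    """Port/starboard rule: the port-tack boat keeps clear; None if same tack."""
    if tack_red == tack_blue:
        return None
    return "red" if tack_red == "port" else "blue"
```

For example, a boat heading 45° off a northerly wind at 20 knots makes good about 14.1 knots upwind, which is why a zig-zag course can beat pointing higher at lower speed. How the actual environment resolves same-tack encounters or assigns penalties is not stated in the source.
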

Section 04

Physical Simulation and Environment Model

Core physical modeling includes:

  • Polar performance model (VPP, velocity prediction program): Dynamically computes the maximum theoretical speed in displacement mode (slower, but able to sail closer to the wind) and in foiling mode (faster, but with a wider no-go zone around the wind direction).
  • Foil mechanics: The boat enters foiling mode when its speed exceeds 18 knots and returns to displacement mode when it drops below 15 knots; state transitions carry a transient penalty and inertia.
  • Sail trim: The aerodynamic efficiency of the sail is modeled with a Gaussian curve, so agents must continuously control the sail angle to maintain optimal propulsion.
  • Spatiotemporally varying wind field: The base wind speed follows a random walk between 15 and 22 knots, and a 10×10 spatial grid simulates gusts and wind tunnels using a mean-reverting stochastic process.
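
Two of the mechanisms listed above, the 18/15-knot foiling switch and the Gaussian sail efficiency, are simple enough to sketch directly. The following is a minimal illustration rather than the project's code: the function names and the `sigma_deg` width are assumptions, and the transient penalty and inertia mentioned in the text are left out.

```python
import math

FOIL_UP_KN = 18.0    # from the text: enter foiling mode above this speed
FOIL_DOWN_KN = 15.0  # from the text: return to displacement mode below this speed


def update_foiling(is_foiling, speed_kn):
    """Hysteresis switch: the mode only changes outside the 15-18 knot band."""
    if not is_foiling and speed_kn > FOIL_UP_KN:
        return True
    if is_foiling and speed_kn < FOIL_DOWN_KN:
        return False
    return is_foiling  # inside the band, keep the current mode


def trim_efficiency(sail_angle_deg, optimal_angle_deg, sigma_deg=10.0):
    """Gaussian falloff around the optimal trim angle; 1.0 at perfect trim.

    sigma_deg is a hypothetical width, not a value taken from the project.
    """
    error = sail_angle_deg - optimal_angle_deg
    return math.exp(-0.5 * (error / sigma_deg) ** 2)


# Trace the foiling state over a sequence of boat speeds.
state = False
trace = []
for speed in [14.0, 17.0, 19.0, 16.0, 14.5]:
    state = update_foiling(state, speed)
    trace.append(state)
```

The two separate thresholds keep the boat from flickering between modes when its speed hovers near a single cutoff: at 16 or 17 knots the result depends on which mode the boat was already in.
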

Section 05

Technical Implementation Details

The project has a modular structure. The core/ directory contains boat_physics.py (speed calculation, kinematics update), sail_trim.py (sail optimization), and wind_model.py (wind field model); the env/ directory contains sailing_env.py (state management, rewards, collisions) and rendering.py (visualization). Training uses the PPO algorithm from Stable-Baselines3, and the environment is wrapped with SuperSuit to support parallel training.
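
As a rough idea of what a module such as wind_model.py might contain, the sketch below combines a bounded random walk for the base wind speed with a 10×10 grid of gust offsets that revert toward zero. It is written from the description in this article, not from the repository: the class name `WindField`, the field size, and all noise parameters (`base_sigma`, `theta`, `gust_sigma`) are hypothetical.

```python
import random

GRID = 10                               # from the text: 10x10 spatial grid
BASE_MIN_KN, BASE_MAX_KN = 15.0, 22.0   # from the text: base wind speed range


class WindField:
    """Base wind as a clipped random walk plus mean-reverting per-cell gusts."""

    def __init__(self, field_size_m=2000.0, seed=None):
        self.rng = random.Random(seed)
        self.field_size_m = field_size_m  # assumed square race area
        self.base_kn = self.rng.uniform(BASE_MIN_KN, BASE_MAX_KN)
        self.gust_kn = [[0.0] * GRID for _ in range(GRID)]

    def step(self, dt=1.0, base_sigma=0.05, theta=0.1, gust_sigma=0.3):
        # Random walk for the base speed, clipped to the stated range.
        self.base_kn += base_sigma * dt ** 0.5 * self.rng.gauss(0.0, 1.0)
        self.base_kn = min(BASE_MAX_KN, max(BASE_MIN_KN, self.base_kn))
        # Mean-reverting (Ornstein-Uhlenbeck style) update: each cell is
        # pulled back toward 0 at rate theta while receiving fresh noise.
        for row in self.gust_kn:
            for j in range(GRID):
                noise = gust_sigma * dt ** 0.5 * self.rng.gauss(0.0, 1.0)
                row[j] += -theta * row[j] * dt + noise

    def speed_at(self, x_m, y_m):
        """Wind speed in knots at a position; out-of-range points are clamped."""
        cell = self.field_size_m / GRID
        i = min(GRID - 1, max(0, int(y_m // cell)))
        j = min(GRID - 1, max(0, int(x_m // cell)))
        return self.base_kn + self.gust_kn[i][j]
```

A usage pattern would be to call `step()` once per simulation tick and have each boat query `speed_at()` for its own position, which is one way an agent could end up observing only its local wind. Wind direction and spatial smoothing between cells are omitted here.
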


Section 06

Core Challenges in Reinforcement Learning

The MARL challenges faced by the project:

  1. Continuous action space: Each agent must control rudder angle and sail angle at the same time, and both actions are continuous rather than discrete.
  2. Partial observability: Agents perceive only the local wind field and the opponent's position, with no perfect global information.
  3. Non-stationary environment: As the opponent's strategy evolves, the environment each agent faces keeps changing.
  4. Sparse rewards: Clear feedback arrives only when a boat completes the race or violates a rule, so the design of intermediate rewards is key.
  5. Multi-stage tasks: Different stages call for different strategies (position control, speed optimization, tactical confrontation).
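
Two of these challenges, continuous actions and sparse rewards, have common generic treatments that can be sketched briefly. The example below maps a normalized policy output to rudder and sail angles and adds a potential-based shaping term that rewards progress toward the next mark. This is one standard option and not necessarily what the project does: the angle limits, the `scale` factor, and all function names are assumptions.

```python
import math

MAX_RUDDER_DEG = 30.0  # hypothetical rudder limit
MAX_SAIL_DEG = 90.0    # hypothetical maximum sail angle


def scale_action(action):
    """Map a policy output in [-1, 1]^2 to (rudder_deg, sail_deg)."""
    rudder, sail = (max(-1.0, min(1.0, a)) for a in action)
    # Rudder is symmetric around 0; sail angle is mapped to [0, MAX_SAIL_DEG].
    return rudder * MAX_RUDDER_DEG, (sail + 1.0) / 2.0 * MAX_SAIL_DEG


def potential(pos, mark):
    """Potential function: negative distance to the next mark, in metres."""
    return -math.hypot(mark[0] - pos[0], mark[1] - pos[1])


def shaped_reward(pos, next_pos, mark, terminal_reward=0.0,
                  gamma=0.99, scale=0.01):
    """Sparse terminal reward plus a potential-based shaping term."""
    shaping = gamma * potential(next_pos, mark) - potential(pos, mark)
    return terminal_reward + scale * shaping
```

With `gamma=1.0`, moving 10 m closer to a mark yields a shaping bonus of `0.1` at the default scale, and moving 10 m away yields `-0.1`. Shaping of this form gives the agent a dense signal between the rare win, loss, and penalty events.
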

Section 07

Practical Significance and Conclusion

Practical significance of the project:

  • Robotic sailing research: Provides a simulation platform for real autonomous sailing control algorithms.
  • MARL algorithm testing: Can serve as a standardized benchmark environment.
  • Sports AI application: Demonstrates the potential of AI in complex sports tactics.
  • Physical simulation education: Helps in understanding sailing physics and aerodynamics.

Conclusion: This project turns a complex sailing competition into a rigorous RL research platform. It supports the move of reinforcement learning from games toward practical applications and shows the value of abstracting complex real-world systems into AI problems.