Section 01
Doomma: A Pure Vision-Driven Local Multimodal AI Agent for Playing DOOM (Introduction)
Doomma is an AI agent that plays DOOM using only the visual capabilities of Gemma 4, a multimodal large model that runs locally. It does not rely on heuristic rules or handcrafted features; instead, it makes its own decision for each frame solely by observing the game screen, including the HUD, demonstrating the potential of end-to-end visual-action learning. The project is maintained by mseeks and is open source on GitHub (https://github.com/mseeks/doomma). It uses a containerized architecture and supports local execution (e.g., with Metal acceleration on Apple Silicon).
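The frame-by-frame decision process described above can be pictured as a simple observe, ask, act loop. The following is a minimal sketch in Python, not the project's actual code: the action names, the `Frame` type, and the `get_frame`, `ask_model`, and `send_action` callables are all hypothetical placeholders, and the model is stubbed out so the example runs without a game or a model installed.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical action vocabulary; the real project's action set may differ.
ACTIONS = ("forward", "backward", "turn_left", "turn_right", "shoot", "use")


@dataclass
class Frame:
    """One captured game screen (HUD included), as raw image bytes."""
    pixels: bytes
    tick: int


def parse_action(reply: str, default: str = "forward") -> str:
    """Return the first known action named in the model's reply.

    Falls back to `default` when the reply names no known action.
    """
    for token in re.findall(r"[a-z_]+", reply.lower()):
        if token in ACTIONS:
            return token
    return default


def run_episode(
    get_frame: Callable[[int], Frame],
    ask_model: Callable[[Frame], str],
    send_action: Callable[[str], None],
    max_steps: int,
) -> List[str]:
    """Run a vision-only loop: one screenshot in, one action out, per step."""
    history: List[str] = []
    for tick in range(max_steps):
        frame = get_frame(tick)        # observe: the screen is the only input
        reply = ask_model(frame)       # decide: the multimodal model answers in text
        action = parse_action(reply)   # constrain the reply to a valid action
        send_action(action)            # act: forward the action to the game
        history.append(action)
    return history


if __name__ == "__main__":
    # Stubs standing in for the game and the local model (assumptions).
    sent: List[str] = []
    actions = run_episode(
        get_frame=lambda tick: Frame(pixels=b"", tick=tick),
        ask_model=lambda frame: "Shoot" if frame.tick % 2 == 0 else "no idea",
        send_action=sent.append,
        max_steps=4,
    )
    print(actions)
```

Note that the loop contains no game-specific rules: the only logic outside the model is mapping its free-form reply onto a fixed action set, which matches the "no heuristics or handcrafted features" framing. In a real setup, `ask_model` would send the screenshot to the locally served model.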