Zing Forum

Mind of Tashi: A Psychological Game Duel with Small-Scale Reasoning Models

Mind of Tashi is a competitive game built on a blind commitment mechanism, in which players fight psychological duels against a fine-tuned small Mixture of Experts (MoE) reasoning model (approximately 200 million active parameters). The project shows how a small local model can sustain adversarial play that depends on recursive reasoning, running on edge devices via llama.cpp without relying on cloud APIs.

Tags: small models, reasoning models, MoE, llama.cpp, game AI, psychological games, model fine-tuning, GitHub
Published 2026-06-09 14:14 · Recent activity 2026-06-09 14:21 · Estimated read 11 min

Section 01

Introduction: Mind of Tashi - A Psychological Game Duel with Small-Scale Reasoning Models

Basic Project Information

Core Points: Mind of Tashi is a competitive game built on a blind commitment mechanism, in which players fight psychological duels against a fine-tuned small MoE reasoning model (approximately 200 million active parameters). The model runs on edge devices via llama.cpp with no cloud API, showing that a small local model can sustain adversarial play that depends on recursive reasoning. The game is set in a ninja monk village in the Himalayas, where players climb a trial tower guarded by AI opponents. Winning depends on predicting an opponent that narrates its own thinking process.


Section 02

Project Background and Core Mechanisms

Project Background: This project is an entry for the second track, "An Adventure in Thousand Token Wood", of the Build Small Hackathon. The game is set in a mist-shrouded ninja monk village in the Himalayas, where the player's goal is to climb a trial tower guarded by AI opponents.

Core Mechanisms: The core of the game is the blind commitment duel. In each round, the player and the AI secretly choose their moves at the same time, so there is no reaction time and everything rests on prediction. After the moves are revealed, the AI explains how it read the player's behavior (e.g., "You took two unpunished breaths: greed, so I attack"). The essence of the game is recursive thinking (e.g., "I think you will attack, so I use Mist-Step; I think you think this way, so I take a breath"), which is exactly where reasoning models excel.
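The recursive "I think you think" pattern can be illustrated with a level-k sketch. This is only an illustration under stated assumptions, not the project's method: it is restricted to the three free moves whose relationships are listed in the Six-Move System table, and `level_k_move`, `most_common`, and the frequency-based level-0 rule are hypothetical names and choices.

```python
from collections import Counter

# Counter-move for each of the three free moves, taken from the game's move table.
COUNTER = {
    "Vajra Strike": "Mountain Stance",  # Mountain Stance blocks Vajra Strike
    "River Throw": "Vajra Strike",      # Vajra Strike beats River Throw
    "Mountain Stance": "River Throw",   # River Throw breaks Mountain Stance
}


def most_common(history, default="Vajra Strike"):
    """Return the most frequent move in a history (hypothetical level-0 habit)."""
    return Counter(history).most_common(1)[0][0] if history else default


def level_k_move(k, my_history, opp_history):
    """Level 0 repeats its own habit; level k counters a level k-1 opponent."""
    if k == 0:
        return most_common(my_history)
    # The opponent reasons one level shallower, seen from their side of the table.
    predicted = level_k_move(k - 1, opp_history, my_history)
    return COUNTER[predicted]


mine = ["Vajra Strike", "Vajra Strike", "River Throw"]
theirs = ["Mountain Stance", "Mountain Stance"]
for depth in range(4):
    print(depth, level_k_move(depth, mine, theirs))
```

Each extra level swaps the two histories once more, which is the same back-and-forth the quoted monologue describes.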


Section 03

Model Architecture and Technical Implementation

Model Architecture: The AI opponent is a custom Mixture of Experts (MoE) model with approximately 400 million total parameters, of which only about 200 million are active per token. It was trained with SFT (Supervised Fine-Tuning) followed by GRPO, and it code-switches between English and Hindi/Sanskrit (in IAST transliteration). Measured by active parameters, it is 10-100 times smaller than cutting-edge API models. The model is distributed in Q4_K_M GGUF format and runs via llama.cpp without cloud APIs.
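A back-of-the-envelope calculation shows why a model of this size suits edge devices. The parameter counts come from the description above; the `bits_per_weight` default of 4.8 is an assumed rough average for Q4_K_M, not a figure from the project, and real GGUF file sizes vary with the tensor mix.

```python
TOTAL_PARAMS = 400_000_000   # approximate total parameters (from the article)
ACTIVE_PARAMS = 200_000_000  # approximate active parameters per token (from the article)


def estimated_size_mb(n_params, bits_per_weight=4.8):
    """Rough weight-storage estimate in decimal megabytes (assumed bits per weight)."""
    return n_params * bits_per_weight / 8 / 1e6


def active_fraction(active, total):
    """Share of the weights that take part in computing each token."""
    return active / total


print(f"estimated weights: ~{estimated_size_mb(TOTAL_PARAMS):.0f} MB")
print(f"active per token:  {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
```

All weights still have to be stored, so the file size follows the total count, while per-token compute follows the active count.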

Technical Details

  • Reasoning Path: Implemented in llm.py. It covers prompt construction (prompts.py), parsing the thinking process and the JSON move selection, adjusting the sampling temperature to the personality, and grammar constraints on the output (the Oath mechanism).
  • Belief Meter: Implemented via token-level entropy analysis. Higher entropy reflects greater AI uncertainty and is shown in the UI. When the player "reads" the AI, its sampling temperature increases, simulating shaken composure.
  • Custom Frontend: Uses Gradio 6's gradio.Server and serves a Himalayan-style interface from static/index.html, keeping the logic and presentation layers separate.
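The parsing, entropy, and temperature steps listed above can be sketched as follows. This is a minimal sketch, not the contents of llm.py: the `<think>` tag format, the `"move"` JSON key, and the names `parse_reply`, `mean_entropy`, and `shaken_temperature` (with its `step` and `cap` values) are all assumptions made for illustration.

```python
import json
import math
import re

# Assumed output format: optional <think>...</think> block, then a JSON object.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_reply(text):
    """Split a raw completion into (thinking, move)."""
    match = THINK_RE.search(text)
    thinking = match.group(1).strip() if match else ""
    rest = text[match.end():] if match else text
    start, end = rest.find("{"), rest.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    return thinking, json.loads(rest[start:end + 1])["move"]


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(per_token_probs):
    """Average entropy over generated tokens; higher means less certain."""
    return sum(token_entropy(p) for p in per_token_probs) / len(per_token_probs)


def shaken_temperature(base, times_read, step=0.1, cap=1.5):
    """Raise the sampling temperature each time the player reads the AI."""
    return min(cap, base + step * times_read)


reply = '<think>You drew breath twice.</think>{"move": "Vajra Strike"}'
print(parse_reply(reply))
print(round(mean_entropy([[0.5, 0.5], [0.9, 0.1]]), 3))
print(shaken_temperature(0.7, times_read=2))
```

Running a local model is what makes the entropy step possible, since it needs the per-token probabilities that hosted APIs often do not expose.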

Section 04

In-Depth Analysis of Game Mechanics

Six-Move System

Move                    | Cost           | Win-Loss Relationship
Vajra Strike            | Free           | Beats River Throw · Blocked by Mountain Stance
Mountain Stance (Block) | Free, +1 prāṇa | Blocks Vajra Strike, mitigates Prāṇa Art · Broken by River Throw
River Throw             | Free           | Breaks Mountain Stance · Loses to Vajra Strike
Draw Breath             | Free, +2 prāṇa | Gathers prāṇa but fully exposed
Prāṇa Art               | 3 prāṇa        | Powerful long-range attack · Countered by Mist-Step
Mist-Step               | 2 prāṇa        | Dodges and counters attacks · Ineffective against cautious moves
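The table can be encoded as data for a small resolver. This is a sketch that covers only the relationships the table states outright; softer entries such as "mitigates" or "fully exposed" are left unresolved because the article gives no numbers for them. The names `COST`, `GAIN`, `BEATS`, `prana_after`, and `winner` are hypothetical.

```python
# Prāṇa cost to play each move, as listed in the table.
COST = {
    "Vajra Strike": 0, "Mountain Stance": 0, "River Throw": 0,
    "Draw Breath": 0, "Prāṇa Art": 3, "Mist-Step": 2,
}
# Prāṇa gained by playing a move.
GAIN = {"Mountain Stance": 1, "Draw Breath": 2}
# (winner, loser) pairs that the table states explicitly.
BEATS = {
    ("Vajra Strike", "River Throw"),
    ("Mountain Stance", "Vajra Strike"),
    ("River Throw", "Mountain Stance"),
    ("Mist-Step", "Prāṇa Art"),
}


def can_play(move, prana):
    """A move is legal only if the player can pay its prāṇa cost."""
    return prana >= COST[move]


def prana_after(move, prana):
    """Prāṇa balance after paying the cost and collecting any gain."""
    if not can_play(move, prana):
        raise ValueError(f"not enough prāṇa for {move}")
    return prana - COST[move] + GAIN.get(move, 0)


def winner(move_a, move_b):
    """Return 'a', 'b', or None when the table gives no explicit result."""
    if (move_a, move_b) in BEATS:
        return "a"
    if (move_b, move_a) in BEATS:
        return "b"
    return None


print(winner("Prāṇa Art", "Mist-Step"), prana_after("Draw Breath", 0))
```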

Resource System: Prāṇa (life energy) is the core resource. It is gained through Draw Breath and Mountain Stance and spent on powerful moves. The tempo trade-off is clear: drawing breath often builds resources but leaves the player exposed, while sustained pressure keeps the opponent from accumulating prāṇa but risks being countered.

Ten Personality Opponents: The AI has ten distinct personalities, each with its own temperament, strategy, and thinking budget. The same model shows markedly different styles (aggressive or conservative, rational or intuitive), which adds replay value.
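One way to picture per-personality settings is a small configuration record. This is a sketch only: the article does not list the ten personalities or their parameters, so the field names, the two example entries, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    name: str
    temperature: float    # sampling temperature (temperament)
    aggression: float     # 0.0 = conservative, 1.0 = aggressive (strategy)
    thinking_budget: int  # maximum tokens the opponent may spend reasoning


# Two made-up examples; the real game defines ten personalities.
PERSONALITIES = {
    p.name: p
    for p in (
        Personality("example_aggressive", 0.9, 0.8, 128),
        Personality("example_cautious", 0.5, 0.2, 384),
    )
}


def sampling_kwargs(p):
    """Translate a personality into generation settings for one turn."""
    return {"temperature": p.temperature, "max_tokens": p.thinking_budget}


print(sampling_kwargs(PERSONALITIES["example_cautious"]))
```

Keeping the differences in configuration rather than in separate weights matches the article's point that one model produces all the styles.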


Section 05

Model Fine-Tuning and Training

SFT Phase: The model is trained on a dataset generated by self-play, learning to predict opponent behavior from the move history while playing a specific personality. The dataset includes code-switched English, Hindi, and Sanskrit (IAST transliteration), so the model can narrate its thinking in a philosophically flavored voice.

GRPO Training: The model is then fine-tuned with GRPO (Group Relative Policy Optimization) to improve decision quality in adversarial play, making it more adaptable to dynamic game situations than SFT alone.
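The "group relative" part of GRPO refers to scoring each sampled response against the other responses drawn for the same prompt. The sketch below shows only that normalization step; the article does not describe the project's reward function or training code, so the example rewards, the use of the population standard deviation, and the `eps` value are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled moves in the same game state
# (for example, 1.0 for a winning exchange and 0.0 for a losing one).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])
```

Responses that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to supply a baseline.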


Section 06

Deployment, Limitations, and Insights

Deployment Modes

  • Simulated Opponent Mode: No model download is needed. Personality-based heuristics stand in for the AI, which makes this mode suitable for quick testing.
  • Local Model Mode: Set the GGUF model path through an environment variable, and the real model is loaded with llama.cpp.

Hardware Recommendations: llama.cpp is unstable on ZeroGPU, so the recommended setups are CPU-only mode on a Space with an upgraded CPU, or a dedicated GPU Space. The few seconds of "loading" before each turn add dramatic tension.
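Switching between the two modes can be sketched as a check on an environment variable. This is an assumption-laden illustration: the variable name `TASHI_GGUF_PATH` and the functions `select_mode` and `load_opponent` are hypothetical, and the loader uses the llama-cpp-python binding (`llama_cpp.Llama`), which the article does not confirm as the project's interface to llama.cpp.

```python
import os

MODEL_ENV_VAR = "TASHI_GGUF_PATH"  # hypothetical name; the article does not give it


def select_mode(env=None):
    """Return ('local', path) when a model path is configured, else ('simulated', None)."""
    env = os.environ if env is None else env
    path = env.get(MODEL_ENV_VAR, "").strip()
    return ("local", path) if path else ("simulated", None)


def load_opponent(env=None):
    """Load the GGUF model for local mode; return None for the simulated opponent."""
    mode, path = select_mode(env)
    if mode == "simulated":
        return None
    # Imported lazily so simulated mode works without llama-cpp-python installed.
    from llama_cpp import Llama
    # n_gpu_layers=0 keeps inference on the CPU, matching the CPU-only advice.
    return Llama(model_path=path, n_gpu_layers=0, n_ctx=2048, verbose=False)


print(select_mode({}))
print(select_mode({MODEL_ENV_VAR: "/models/tashi.gguf"}))
```

The example path is a placeholder, and the context size of 2048 is an arbitrary choice for the sketch.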

Limitations & Insights

  • Advantages: Complex reasoning runs on consumer-grade hardware; internal model states (logits and entropy) are accessible; behavior can be controlled at a fine grain; player data stays on the device.
  • Limitations: Model capacity limits how complex a strategy the model can learn; reasoning speed is bound by local hardware; multilingual training adds complexity.
  • Insights: A well-fine-tuned small model can perform surprisingly well on a specific, well-defined task while staying accessible and controllable, which makes it a useful reference for edge AI and privacy-sensitive scenarios.

Section 07

Project Summary and Value

Project Summary: Mind of Tashi integrates game mechanics with AI capabilities. It is a complete game experience rather than only a technical demo, and it shows the potential of small reasoning models in interactive applications. The project covers the full pipeline: self-play dataset → model training (SFT/GRPO) → deployment (simulated or local), offering a reusable template for AI-driven applications.

Target Awards: The project targets these Hackathon awards: Off the Grid (no cloud API), Llama Champion (runs on llama.cpp), Off-Brand (custom Gradio 6 frontend), and Well-Tuned (fine-tuned MoE GGUF model).

Value Insights: The project offers ideas and practical experience for developers working on edge AI, small-model fine-tuning, and AI in games and interactive applications, and it illustrates where small models add distinct value in specific scenarios.