# Mind of Tashi: A Psychological Game Duel with Small-Scale Reasoning Models

> Mind of Tashi is a competitive game based on the blind commitment mechanism, where players engage in psychological battles with a fine-tuned small Mixture of Experts (MoE) reasoning model (approximately 200 million active parameters). The project demonstrates how to use small local models to implement complex recursive reasoning adversarial interactions and run on edge devices via llama.cpp without relying on cloud APIs.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-06-09T06:14:28.000Z
- 最近活动: 2026-06-09T06:21:43.192Z
- 热度: 150.9
- 关键词: 小型模型, 推理模型, MoE, llama.cpp, 游戏AI, 心理博弈, 模型微调, GitHub
- 页面链接: https://www.zingnex.cn/en/forum/thread/mind-of-tashi
- Canonical: https://www.zingnex.cn/forum/thread/mind-of-tashi
- Markdown 来源: floors_fallback

---

## Introduction: Mind of Tashi - A Psychological Game Duel with Small-Scale Reasoning Models

**Project Basic Information**
- Original Author/Maintainer: Mandark-droid
- Source Platform: GitHub
- Original Link: https://github.com/Mandark-droid/mind-of-tashi
- Release Date: 2026-06-09

**Core Points**
Mind of Tashi is a competitive game based on the blind commitment mechanism, where players engage in psychological battles with a fine-tuned small MoE reasoning model (approximately 200 million active parameters). The project runs on edge devices via llama.cpp without cloud APIs, demonstrating the possibility of small local models implementing complex recursive reasoning adversarial interactions. Set in a ninja monk village in the Himalayas, players need to climb a trial tower guarded by AI; the core lies in predicting the AI opponent, which narrates its own thinking process.

## Project Background and Core Mechanisms

**Project Background**
This project is an entry for the second track "An Adventure in Thousand Token Wood" of the Build Small Hackathon. The game is set in a ninja monk village shrouded in mist in the Himalayas, where the player's goal is to climb a trial tower guarded by AI opponents.

**Core Mechanisms**
The core of the game is the blind commitment duel: in each round, the player and AI secretly choose moves simultaneously, with no reaction time—relying solely on prediction. After the AI makes a move, it reveals its interpretation of the player's behavior (e.g., "You took two unpunished breaths—greed, so I attack"). The essence of the game lies in recursive thinking (e.g., "I think you will attack, so I use Mist-Step; I think you think this way, so I take a breath"), which is exactly the area where reasoning models excel.

## Model Architecture and Technical Implementation

**Model Architecture**
The AI opponent uses a custom Mixture of Experts (MoE) model: total parameters are approximately 400 million, with only about 200 million active parameters per token. Trained via SFT (Supervised Fine-Tuning) and GRPO, it supports code-switching between English + Hindi/Sanskrit (IAST transliteration) styles, and is 10-100 times smaller than cutting-edge API models (in terms of active parameters). The model is distributed in Q4_K_M GGUF format and runs via llama.cpp without cloud APIs.

**Technical Details**
- **Reasoning Path**: Implemented in `llm.py`, including prompt construction (`prompts.py`), parsing thinking processes and JSON move selection, adjusting sampling temperature according to personality, and grammatical constraints (Oath mechanism).
- **Belief Meter**: Implemented via token-level entropy analysis; higher entropy values reflect AI uncertainty (UI prompt); when the player "reads" the AI, its sampling temperature increases (simulating shaken composure).
- **Custom Frontend**: Uses Gradio6's `gradio.Server`, presents a Himalayan-style interface via `static/index.html`, separating logic and presentation layers.

## In-Depth Analysis of Game Mechanics

**Six-Move System**
| Move | Cost | Win-Loss Relationship |
|---|---|---|
| Vajra Strike | Free | Beats River Throw · Blocked by Mountain Stance |
| Mountain Stance (Block) | Free, +1 prāṇa | Blocks Vajra Strike, mitigates Prāṇa Art · Broken by River Throw |
| River Throw | Free | Breaks Mountain Stance · Loses to Vajra Strike |
| Draw Breath | Free, +2 prāṇa | Gathers prāṇa but fully exposed |
| Prāṇa Art | 3 prāṇa | Powerful long-range attack · Countered by Mist-Step |
| Mist-Step | 2 prāṇa | Dodges and counters attacks · Ineffective against cautious moves |

**Resource System**
Prāṇa (life energy) is the core resource: accumulated via Draw Breath and Mountain Stance, used to unleash powerful moves. The rhythm game is obvious: frequent Draw Breath exposes vulnerabilities but accumulates resources; continuous pressure prevents opponents from accumulating but may lead to being countered.

**Ten Personality Opponents**
The AI has ten distinct personalities, each with unique temperament, strategy, and thinking budget. The same model exhibits completely different styles (aggressive/conservative, rational/intuitive), enhancing replay value.

## Model Fine-Tuning and Training

**SFT Phase**
Trained using a dataset generated by self-play, allowing the model to learn to predict opponent behavior based on historical records under specific personalities. The dataset includes code-switching content in English, Hindi, and Sanskrit (IAST transliteration), enabling the model to narrate its thinking process in a philosophically rich language.

**GRPO Training**
Further fine-tuned via GRPO (Group Relative Policy Optimization) to optimize decision quality in adversarial environments, making it more adaptable to dynamic game scenarios than SFT.

## Deployment Methods and Limitations & Insights

**Deployment Modes**
- **Simulated Opponent Mode**: No need to download the model; uses personality-based heuristic algorithms to simulate AI, suitable for quick testing.
- **Local Model Mode**: Configure the GGUF model path via environment variables and load the real model using llama.cpp.

**Hardware Recommendations**
llama.cpp with ZeroGPU is unstable; it is recommended to run in CPU-only mode on a Space with upgraded CPU, or use a dedicated GPU Space. Turn-based delays (a few seconds of "loading") add dramatic tension.

**Limitations & Insights**
- **Advantages**: Complex reasoning can run on consumer-grade hardware; access to internal model states (logits/entropy); fine-grained control over behavior; data privacy guaranteed.
- **Limitations**: Model capacity limits complex strategy learning; reasoning speed is constrained by local hardware; multilingual training increases complexity.
- **Insights**: Well-fine-tuned small models can exhibit surprising capabilities in specific well-defined tasks, balancing accessibility and controllability, providing references for edge AI and privacy scenarios.

## Project Summary and Value

**Project Summary**
Mind of Tashi skillfully integrates game mechanics and AI capabilities; it is not just a technical demo but a complete game experience, demonstrating the potential of small reasoning models in interactive applications. The project builds a complete ecosystem: self-play dataset → model training (SFT/GRPO) → deployment and operation (simulated/local), providing a reusable model for AI-driven applications.

**Target Awards**
Targeting Hackathon awards: Off the Grid (no cloud API), Llama Champion (runs on llama.cpp), Off-Brand (custom Gradio6 frontend), Well-Tuned (fine-tuned MoE GGUF model).

**Value Insights**
Provides rich inspiration and practical experience for developers focusing on edge AI, small model fine-tuning, and AI innovation in games and interactive applications, proving the unique value of small models in specific scenarios.