Mind of Tashi: A Psychological Game Duel with Small-Scale Reasoning Models

Mind of Tashi is a competitive game built on a blind commitment mechanism, in which players fight psychological duels against a fine-tuned small Mixture of Experts (MoE) reasoning model (approximately 200 million active parameters). The project shows how a small local model can carry out complex, recursive adversarial reasoning and run on edge devices via llama.cpp without relying on cloud APIs.

Tags: small models, reasoning models, MoE, llama.cpp, game AI, psychological games, model fine-tuning, GitHub

Published 2026-06-09 14:14 · Recent activity 2026-06-09 14:21 · Estimated read 11 min

Section 01

Introduction: Mind of Tashi - A Psychological Game Duel with Small-Scale Reasoning Models

Project Basic Information

Original Author/Maintainer: Mandark-droid
Source Platform: GitHub
Original Link: https://github.com/Mandark-droid/mind-of-tashi
Release Date: 2026-06-09

Core Points

Mind of Tashi is a competitive game built on a blind commitment mechanism: players fight psychological duels against a fine-tuned small MoE reasoning model (approximately 200 million active parameters). The model runs on edge devices via llama.cpp with no cloud API, which shows that a small local model can sustain complex, recursive adversarial reasoning. The game is set in a ninja monk village in the Himalayas, where players climb a trial tower guarded by AI opponents. Winning depends on out-predicting an opponent that narrates its own thinking.

Section 02

Project Background and Core Mechanisms

Project Background

This project is an entry for the second track, "An Adventure in Thousand Token Wood", of the Build Small Hackathon. The game is set in a mist-shrouded ninja monk village in the Himalayas, where the player's goal is to climb a trial tower guarded by AI opponents.

Core Mechanisms

The core of the game is the blind commitment duel: in each round, the player and the AI secretly choose their moves at the same time. There is no reaction window, so the outcome rests entirely on prediction. After the moves are revealed, the AI explains how it read the player's behavior (e.g., "You took two unpunished breaths: greed, so I attack"). The game rewards recursive thinking (e.g., "I think you will attack, so I use Mist-Step; I think you expect that, so I take a breath instead"), which is exactly where reasoning models excel.
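The recursive "I think that you think" pattern described above can be illustrated with a level-k reasoning sketch. This is not the project's code: the function name, the move identifiers, and the restriction to the strike/stance/throw triangle from the article's move table are assumptions made for illustration.

```python
# Counter relations taken from the article's move table.
# Only the strike/stance/throw triangle is modelled here (an assumption
# made to keep the sketch small; prāṇa moves are left out).
COUNTER = {
    "vajra_strike": "mountain_stance",  # Mountain Stance blocks Vajra Strike
    "mountain_stance": "river_throw",   # River Throw breaks Mountain Stance
    "river_throw": "vajra_strike",      # Vajra Strike beats River Throw
}


def level_k_move(naive_move: str, k: int) -> str:
    """Return the move of a level-k reasoner.

    A level-0 player simply plays `naive_move`. A level-k player assumes
    the opponent reasons at level k-1 and plays the counter to that move.
    """
    move = naive_move
    for _ in range(k):
        move = COUNTER[move]
    return move


if __name__ == "__main__":
    for depth in range(4):
        print(depth, level_k_move("vajra_strike", depth))
```

Because the triangle has three moves, the recursion cycles: a level-3 reasoner lands on the same move as a level-0 one. The useful question in a duel is therefore how deep the opponent is thinking, not how deep one can think in principle.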

Section 03

Model Architecture and Technical Implementation

Model Architecture

The AI opponent is a custom Mixture of Experts (MoE) model with approximately 400 million total parameters, of which only about 200 million are active per token. It is trained with SFT (Supervised Fine-Tuning) followed by GRPO, and it code-switches between English and Hindi/Sanskrit (in IAST transliteration). By active parameter count it is roughly 10 to 100 times smaller than frontier API models. The model is distributed as a Q4_K_M GGUF file and runs via llama.cpp with no cloud API.

Technical Details

Inference Path: Implemented in llm.py. It covers prompt construction (prompts.py), parsing of the thinking trace and the JSON move selection, per-personality sampling temperature, and grammar constraints (the Oath mechanism).
Belief Meter: Implemented via token-level entropy analysis. Higher entropy indicates that the AI is less certain, and the UI surfaces this to the player. When the player successfully "reads" the AI, its sampling temperature is raised to simulate shaken composure.
Custom Frontend: Uses gradio.Server from Gradio 6 and serves a Himalayan-style interface from static/index.html, keeping game logic and presentation separate.
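The parsing and belief-meter steps listed above might look roughly like the sketch below. It is a hedged illustration, not the contents of llm.py: the `<think>` tag format, the function names, and the size of the temperature bump are all assumptions.

```python
import json
import math
import re

# Assumed output format: a thinking trace in <think> tags, then a JSON object.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_reply(text: str) -> tuple[str, dict]:
    """Split a raw completion into (thinking trace, parsed move JSON)."""
    match = THINK_RE.search(text)
    thinking = match.group(1).strip() if match else ""
    rest = text[match.end():] if match else text
    start, end = rest.find("{"), rest.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return thinking, json.loads(rest[start:end + 1])


def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(per_token_probs: list[list[float]]) -> float:
    """Average token entropy over a completion; higher means less certain."""
    if not per_token_probs:
        return 0.0
    return sum(token_entropy(p) for p in per_token_probs) / len(per_token_probs)


def shaken_temperature(base: float, was_read: bool, bump: float = 0.2) -> float:
    """Raise the sampling temperature after the player reads the AI.

    The bump size is a hypothetical value chosen for this sketch.
    """
    return base + bump if was_read else base
```

In practice the per-token distributions would come from the logits or log-probabilities that the local runtime exposes, which is one of the advantages of running the model locally.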

Section 04

In-Depth Analysis of Game Mechanics

Six-Move System

Move	Cost	Win-Loss Relationship
Vajra Strike	Free	Beats River Throw · Blocked by Mountain Stance
Mountain Stance (Block)	Free, +1 prāṇa	Blocks Vajra Strike, mitigates Prāṇa Art · Broken by River Throw
River Throw	Free	Breaks Mountain Stance · Loses to Vajra Strike
Draw Breath	Free, +2 prāṇa	Gathers prāṇa but fully exposed
Prāṇa Art	3 prāṇa	Powerful long-range attack · Countered by Mist-Step
Mist-Step	2 prāṇa	Dodges and counters attacks · Ineffective against cautious moves

Resource System

Prāṇa (life energy) is the core resource: it is accumulated through Draw Breath and Mountain Stance and spent on the powerful moves. This creates a clear tempo trade-off: drawing breath often builds resources but leaves the player exposed, while constant pressure denies the opponent resources but invites a counter.
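The move table and prāṇa rules above can be encoded as data. The sketch below is an illustration under stated assumptions: the identifiers are invented, only matchups that the table states explicitly are encoded, and prāṇa gains are applied regardless of the round's outcome (the article does not say whether a countered move still gains prāṇa).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Move:
    cost: int  # prāṇa spent to use the move
    gain: int  # prāṇa gained by using the move


# Costs and gains copied from the move table.
MOVES = {
    "vajra_strike": Move(cost=0, gain=0),
    "mountain_stance": Move(cost=0, gain=1),
    "river_throw": Move(cost=0, gain=0),
    "draw_breath": Move(cost=0, gain=2),
    "prana_art": Move(cost=3, gain=0),
    "mist_step": Move(cost=2, gain=0),
}

# (winner, loser) pairs that the table states explicitly.
BEATS = {
    ("vajra_strike", "river_throw"),
    ("mountain_stance", "vajra_strike"),
    ("river_throw", "mountain_stance"),
    ("mist_step", "prana_art"),
}


def can_afford(prana: int, move: str) -> bool:
    """True if the player has enough prāṇa to use the move."""
    return prana >= MOVES[move].cost


def prana_after(prana: int, move: str) -> int:
    """Prāṇa total after paying the cost and collecting the gain."""
    if not can_afford(prana, move):
        raise ValueError(f"not enough prāṇa for {move}")
    spec = MOVES[move]
    return prana - spec.cost + spec.gain


def winner(move_a: str, move_b: str) -> Optional[str]:
    """Return "a" or "b" for matchups listed in BEATS, else None.

    None means the table does not specify the matchup, not that it is a draw.
    """
    if (move_a, move_b) in BEATS:
        return "a"
    if (move_b, move_a) in BEATS:
        return "b"
    return None
```

Keeping the rules in plain data like this makes the tempo trade-off easy to inspect: two Draw Breaths fund a Mist-Step, and a Prāṇa Art needs three prāṇa, for example one Draw Breath plus one Mountain Stance.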

Ten Personality Opponents

The AI has ten distinct personalities, each with its own temperament, strategy, and thinking budget. The same model therefore plays in very different styles (aggressive or conservative, rational or intuitive), which adds replay value.
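One way to drive several opponents from a single model is a small per-personality configuration. The sketch below is purely illustrative: the field names, the two placeholder profiles, and every numeric value are hypothetical and do not describe the project's actual ten personalities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    name: str
    temperament: str      # short description injected into the prompt
    temperature: float    # base sampling temperature
    thinking_budget: int  # maximum tokens allowed for the thinking trace


# Placeholder profiles; names and values are invented for this sketch.
PERSONALITIES = {
    "aggressive_example": Personality(
        name="aggressive_example",
        temperament="presses the attack and punishes greed",
        temperature=1.0,
        thinking_budget=128,
    ),
    "cautious_example": Personality(
        name="cautious_example",
        temperament="waits, blocks, and reads patterns",
        temperature=0.5,
        thinking_budget=384,
    ),
}


def sampling_params(personality: Personality) -> dict:
    """Translate a personality into generation settings."""
    return {
        "temperature": personality.temperature,
        "max_tokens": personality.thinking_budget,
    }
```

The design point is that style lives in configuration and prompt text rather than in separate model weights, so adding an opponent means adding an entry rather than training again.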

Section 05

Model Fine-Tuning and Training

SFT Phase

The model is first trained on a dataset generated through self-play, so that it learns to predict opponent behavior from the match history while playing a given personality. The dataset includes code-switched text in English, Hindi, and Sanskrit (IAST transliteration), which lets the model narrate its thinking in a philosophically flavored voice.

GRPO Training

The model is then fine-tuned with GRPO (Group Relative Policy Optimization) to improve decision quality in adversarial play, making it more adaptable to changing game situations than SFT alone.
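The "group relative" part of GRPO refers to scoring each sampled completion against the other completions drawn for the same prompt. A minimal sketch of that advantage computation follows; the reward values in the usage note are hypothetical, and the article does not describe the project's actual reward function.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is judged relative to its siblings rather than by a learned
    value function. Population standard deviation is used here; some
    implementations use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, if four sampled replies to the same game state earn rewards of 1, 0, 0, and 1 (say, for winning or losing the round), the two winners receive advantages close to +1 and the two losers close to -1. When every reply in a group earns the same reward, all advantages are zero and the group contributes no gradient signal.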

Section 06

Deployment Methods and Limitations & Insights

Deployment Modes

Simulated Opponent Mode: No need to download the model; uses personality-based heuristic algorithms to simulate AI, suitable for quick testing.
Local Model Mode: Configure the GGUF model path via environment variables and load the real model using llama.cpp.
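The two modes above could be selected with a check like the following. This is a sketch under assumptions: the environment variable name `TASHI_MODEL_PATH` and both function names are hypothetical, and the loader assumes the llama-cpp-python bindings, which the article does not confirm as the project's choice.

```python
import os
from typing import Mapping, Optional

MODEL_ENV_VAR = "TASHI_MODEL_PATH"  # hypothetical name, not from the project


def select_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "local" when a GGUF path is configured, else "simulated"."""
    env = os.environ if env is None else env
    path = env.get(MODEL_ENV_VAR, "").strip()
    return "local" if path else "simulated"


def load_local_model(path: str):
    """Load the GGUF file with llama-cpp-python (assumed binding).

    The import is deferred so that simulated mode works without the
    package installed. n_gpu_layers=0 keeps inference on the CPU.
    """
    from llama_cpp import Llama

    return Llama(model_path=path, n_ctx=2048, n_gpu_layers=0)


if __name__ == "__main__":
    print("mode:", select_mode())
```

Deferring the import keeps the quick-test path free of native dependencies, which suits a simulated mode meant for fast iteration.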

Hardware Recommendations

llama.cpp is unstable on ZeroGPU, so the recommended setup is CPU-only inference on a Space with an upgraded CPU, or a dedicated GPU Space. The per-turn delay (a few seconds of "loading") adds dramatic tension rather than detracting from play.

Limitations & Insights

Advantages: Complex reasoning can run on consumer-grade hardware; access to internal model states (logits/entropy); fine-grained control over behavior; data privacy guaranteed.
Limitations: Model capacity limits complex strategy learning; reasoning speed is constrained by local hardware; multilingual training increases complexity.
Insights: Well-fine-tuned small models can exhibit surprising capabilities in specific well-defined tasks, balancing accessibility and controllability, providing references for edge AI and privacy scenarios.

Section 07

Project Summary and Value

Project Summary

Mind of Tashi integrates game mechanics with AI capabilities. It is not just a technical demo but a complete game, and it demonstrates the potential of small reasoning models in interactive applications. The project covers a complete pipeline, from self-play dataset to model training (SFT, then GRPO) to deployment (simulated or local), and offers a reusable template for AI-driven applications.

Target Awards

The project targets the following Hackathon awards: Off the Grid (no cloud API), Llama Champion (runs on llama.cpp), Off-Brand (custom Gradio 6 frontend), and Well-Tuned (fine-tuned MoE GGUF model).

Value Insights

The project offers inspiration and practical experience for developers working on edge AI, small-model fine-tuning, and AI in games and interactive applications, and it shows where small models can hold their own in well-defined scenarios.
