Zing Forum

Toy: A Zero-Dependency Ruby Neural Network Framework Compiled to Native Code with CUDA/Metal Support

Toy is a Transformer language model framework written in Ruby. Compiled to native binaries via Spinel, it has zero external dependencies and supports CPU, CUDA, and Metal backends. It can run HuggingFace models and produces bitwise identical outputs to PyTorch.

Tags: Ruby, neural network, transformer, machine learning, Spinel, CUDA, Metal, local LLM, HuggingFace, zero dependencies
Published 2026-06-13 01:45 · Recent activity 2026-06-13 01:52 · Estimated read 6 min

Section 01

Introduction: Toy — A Zero-Dependency Native Ruby Neural Network Framework

Toy is a Transformer language model framework written in Ruby and compiled to native binaries via Spinel. It has zero external dependencies, supports CPU, CUDA, and Metal backends, runs HuggingFace models, and produces outputs that are bitwise identical to PyTorch's. Its core design philosophy is "readable machine learning": clean, intuitive code that balances functional completeness with understandability.


Section 02

Background and Design Motivation

In deep learning, Python dominates the framework ecosystem (e.g., PyTorch, TensorFlow), and the Ruby community has long lacked a native, readable, zero-dependency neural network implementation. Toy aims to fill this gap. Despite the name, it is not just a teaching toy but a fully functional framework that covers training, inference, evaluation, and deployment end to end. Its core goal is to make machine learning code readable and understandable to humans.


Section 03

Technical Architecture and Algorithm Cards

Toy uses a five-layer algorithm stack:

  1. Primitives: Low-level tensor operations (matrix multiplication, activation functions, etc.) with a shared interface across backends;
  2. Blocks: Combine primitives into standard components (self-attention, feed-forward networks, etc.);
  3. Architectures: Define complete models (mainstream architectures like GPT-2, Llama, Mistral);
  4. Engine: Compile Ruby to native code via Spinel (a compiler developed by Ruby's creator);
  5. Recipes: User-friendly APIs for end-to-end training/inference workflows.

Toy also introduces algorithm cards, which convert in both directions between code and documentation: toy describe renders a card from the code, and a card can be parsed back into Ruby code, so documentation and code stay in sync.
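To make the layering concrete, here is a minimal sketch in plain Ruby of how primitives, blocks, and an architecture might stack. It is an illustration only: the module and class names (Primitives, Blocks, TinyModel) are hypothetical and are not taken from Toy's actual API, and the attention block is a simplified single-head version with a causal mask.

```ruby
# Illustrative only: a three-level stack in plain Ruby.
# None of these names come from Toy's actual API.

module Primitives
  module_function

  # Matrix product of a (n x k) and b (k x m), both arrays of rows.
  def matmul(a, b)
    cols = b.transpose
    a.map { |row| cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
  end

  # Numerically stable softmax over one row.
  def softmax(row)
    max = row.max
    exps = row.map { |x| Math.exp(x - max) }
    total = exps.sum
    exps.map { |e| e / total }
  end
end

module Blocks
  module_function

  # Single-head scaled dot-product self-attention with a causal mask.
  # x is (seq_len x d); wq, wk, wv are (d x d).
  def self_attention(x, wq, wk, wv)
    q = Primitives.matmul(x, wq)
    k = Primitives.matmul(x, wk)
    v = Primitives.matmul(x, wv)
    scale = 1.0 / Math.sqrt(q.first.size)
    scores = Primitives.matmul(q, k.transpose)
    weights = scores.each_with_index.map do |row, i|
      # Position i may only attend to positions j <= i.
      masked = row.each_with_index.map { |s, j| j <= i ? s * scale : -Float::INFINITY }
      Primitives.softmax(masked)
    end
    Primitives.matmul(weights, v)
  end
end

# "Architecture" level: stack attention blocks with residual connections.
class TinyModel
  def initialize(layers)
    @layers = layers # array of [wq, wk, wv] triples
  end

  def forward(x)
    @layers.reduce(x) do |h, (wq, wk, wv)|
      attn = Blocks.self_attention(h, wq, wk, wv)
      h.zip(attn).map { |hr, ar| hr.zip(ar).map { |a, b| a + b } }
    end
  end
end

identity = [[1.0, 0.0], [0.0, 1.0]]
model = TinyModel.new([[identity, identity, identity]])
p model.forward([[1.0, 0.0], [0.0, 1.0]])
```

Each level only calls the one below it, which is the property that lets a framework swap backends underneath the primitives without touching block or model code.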

Section 04

Feature Support and Compatibility

Toy supports 17 model checkpoints in F32 and Q8_0 formats, three tokenizer variants, and automatic detection of RoPE scaling. In the backend support matrix, CPU is the gated reference baseline, while CUDA and Metal are validated as mirrors whose output is bitwise identical to the CPU's. The CLI toolchain includes 9 core commands:

  • Basic: toy install (build and validate CPU backend), toy new (create experiment/library projects);
  • Model management: toy list (discover GGUF models), toy fetch (download models), toy describe (display algorithm cards);
  • Inference/service: toy infer (local inference), toy serve (OpenAI-compatible API), toy eval (evaluate models);
  • Training: toy train (from scratch, warm start, or LoRA fine-tuning).

The training API is concise, and its results are bitwise consistent with PyTorch, which allows integration with the HuggingFace ecosystem.
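Two ideas from this section, Q8_0 quantization and bitwise-identical validation, can be sketched in a few lines of plain Ruby. This is not Toy's implementation; the function names are hypothetical. The quantizer follows the Q8_0 scheme as used in GGUF files (blocks of 32 values, one scale per block, signed 8-bit quants), with the simplification that the scale is kept as a Ruby Float rather than stored as a 16-bit float. The comparison helper checks IEEE 754 bit patterns rather than approximate closeness.

```ruby
# Sketch only: Q8_0-style block quantization and a bitwise comparison helper.
# Hypothetical names; not Toy's implementation.

Q8_BLOCK = 32 # values per block in the Q8_0 layout

# Quantize one block of floats to [scale, array of integers in -127..127].
def quantize_q8_0_block(values)
  amax = values.map(&:abs).max || 0.0
  scale = amax / 127.0
  inv = scale.zero? ? 0.0 : 1.0 / scale
  quants = values.map { |v| (v * inv).round.clamp(-127, 127) }
  [scale, quants]
end

# Reconstruct approximate floats from a quantized block.
def dequantize_q8_0_block(scale, quants)
  quants.map { |q| q * scale }
end

# True only if both arrays hold exactly the same 64-bit float bit patterns.
# Stricter than ==: it distinguishes 0.0 from -0.0, for example.
def bitwise_equal?(a, b)
  a.pack("E*") == b.pack("E*")
end

block = Array.new(Q8_BLOCK) { |i| Math.sin(i) }
scale, quants = quantize_q8_0_block(block)
restored = dequantize_q8_0_block(scale, quants)

# Same inputs, same operation order: identical bits.
puts bitwise_equal?(restored, dequantize_q8_0_block(scale, quants))
# Same math, different operation order: the bits differ.
puts bitwise_equal?([(0.1 + 0.2) + 0.3], [0.1 + (0.2 + 0.3)])
```

The first check prints true and the second prints false. The second case is why a bitwise guarantee across backends is a strong claim: floating-point addition is not associative, so a GPU kernel has to reproduce the reference's order of operations, not just its arithmetic.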

Section 05

Use Cases and Community Acknowledgments

Toy use cases:

  • Teaching and research: Clear code helps understand Transformer internal mechanisms;
  • Rapid prototyping: Ruby's expressiveness + native performance to validate new architecture ideas;
  • Ruby ecosystem integration: Teams with existing Ruby stacks don't need to introduce Python dependencies;
  • Edge deployment: Zero dependencies, native compilation, small binary size.

Thanks to Ninoslav Milenović for transferring the "toy" gem name on RubyGems, a gesture that reflects the generosity of the Ruby community.

Section 06

Conclusion

The Toy project shows that a machine learning framework does not have to sacrifice readability for performance. Through Spinel compilation, a layered architecture, and strict bitwise-consistency validation, Toy keeps its code clean while providing production-ready features. It is a friendly entry point for Ruby developers exploring deep learning and, for all developers, an example of concise design applied to a complex system.