Zing Forum

Call-Me-Maybe: A Practice of Constrained Decoding for Reliable Function Calling in Small Models

An open-source project demonstrating function calling with large language models. It uses constrained decoding to guarantee valid output formats, enabling highly reliable structured output even on small models with 0.5B parameters.

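Tags: Function calling · Constrained decoding · Large language models · Structured output · JSON generation · Small models · Tool calling · API orchestration · LLM applications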
Published 2026-06-12 00:08 · Recent activity 2026-06-12 00:21 · Estimated read 7 min

Section 01

Call-Me-Maybe: Core Insights on Reliable Function Calls for Small Models via Constrained Decoding

This post introduces the open-source project Call-Me-Maybe, which addresses a key challenge for LLMs: making function calls reliable. Using constrained decoding, it forces the output to comply strictly with predefined function signatures, enabling highly reliable structured output even on small models (e.g., 0.5B parameters). The approach targets common function-calling failures such as inconsistent formats, missing parameters, and type errors.


Section 02

Background: What is Function Calling & Its Key Challenges?

Function calling allows LLMs to convert natural language requests into structured function calls (e.g., get_weather(location="Beijing")). Typical scenarios include weather queries, schedule management, and data retrieval. Doing this reliably, however, raises several challenges:

  1. Format consistency: Ensuring valid JSON output.
  2. Type safety: Correct parameter types (string, number, etc.).
  3. Completeness: No missing required parameters.
  4. Small model performance: Maintaining accuracy with limited parameters.
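
The first three challenges can be made concrete with a small validator that inspects a raw model output. This is a minimal sketch, not the project's code: the schema layout (`required`, `properties` with `type`/`enum`) loosely follows JSON Schema conventions, and the `get_weather` schema and `check_call` helper are hypothetical.

```python
import json

# Hypothetical schema for get_weather, loosely modeled on JSON Schema.
GET_WEATHER = {
    "name": "get_weather",
    "required": ["location"],
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
}

# Map schema type names to the Python types json.loads produces.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_call(raw: str, schema: dict) -> list[str]:
    """Return the problems found in a raw function-call string (empty if valid)."""
    try:
        call = json.loads(raw)  # challenge 1: format consistency
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict) or call.get("name") != schema["name"]:
        return ["wrong or missing function name"]
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    problems = []
    for key in schema["required"]:  # challenge 3: completeness
        if key not in args:
            problems.append(f"missing required parameter: {key}")
    for key, value in args.items():  # challenge 2: type safety
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unknown parameter: {key}")
            continue
        is_bool = isinstance(value, bool)
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        if not isinstance(value, PY_TYPES[spec["type"]]) or (spec["type"] == "number" and is_bool):
            problems.append(f"{key}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: must be one of {spec['enum']}")
    return problems


print(check_call('{"name": "get_weather", "arguments": {"location": "Beijing"}}', GET_WEATHER))
# []
print(check_call("get_weather(location=Beijing)", GET_WEATHER))
# ['invalid JSON: Expecting value']
```

An unconstrained model can fail any of these checks; the constrained decoder described next is meant to make them pass by construction.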

Section 03

Technical Principle: How Constrained Decoding Works

Constrained decoding restricts the set of tokens the model may emit at each decoding step. Its workflow:

  1. Function signature definition: Developers define functions and their parameter schemas (e.g., get_weather with location and unit).
  2. Syntax constraint building: Each signature is converted into a context-free grammar (CFG) or a finite-state machine (FSM).
  3. Dynamic masking: At each decoding step, the set of valid next tokens is computed from the current prefix and the grammar rules.
  4. Restricted sampling: The next token is sampled only from that valid set, so the output always conforms.

Advantages: zero format errors, type safety, complete parameters, and suitability for small models.
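
The four steps above can be sketched as a toy decoder. This is an illustration under stated assumptions, not Call-Me-Maybe's implementation: tokens are single characters, the "model" is a random scorer, and the `TEMPLATE` for `get_weather` (a free-string `location`, an enum `unit`), `allowed_next`, and `MAX_STR` are hypothetical. `allowed_next` plays the role of the compiled FSM, and taking the highest-scoring legal token is a greedy form of restricted sampling.

```python
import json
import random
import string

END = "<end>"  # pseudo-token that stops decoding
MAX_STR = 12  # assumed cap on the free string so decoding always terminates
# Character-level vocabulary: all printable ASCII characters plus END.
VOCAB = [c for c in string.printable if c.isprintable()] + [END]

# Steps 1-2: hypothetical get_weather template. Each segment is a fixed literal,
# a free lowercase string (closed by a quote), or an enum; walking the segments
# left to right behaves like a small finite-state machine.
TEMPLATE = [
    ("lit", '{"name": "get_weather", "arguments": {"location": "'),
    ("str", None),
    ("lit", ', "unit": "'),
    ("enum", ["celsius", "fahrenheit"]),  # assumes no option prefixes another
    ("lit", '"}}'),
]


def allowed_next(prefix: str) -> set[str]:
    """Step 3: tokens that keep `prefix` a valid prefix of TEMPLATE."""
    i = 0  # characters of prefix consumed by earlier segments
    for kind, arg in TEMPLATE:
        rest = prefix[i:]
        if kind == "lit":
            if len(rest) < len(arg):
                return {arg[len(rest)]}  # only the next literal char is legal
            i += len(arg)
        elif kind == "str":
            close = rest.find('"')
            if close == -1:  # still inside the string body
                body = set(string.ascii_lowercase) if len(rest) < MAX_STR else set()
                return body | {'"'}
            i += close + 1  # skip the body and its closing quote
        else:  # enum
            done = [opt for opt in arg if rest.startswith(opt)]
            if not done:
                return {opt[len(rest)] for opt in arg if opt.startswith(rest)}
            i += len(done[0])
    return {END}  # every segment matched


def toy_model(prefix: str, rng: random.Random) -> dict[str, float]:
    """Stand-in for an LLM: ignores the prefix and scores every token at random."""
    return {tok: rng.gauss(0.0, 1.0) for tok in VOCAB}


def constrained_decode(seed: int = 0, max_steps: int = 200) -> str:
    rng = random.Random(seed)
    out = ""
    for _ in range(max_steps):
        scores = toy_model(out, rng)
        legal = allowed_next(out)
        # Step 4: ignore illegal tokens and take the best legal one (greedy).
        tok = max(legal, key=lambda t: scores[t])
        if tok == END:
            return out
        out += tok
    raise RuntimeError("max_steps exceeded")


call = json.loads(constrained_decode(seed=0))
print(call["arguments"]["unit"] in ("celsius", "fahrenheit"))  # True
```

Whatever scores the model produces, the result always parses as JSON with a legal `unit`, which is the sense in which format errors drop to zero. Whether `location` names the right city is still up to the model.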

Section 04

Project Implementation: Architecture & Key Components

Call-Me-Maybe uses a modular design:

  • LLM SDK: Encapsulates model inference interfaces for multiple backends.
  • Constraint decoder: Implements FSM-based decoding constraints.
  • Function registry: Manages available function definitions.
  • Input processor: Parses natural language to extract intent.

Key details:

  • FSM construction: For each function, build an FSM representing valid sequences (e.g., start → { → "name" → function name → ... → end).
  • Dynamic mask calculation: Set the logits of illegal tokens to negative infinity, then renormalize the remaining logits (softmax) before sampling.
  • Type validation: Check parameter types (string, number, boolean, enum) against schema.
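
The mask-then-normalize step can be written as a masked softmax. A minimal pure-Python sketch, assuming the logits arrive as a plain list and the legal token ids come from the FSM; the function names are illustrative, and a real inference stack would apply the same operation to a logits tensor.

```python
import math
import random


def masked_softmax(logits: list[float], legal: set[int]) -> list[float]:
    """Set illegal logits to -inf, then renormalize over the legal ones."""
    if not legal:
        raise ValueError("no legal tokens: the grammar admits no continuation")
    masked = [x if i in legal else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs: list[float], rng: random.Random) -> int:
    """Draw one token id; masked ids have probability 0 and are never chosen."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, 3.0]  # token 3 scores highest but is illegal here
probs = masked_softmax(logits, legal={0, 1})
print([round(p, 3) for p in probs])  # [0.731, 0.269, 0.0, 0.0]
print(sample(probs, random.Random(0)))  # 0 or 1, never 2 or 3
```

Masking before normalization, rather than rejecting bad samples afterwards, means the decoder never wastes a step on an illegal token.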

Section 05

Performance: Small Model Advantages & Reliability Metrics

The project is particularly strong in small-model scenarios, which offer:

  • Edge deployment on consumer hardware.
  • Faster inference (low latency).
  • Lower computational cost.

Reliability metrics comparison:

    Metric                            Unconstrained   Constrained
    JSON format correctness           ~70%            100%
    Parameter type correctness        ~85%            100%
    Required parameter completeness   ~90%            100%
    Overall availability              ~60%            >95%

Section 06

Application Scenarios of Call-Me-Maybe

Key application areas:

  1. Intelligent assistants: Reliably call external services (calendar, weather, email).
  2. Automation workflows: Trigger business operations according to predefined rules, reducing manual intervention.
  3. API orchestration: Plan and execute multi-API sequences correctly.

Section 07

Engineering Practice Suggestions

Practical tips for using the project.

Function design:

  • Single responsibility: Each function does one thing, with a reasonable number of parameters.
  • Clear naming: Use intuitive function names.
  • Complete documentation: Describe every parameter, with examples.
  • Sensible defaults: Provide defaults for optional parameters.

Error handling:

  • Handle unregistered functions, invalid parameter values, and execution failures.

Performance optimization:

  • Batch multiple requests together.
  • Cache common request-response patterns.
  • Choose a model size that matches the task's complexity.
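
The error-handling advice can be sketched as a small dispatcher around a function registry. The `register` decorator, the `dispatch` function, the error labels, and the `get_weather` stub are all hypothetical, not Call-Me-Maybe's API; the point is that each failure class (unregistered function, invalid arguments or values, execution failure) becomes a structured result instead of an exception reaching the user.

```python
import inspect
from typing import Any, Callable

REGISTRY: dict[str, Callable[..., Any]] = {}


def register(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Add a function to the registry under its own name."""
    REGISTRY[fn.__name__] = fn
    return fn


@register
def get_weather(location: str, unit: str = "celsius") -> str:
    # Hypothetical stub: a real tool would call a weather service here.
    if unit not in ("celsius", "fahrenheit"):
        raise ValueError(f"unsupported unit: {unit}")
    if location == "Atlantis":
        raise ConnectionError("weather service unreachable")
    return f"Sunny in {location} ({unit})"


def dispatch(call: dict[str, Any]) -> dict[str, Any]:
    """Run a decoded call and map every failure to a structured error."""
    name = call.get("name")
    fn = REGISTRY.get(name)
    if fn is None:
        return {"ok": False, "error": "unregistered_function", "detail": name}
    args = call.get("arguments", {})
    try:
        inspect.signature(fn).bind(**args)  # missing or unknown parameters
    except TypeError as exc:
        return {"ok": False, "error": "invalid_arguments", "detail": str(exc)}
    try:
        result = fn(**args)
    except ValueError as exc:  # the tool rejected a parameter value
        return {"ok": False, "error": "invalid_value", "detail": str(exc)}
    except Exception as exc:  # the tool itself failed
        return {"ok": False, "error": "execution_failed", "detail": str(exc)}
    return {"ok": True, "result": result}


print(dispatch({"name": "get_weather", "arguments": {"location": "Beijing"}}))
# {'ok': True, 'result': 'Sunny in Beijing (celsius)'}
print(dispatch({"name": "send_email", "arguments": {}})["error"])
# unregistered_function
```

Binding the arguments against the signature first keeps "the model passed bad arguments" separate from "the tool raised a TypeError internally", so each case can be retried or reported differently.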

Section 08

Limitations & Future Directions

Current limitations:

  • Only a limited number of functions fit in one context window.
  • FSMs become complex for deeply nested parameters.
  • Only the format is guaranteed, not semantic correctness.

Future directions:

  • Support multi-turn function calls that reference earlier results.
  • Allow new functions to be registered at runtime.
  • Enable streaming output for lower latency.