Call-Me-Maybe: Constrained Decoding in Practice for Reliable Function Calling with Small Models

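An open-source project that showcases the function-calling capability of large language models. It uses constrained decoding to guarantee validly formatted output, achieving highly reliable structured output even on a small 0.5B-parameter model.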

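Tags: function calling, constrained decoding, large language models, structured output, JSON generation, small models, tool calling, API orchestration, LLM applications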

Published 2026/06/12 00:08 | Last activity 2026/06/12 00:21 | Estimated reading time 7 minutes

Section 01

Call-Me-Maybe: Core Insights on Reliable Function Calls for Small Models via Constrained Decoding

This post introduces Call-Me-Maybe, an open-source project that tackles a key challenge for LLMs: making function calls reliable. It uses constrained decoding to force every output to conform to a predefined function signature, which yields highly reliable structured output even on small models (e.g., 0.5B parameters). This addresses common function-calling failures such as inconsistent formats, missing parameters, and type errors.

Section 02

Background: What is Function Calling & Its Key Challenges?

Function calling lets an LLM convert a natural language request into a structured function call (e.g., get_weather(location="Beijing")). Typical scenarios include weather queries, schedule management, and data retrieval. Doing this reliably raises several challenges:

Format consistency: Ensuring valid JSON output.
Type safety: Correct parameter types (string, number, etc.).
Completeness: No missing required parameters.
Small model performance: Maintaining accuracy with a limited model parameter count.
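
To make the first three challenges concrete, here is a minimal sketch of a checker that classifies a raw model output as a format, type, or completeness failure. The schema layout, the "arguments" key, and the check_call helper are assumptions made for illustration; they are not taken from the Call-Me-Maybe codebase.

```python
import json

# Hypothetical schema for the get_weather example; the field names are assumptions.
GET_WEATHER = {
    "name": "get_weather",
    "parameters": {
        "location": {"type": str, "required": True},
        "unit": {"type": str, "required": False},
    },
}

def check_call(raw: str, schema: dict) -> list[str]:
    """Return the list of problems found in one raw model output."""
    # Format consistency: the output must parse as a JSON object.
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return ["format: not valid JSON"]
    if not isinstance(call, dict) or not isinstance(call.get("arguments"), dict):
        return ["format: expected an object with an 'arguments' object"]

    problems = []
    args = call["arguments"]
    for pname, spec in schema["parameters"].items():
        if pname not in args:
            # Completeness: required parameters must be present.
            if spec["required"]:
                problems.append(f"completeness: missing {pname}")
        elif not isinstance(args[pname], spec["type"]):
            # Type safety: present parameters must have the declared type.
            problems.append(f"type: {pname} should be {spec['type'].__name__}")
    return problems

ok = check_call('{"name": "get_weather", "arguments": {"location": "Beijing"}}', GET_WEATHER)
bad_type = check_call('{"name": "get_weather", "arguments": {"location": 42}}', GET_WEATHER)
missing = check_call('{"name": "get_weather", "arguments": {}}', GET_WEATHER)
bad_format = check_call('get_weather(location=', GET_WEATHER)
```

A checker like this only detects failures after generation. Constrained decoding aims to prevent them during generation.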

Section 03

Technical Principle: How Constrained Decoding Works

Constrained decoding restricts the model's output space during decoding, so that only tokens permitted by a grammar can be generated. The workflow has four steps:

Function signature definition: Developers define functions and their parameter schemas (e.g., get_weather with location and unit).
Syntax constraint building: Convert the signatures into a context-free grammar (CFG) or a finite-state machine (FSM).
Dynamic masking: At each decoding step, compute the set of valid next tokens from the current prefix and the grammar rules.
Restricted sampling: Sample only from the valid tokens to ensure compliance.

Advantages: Zero format errors, type safety, complete parameters, and suitability for small models.
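
The masking and restricted-sampling steps can be sketched in a few lines of plain Python. This is a minimal illustration that assumes the set of valid token ids has already been computed from the grammar. The masked_sample function and the toy logits are hypothetical and not part of the project.

```python
import math
import random

def masked_sample(logits, allowed_ids, rng):
    """Sample one token id, restricted to allowed_ids."""
    if not allowed_ids:
        raise ValueError("the grammar allows no token in this state")
    # Dynamic masking: disallowed tokens get -inf, i.e. zero probability.
    masked = [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]
    # Softmax weights over the surviving tokens (shifted by the max for stability).
    peak = max(masked)
    weights = [math.exp(x - peak) for x in masked]
    # Restricted sampling: random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)
logits = [5.0, 1.0, 0.5, 3.0]  # the unconstrained model prefers token 0
allowed = {1, 2}               # but the grammar only permits tokens 1 and 2
samples = [masked_sample(logits, allowed, rng) for _ in range(200)]
```

Because masked tokens carry zero weight, every sample falls inside the allowed set no matter how strongly the model prefers an invalid token.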

Section 04

Project Implementation: Architecture & Key Components

Call-Me-Maybe uses a modular design:

LLM SDK: Encapsulates model inference interfaces for multiple backends.
Constraint decoder: Implements FSM-based decoding constraints.
Function registry: Manages available function definitions.
Input processor: Parses natural language to extract intent.

Key details:

FSM construction: For each function, build an FSM representing valid sequences (e.g., start → { → "name" → function name → ... → end).
Dynamic mask calculation: Mask illegal tokens in the logits, then renormalize the remaining probabilities for sampling.
Type validation: Check parameter types (string, number, boolean, enum) against schema.
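
The FSM idea can be illustrated with a toy example. The sketch below assumes a tiny vocabulary in which whole JSON fragments stand in for real subword tokens, and a random scorer stands in for the model. The VOCAB, FSM, and decode names are hypothetical, and greedy selection replaces sampling to keep the example short. The project's actual FSM is presumably built over the tokenizer's vocabulary.

```python
import json
import random

# Toy vocabulary: whole JSON fragments stand in for real subword tokens.
VOCAB = ['{', '}', ',', '"name":', '"get_weather"', '"arguments":',
         '"location":', '"Beijing"', '"Paris"', 'Sure!']

# FSM: state -> {allowed token: next state}. State 0 is the start, "end" accepts.
FSM = {
    0: {'{': 1},
    1: {'"name":': 2},
    2: {'"get_weather"': 3},
    3: {',': 4},
    4: {'"arguments":': 5},
    5: {'{': 6},
    6: {'"location":': 7},
    7: {'"Beijing"': 8, '"Paris"': 8},
    8: {'}': 9},
    9: {'}': "end"},
}

def decode(score_fn):
    """Greedy decoding where only tokens allowed by the FSM are considered."""
    state, out = 0, []
    while state != "end":
        logits = score_fn(out)   # one score per vocabulary entry
        allowed = FSM[state]     # tokens that are legal in this state
        token = max(allowed, key=lambda t: logits[VOCAB.index(t)])
        out.append(token)
        state = allowed[token]
    return "".join(out)

rng = random.Random(7)
noisy_model = lambda prefix: [rng.gauss(0.0, 1.0) for _ in VOCAB]  # stand-in for an LLM
text = decode(noisy_model)
call = json.loads(text)  # always parses, because the FSM only admits valid sequences
```

Even with random scores, the decoded text is a well-formed call to get_weather with a location drawn from the permitted values, since tokens such as 'Sure!' are never reachable from any state.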

Section 05

Performance: Small Model Advantages & Reliability Metrics

The project excels in small model scenarios:

Edge deployment on consumer hardware.
Faster inference (low latency).
Lower computational cost.

Reliability metrics comparison:

Metric	Unconstrained	Constrained
JSON format correctness	~70%	100%
Parameter type correctness	~85%	100%
Required parameter completeness	~90%	100%
Overall availability	~60%	>95%

Section 06

Application Scenarios of Call-Me-Maybe

Key application areas:

Intelligent assistants: Reliably call external services (calendar, weather, email).
Automation workflows: Trigger business operations per rules, reducing manual intervention.
API orchestration: Plan and execute multi-API sequences correctly.

Section 07

Engineering Practice Suggestions

Practical tips for using the project:

Function design:

Single responsibility: Each function does one thing, with reasonable parameters.
Clear naming: Intuitive function names.
Complete documentation: Describe parameters with examples.
Sensible defaults: Provide defaults for optional parameters.

Error handling:

Handle unregistered functions, invalid parameter values, and execution failures.

Performance optimization:

Batch processing for multiple requests.
Cache common request-response patterns.
Choose an appropriate model size based on task complexity.
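
The three error-handling cases listed above can be sketched as a small dispatcher. This is an illustrative outline only: the REGISTRY dictionary, the register decorator, the dispatch function, and the shape of the result dictionary are all assumptions, not the project's actual API.

```python
# Hypothetical registry mapping function names to Python callables.
REGISTRY = {}

def register(fn):
    """Add a function to the registry under its own name."""
    REGISTRY[fn.__name__] = fn
    return fn

@register
def get_weather(location: str, unit: str = "celsius") -> str:
    # Sensible default for the optional parameter; reject empty locations.
    if not location:
        raise ValueError("location must be non-empty")
    return f"weather for {location} in {unit}"

def dispatch(call: dict) -> dict:
    """Execute a parsed function call and report failures as data."""
    name = call.get("name")
    fn = REGISTRY.get(name)
    if fn is None:
        return {"ok": False, "error": f"unregistered function: {name}"}
    try:
        result = fn(**call.get("arguments", {}))
    except (TypeError, ValueError) as exc:
        # Missing or unexpected arguments raise TypeError; bad values raise ValueError.
        # A TypeError raised deep inside the function body would also land here.
        return {"ok": False, "error": f"invalid arguments: {exc}"}
    except Exception as exc:
        return {"ok": False, "error": f"execution failed: {exc}"}
    return {"ok": True, "result": result}

good = dispatch({"name": "get_weather", "arguments": {"location": "Beijing"}})
unknown = dispatch({"name": "send_fax", "arguments": {}})
empty = dispatch({"name": "get_weather", "arguments": {"location": ""}})
incomplete = dispatch({"name": "get_weather", "arguments": {}})
```

Returning errors as structured values rather than raising lets the caller feed the failure back to the model or to the user without crashing the workflow.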