sglang-codex-patches: An Adaptation Solution to Make SGLang a Backend for OpenAI Responses API

The sglang-codex-patches project uses source-level patches to make the SGLang inference engine fully compatible with the OpenAI Responses API, supporting Codex CLI and reasoning models such as Kimi K2.6 and DeepSeek-R1. This article analyzes its technical implementation and engineering value.

Tags: SGLang, OpenAI API, reasoning models, Codex CLI, open-source deployment, API compatibility, local inference, Kimi K2.6, DeepSeek-R1
Published 2026-05-01 02:35 · Recent activity 2026-05-01 02:49 · Estimated read: 7 min

Section 01

Introduction to the sglang-codex-patches Project: Making SGLang Compatible with OpenAI Responses API Backend


This project uses source-level patches to make the high-performance open-source LLM inference engine SGLang (version 0.5.10.post1) fully compatible with the OpenAI Responses API, supporting tools like Codex CLI and reasoning models such as Kimi K2.6 and DeepSeek-R1. This article analyzes its technical implementation, engineering value, and deployment process.


Section 02

Project Background: Addressing Compatibility Pain Points Between SGLang and OpenAI API


The OpenAI API is an industry standard for AI application development, but the native interface of the open-source inference engine SGLang differs from the OpenAI Responses API, limiting its integration with tools like Codex CLI. Codex CLI relies on the OpenAI API by default, which is inconvenient for users of local/third-party inference services.

This project was created by developer tonylkc, providing source code patches for SGLang version 0.5.10.post1 to fix API compatibility issues and add support for inference models, enabling the deployment of OpenAI API-compatible services in local/private environments.


Section 03

Technical Implementation: Core Modifications of the Patch


  1. API Protocol Adaptation: Adjust SGLang's request/response handling to match the OpenAI API's JSON schema (e.g., the messages array, the tools field, and the stream option).
  2. Streaming Response Optimization: Modify SGLang's streaming output format and event types to conform to OpenAI's SSE protocol.
  3. Reasoning Model Support: Map the intermediate reasoning steps of models like Kimi K2.6 and DeepSeek-R1 to the reasoning_content field of the OpenAI API.
  4. Edge-Case Handling: Return correct HTTP status codes for invalid parameters, handle timeouts, and fix stability issues under concurrent load.
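To illustrate the reasoning-model mapping described above, here is a minimal sketch of splitting a model's raw completion into OpenAI-style fields. The "</think>" delimiter convention follows DeepSeek-R1-style models, and the helper function is hypothetical (not taken from the patches themselves); only the field names mirror the OpenAI API's message schema.

```python
def to_openai_message(raw_text: str, delimiter: str = "</think>") -> dict:
    """Map a reasoning model's raw completion into an OpenAI-style message.

    Text before the delimiter becomes `reasoning_content`; the remainder is
    the user-visible `content`. This mirrors, in simplified form, how the
    patches expose a model's intermediate reasoning to API clients.
    """
    if delimiter in raw_text:
        reasoning, answer = raw_text.split(delimiter, 1)
        return {
            "role": "assistant",
            "reasoning_content": reasoning.strip(),
            "content": answer.strip(),
        }
    # No reasoning section emitted: return plain content only.
    return {"role": "assistant", "content": raw_text.strip()}


msg = to_openai_message("First, factor the number.</think>The answer is 42.")
print(msg["reasoning_content"])  # → First, factor the number.
print(msg["content"])            # → The answer is 42.
```

A client that understands the OpenAI schema can then render or hide the reasoning_content field without any knowledge of the underlying engine.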

Section 04

Supported Models and Hardware Configuration Requirements


  • Models: Optimized support for Kimi K2.6 (Moonshot AI's reasoning-enhanced model) and DeepSeek-R1 (DeepSeek's reasoning model for math and logic).
  • Hardware: Multiple GPUs are required to run large models; the patches preserve SGLang's GPU-utilization optimizations.
  • Deployment Recommendations: The documentation offers practical guidance on quantized model selection, batch-size tuning, memory optimization, and more.
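A multi-GPU launch along the lines suggested above might look like the following. This is a launch-configuration sketch, not a command from the project's documentation: the model path and flag values are illustrative, and the exact flags accepted depend on the SGLang version in use.

```shell
# Illustrative multi-GPU launch of the patched SGLang server.
# --tp shards the model across 8 GPUs via tensor parallelism;
# the model path and port are placeholder choices.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-R1 \
  --tp 8 \
  --host 0.0.0.0 \
  --port 30000
```

For smaller GPU budgets, the documentation's advice on quantized variants and batch-size tuning applies before reaching for more hardware.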

Section 05

Deployment and Usage Process: From Patch to Production Service


  1. Obtain the Source Code: Download the source of SGLang version 0.5.10.post1.
  2. Apply the Patches: Use patch or git apply to apply the diff patches provided by the project.
  3. Build and Deploy: Rebuild SGLang, configure the port and API-key verification, and address production operations needs (load balancing, monitoring, etc.).
  4. Configure the Client: Point Codex CLI at the local SGLang service via environment variables or a configuration file for a transparent switch.
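The steps above could be sketched as the shell session below. The SGLang repository URL is real, but the tag name, patch-file path, and client environment variables are assumptions for illustration; consult the project's README and your Codex CLI version's documentation for the exact names.

```shell
# 1-2. Fetch the pinned SGLang release and apply the project's patches.
#      Tag name and patch path are illustrative.
git clone --branch v0.5.10.post1 https://github.com/sgl-project/sglang.git
cd sglang
git apply ../sglang-codex-patches/*.patch

# 3. Reinstall from the patched source before launching the server.
pip install -e "python[all]"

# 4. Point the client at the local service. The variable names below
#    (OPENAI_BASE_URL, OPENAI_API_KEY) are a common convention, not
#    confirmed from the project's docs.
export OPENAI_BASE_URL="http://localhost:30000/v1"
export OPENAI_API_KEY="local-key"
```

Because the switch happens entirely in client configuration, reverting to a hosted API is a matter of unsetting the base URL.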

Section 06

Engineering Value and Ecological Significance: Bridging Open Source and OpenAI Ecosystems


  • Standardized Interoperability: Bridges open-source inference engines with the OpenAI API ecosystem, so tooling built for one can work with the other.
  • Developer Choice: Combines the convenience of commercial APIs with the flexibility and cost advantages of open-source models.
  • Open-Source Community Growth: Lowers the barrier to trying open-source inference stacks and helps strong projects spread.

Section 07

Limitations and Future Outlook: Sustainability and Adaptive Development


  • Limitations: The patches target a specific SGLang version (0.5.10.post1); support for advanced features (complex tool calls, conversation management) is incomplete; and self-hosted services require hardware investment and operations expertise.
  • Future Outlook: The patch functionality may be merged upstream into SGLang, encouraging a general-purpose adaptation framework; users should weigh self-hosting against cloud APIs for their own workloads.