# modelscan/registry: An Open, Unified Metadata Registry for Large Language Models

> An open, machine-readable metadata registry for large language models that collects model identity, author, modality, context limits, capabilities, and lifecycle information in one consistent format. Pricing data from multiple commercial sources can coexist in the same record, and the registry is released under the CC BY 4.0 license.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-04T20:43:54.000Z
- Last activity: 2026-06-04T20:48:34.986Z
- Popularity: 163.9
- Keywords: large language models, LLM, metadata registry, model catalog, OpenAPI, model selection, pricing strategy, open-source project, GitHub, CC BY 4.0
- Page link: https://www.zingnex.cn/en/forum/thread/modelscan-registry-68db1da5
- Canonical: https://www.zingnex.cn/forum/thread/modelscan-registry-68db1da5

---


## Original Author and Source

- **Original Author/Maintainer:** modelscan team
- **Source Platform:** GitHub
- **Original Title:** registry - Open registry of large-language-model metadata
- **Original Link:** <https://github.com/modelscan/registry>
- **Publication Date:** June 4, 2026
- **License:** Creative Commons Attribution 4.0 International (CC BY 4.0)

---

## Project Background and Problems

As the large language model (LLM) ecosystem grows rapidly, developers and enterprises face a growing problem: there is no single place to obtain complete, accurate, and current metadata for every model.

Each platform (OpenAI, Anthropic, Alibaba Cloud Tongyi Qianwen, Volcano Engine Ark, OpenRouter, and others) maintains its own model list, with its own format, its own set of fields, and its own update schedule. Developers end up switching between several sets of API documentation, and some even write their own scrapers to merge the data.

Worse, the same model may appear under different names on different platforms (e.g., `gpt-4-turbo` vs `openai/gpt-4-turbo`), with different version snapshots and different prices. This fragmentation raises development costs and makes configuration errors and inaccurate cost estimates more likely.

The modelscan/registry project was created to address this pain point; it aims to build an open, unified, machine-readable metadata registry for large language models.

---

## A Single Source of Truth

The core of the project is a single JSON file, `models.json`, hosted on GitHub and distributed via CDN. It holds the metadata for every model in the registry, from basic identity information to detailed commercial pricing. As of June 2026, the registry lists over 1,197 models, spanning mainstream commercial and open-source models.
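Because the whole registry is one JSON document, consuming it needs nothing more than a JSON parser. The sketch below assumes you have already downloaded `models.json` and that its top level is either a list of model objects or an object holding that list under a `models` key; the real layout may differ, so inspect the file first. `load_registry` and `index_by_id` are hypothetical helpers, not part of the project.

```python
import json
from typing import Any


def load_registry(text: str) -> list[dict[str, Any]]:
    """Parse the registry JSON text into a list of model records.

    ASSUMPTION: the top level is either a list of models or an object with a
    "models" list. Adjust this to the real layout of models.json.
    """
    data = json.loads(text)
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and isinstance(data.get("models"), list):
        return data["models"]
    raise ValueError("unrecognized registry layout")


def index_by_id(models: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Build an id -> record lookup, rejecting duplicate ids."""
    index: dict[str, dict[str, Any]] = {}
    for model in models:
        model_id = model["id"]
        if model_id in index:
            raise ValueError(f"duplicate id: {model_id}")
        index[model_id] = model
    return index


# Usage with an inline stand-in; in practice, read the downloaded models.json.
sample = '{"models": [{"id": "gpt-4-turbo"}, {"id": "qwen-max"}]}'
registry = index_by_id(load_registry(sample))
print(sorted(registry))  # ['gpt-4-turbo', 'qwen-max']
```

Rejecting duplicate ids is a cheap sanity check that matches the registry's promise of one record per model.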

## Stable Model Identity

Each model in the registry has a standardized `id` that stays stable across platforms. When sources name the same model differently (for example, version snapshots with date suffixes), the registry merges them under one base ID and keeps the original names in the `alias_id` field. As a result, one model never splits into several records, which makes it easy to manage and query models uniformly.
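The exact merge rules are not documented in this summary. Purely as an illustration, the sketch below assumes two normalizations: dropping a `provider/` prefix and stripping a trailing date snapshot of the form `-YYYY-MM-DD` or `-YYYYMMDD`. `base_id` and `group_aliases` are hypothetical, and so is representing `alias_id` as a list of raw names.

```python
import re
from collections import defaultdict

# ASSUMPTION: date snapshots look like "-2024-04-09" or "-20240409" at the end.
_DATE_SUFFIX = re.compile(r"-(?:\d{4}-\d{2}-\d{2}|\d{8})$")


def base_id(raw_name: str) -> str:
    """Map a source-specific model name to a hypothetical base ID."""
    name = raw_name.split("/", 1)[-1]  # drop a "provider/" prefix if present
    return _DATE_SUFFIX.sub("", name)  # drop a trailing date snapshot


def group_aliases(raw_names: list[str]) -> dict[str, list[str]]:
    """Group raw names under their base ID, keeping the originals as aliases."""
    groups: dict[str, list[str]] = defaultdict(list)
    for raw in raw_names:
        aliases = groups[base_id(raw)]
        if raw not in aliases:
            aliases.append(raw)
    return dict(groups)


names = ["gpt-4-turbo", "openai/gpt-4-turbo", "gpt-4-turbo-2024-04-09"]
print(group_aliases(names))
# {'gpt-4-turbo': ['gpt-4-turbo', 'openai/gpt-4-turbo', 'gpt-4-turbo-2024-04-09']}
```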

## Dual-Currency Pricing

To reflect a global market, the registry can hold prices in several currencies at once. USD quotes from OpenRouter and LiteLLM can sit alongside CNY quotes from Alibaba Cloud Tongyi Qianwen and Volcano Engine Ark in the same model record. Developers pick the pricing source that fits their needs and avoid the precision loss that exchange-rate conversion would introduce.
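Because each quote keeps its original currency, a consumer filters by currency rather than converting. The sketch assumes every entry in `offers` carries `source` and `currency` fields; those field names and the source labels (`openrouter`, `litellm`, `volcengine-ark`) are hypothetical stand-ins, not the registry's documented schema.

```python
from collections import defaultdict
from typing import Any


def offers_in_currency(model: dict[str, Any], currency: str) -> list[dict[str, Any]]:
    """Return the offers quoted natively in `currency`; nothing is converted."""
    return [o for o in model.get("offers", []) if o.get("currency") == currency]


def sources_by_currency(model: dict[str, Any]) -> dict[str, list[str]]:
    """Map each currency to the sources that quote this model in it."""
    result: dict[str, list[str]] = defaultdict(list)
    for offer in model.get("offers", []):
        result[offer["currency"]].append(offer["source"])
    return dict(result)


# Illustrative record: field names and source labels are hypothetical.
model = {
    "id": "example-model",
    "offers": [
        {"source": "openrouter", "currency": "USD"},
        {"source": "litellm", "currency": "USD"},
        {"source": "volcengine-ark", "currency": "CNY"},
    ],
}

print(sources_by_currency(model))
# {'USD': ['openrouter', 'litellm'], 'CNY': ['volcengine-ark']}
```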

## Separation of Facts and Quotes

The registry splits its data into two layers. Top-level fields hold source-agnostic facts (context length, maximum input/output tokens, supported modalities, and so on), derived by merging data from multiple sources. Commercial data (prices, endpoint paths, rate limits, and so on) lives in the `offers` array, where each quote names its source so every figure stays traceable.
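The summary does not say how conflicting facts from different sources are reconciled. One simple policy, assumed here only for illustration, takes the first non-null value for each fact field in a fixed source-priority order, while each commercial quote stays a separate `offers` entry tagged with its source. The field names (`context_length`, `max_output_tokens`, `modalities`, `price`), the `vendor_docs` source, and the priority list are all hypothetical.

```python
from typing import Any

# Hypothetical fact fields and source priority; not the registry's actual rules.
FACT_FIELDS = ("context_length", "max_output_tokens", "modalities")
SOURCE_PRIORITY = ("vendor_docs", "openrouter", "litellm")


def build_record(model_id: str, per_source: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge source-agnostic facts; keep commercial data per source in `offers`."""
    record: dict[str, Any] = {"id": model_id}
    # Known sources first, in priority order; unknown ones keep their input order.
    ordered = sorted(
        per_source,
        key=lambda s: SOURCE_PRIORITY.index(s) if s in SOURCE_PRIORITY else len(SOURCE_PRIORITY),
    )
    for field in FACT_FIELDS:
        for source in ordered:
            value = per_source[source].get(field)
            if value is not None:
                record[field] = value  # first non-null value wins
                break
    # Quotes are never merged: one offer per source, tagged for traceability.
    record["offers"] = [
        {"source": source, "price": data["price"]}
        for source, data in per_source.items()
        if "price" in data
    ]
    return record


per_source = {  # made-up example values
    "litellm": {"context_length": 128000, "max_output_tokens": 4096, "price": "quote-A"},
    "openrouter": {"context_length": 131072, "modalities": ["text"], "price": "quote-B"},
}
record = build_record("example-model", per_source)
print(record["context_length"], record["max_output_tokens"], record["modalities"])
# 131072 4096 ['text']
print([o["source"] for o in record["offers"]])
# ['litellm', 'openrouter']
```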

## Tiered and Conditional Pricing

In practice, model pricing is rarely a single flat rate. It may have multiple tiers (for example, different rates depending on input token volume) or conditions (for example, video resolution or audio duration). The registry's `prices` field is therefore an array in which each element represents one pricing tier; an element can carry condition thresholds and variant labels so that the record matches how the model is actually billed.
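The encoding of conditions is not spelled out here, so the sketch below invents a minimal tier shape: an optional `min_input_tokens` threshold, an optional `variant` label, and per-million-token input and output prices stored as decimal strings. It also assumes the matching tier with the highest threshold prices the whole request, as some providers do for long prompts; others bill marginal brackets, so check the actual source before estimating costs. `select_tier` and `estimate_cost` are hypothetical helpers.

```python
from decimal import Decimal
from typing import Any, Optional

# Hypothetical tier shape: {"min_input_tokens"?: int, "variant"?: str,
#                           "input_per_mtok": str, "output_per_mtok": str}


def select_tier(prices: list[dict[str, Any]], input_tokens: int,
                variant: Optional[str] = None) -> dict[str, Any]:
    """Return the tier for this variant with the highest threshold <= input_tokens."""
    candidates = [
        tier for tier in prices
        if tier.get("variant") == variant
        and input_tokens >= tier.get("min_input_tokens", 0)
    ]
    if not candidates:
        raise LookupError("no matching price tier")
    return max(candidates, key=lambda tier: tier.get("min_input_tokens", 0))


def estimate_cost(prices: list[dict[str, Any]], input_tokens: int,
                  output_tokens: int, variant: Optional[str] = None) -> Decimal:
    """Price the whole request at one tier, in the offer's own currency."""
    tier = select_tier(prices, input_tokens, variant)
    total = (Decimal(input_tokens) * Decimal(tier["input_per_mtok"])
             + Decimal(output_tokens) * Decimal(tier["output_per_mtok"]))
    return total / Decimal(1_000_000)


# Made-up numbers: a higher rate once the prompt reaches 200,000 tokens.
tiers = [
    {"input_per_mtok": "1.25", "output_per_mtok": "10"},
    {"min_input_tokens": 200_000, "input_per_mtok": "2.50", "output_per_mtok": "15"},
]
print(estimate_cost(tiers, 100_000, 2_000))
print(estimate_cost(tiers, 300_000, 2_000))
```

Treating each threshold as a per-tier minimum keeps tier selection a simple filter-and-max; a marginal-bracket scheme would instead split the token count across tiers and sum the parts.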
