# COLLT: A Clarification-Driven Tool Learning Framework for Legal Large Language Models

> COLLT is a clarification-oriented tool learning framework designed specifically for Chinese online legal services. It addresses the issue of reduced answer quality caused by incomplete information in user legal consultations through an intelligent clarification mechanism combined with six professional legal tools.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-21T16:40:20.000Z
- Last activity: 2026-05-21T16:49:02.301Z
- Popularity: 159.8
- Keywords: legal AI, large language models, tool learning, clarification mechanism, legal retrieval, COLLT, Lawformer, Chinese law
- Page link: https://www.zingnex.cn/en/forum/thread/collt-71a8e547
- Canonical: https://www.zingnex.cn/forum/thread/collt-71a8e547
- Markdown source: floors_fallback

---

## [Introduction] COLLT Framework: An Intelligent Tool Learning Solution to Address Information Gaps in Legal Consultations

COLLT is a clarification-oriented tool learning framework for Chinese online legal services. It targets the drop in answer quality that occurs when users leave key facts out of a legal consultation. The framework pairs a clarification mechanism (the `<CLR>` marker for clarification, `<DRT>` for direct response) with six professional legal tools, so that a large model can detect information gaps and ask for the missing details before answering. It supports mainstream Chinese large models such as ChatGLM3-6B, and all datasets and code are open-sourced.

## Background: Information Gaps in Legal Consultations and the Birth of COLLT

In Chinese online legal services, users often phrase questions vaguely or incompletely. A typical example is asking only "What should I do if I was beaten?" without stating the severity of the injury or where the incident happened. With so little to work from, a conventional large model struggles to give an accurate, legally grounded answer. COLLT is built around the characteristics of Chinese legal scenarios and trains the model to judge when it should ask for the missing facts before answering.

## Core Mechanism: Decision Logic for Intelligent Clarification and Direct Response

COLLT introduces two key action markers:
- `<CLR>` (Clarification): Proactively initiates a clarification dialogue when key information is missing
- `<DRT>` (Direct Response): Directly enters the tool retrieval and response process when information is sufficient

This decision mechanism is implemented through supervised fine-tuning, which requires the model to learn that different legal domains (criminal, civil, labor, etc.) demand different levels of information completeness.
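
The routing implied by the two markers can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the fine-tuned model emits the marker at the very start of its generation, and `Decision` and `parse_decision` are hypothetical names.

```python
from dataclasses import dataclass

CLR = "<CLR>"  # clarification marker named in the text
DRT = "<DRT>"  # direct-response marker named in the text


@dataclass
class Decision:
    action: str  # "clarify" or "respond"
    text: str    # generation that follows the marker


def parse_decision(model_output: str) -> Decision:
    """Split a generation into its leading action marker and the rest.

    Assumption: the marker is the first token of the output.
    """
    stripped = model_output.lstrip()
    if stripped.startswith(CLR):
        return Decision("clarify", stripped[len(CLR):].strip())
    if stripped.startswith(DRT):
        return Decision("respond", stripped[len(DRT):].strip())
    raise ValueError("output does not start with <CLR> or <DRT>")
```

A caller would send a `clarify` decision back to the user as a follow-up question and pass a `respond` decision on to tool retrieval.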

## Six Legal Tools Matrix: Covering Full-Spectrum Legal Information Retrieval

COLLT integrates six professional legal tools based on Lawformer:
1. **Legal Article Retrieval (T_LAS)**: Automatically retrieves applicable legal articles; training data comes from a subset of DISC-Law-SFT (excluding CAIL2018 to prevent data leakage)
2. **Legal Charge Prediction (T_LCP)**: Predicts criminal charges based on case details; integrates relevant data from DISC-Law-SFT
3. **Similar Case Retrieval (T_SCR)**: Retrieves similar cases from the case database; uses CAIL2019-SCM data
4. **Legal Element Recognition (T_LER)**: Extracts key legal elements (e.g., 'emotional breakdown' in divorce cases); based on the CAIL2019 element extraction dataset (62 labels)
5. **Legal Event Detection (T_LED)**: Identifies key legal events and their sequence; uses the LEVEN dataset
6. **Internet Search (T_NET)**: Calls the Bing API to obtain real-time legal dynamics

Together, these tools form a complete legal knowledge retrieval and reasoning system.
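
One way to picture the tool matrix is as a small registry keyed by the six identifiers above. The sketch below is illustrative only: the identifiers come from the list, while `LegalTool`, `dispatch`, and the `backends` mapping are hypothetical, and the real tools (Lawformer-based models and a Bing API call) are replaced by caller-supplied functions.

```python
from enum import Enum
from typing import Callable, Dict


class LegalTool(Enum):
    """The six tool identifiers listed above."""
    T_LAS = "Legal Article Retrieval"
    T_LCP = "Legal Charge Prediction"
    T_SCR = "Similar Case Retrieval"
    T_LER = "Legal Element Recognition"
    T_LED = "Legal Event Detection"
    T_NET = "Internet Search"


def dispatch(tool: LegalTool, query: str,
             backends: Dict[LegalTool, Callable[[str], str]]) -> str:
    """Route a query to the backend registered for `tool`.

    `backends` is a hypothetical mapping supplied by the caller.
    """
    if tool not in backends:
        raise KeyError(f"no backend registered for {tool.name}")
    return backends[tool](query)
```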

## Budget Control: A Constraint Mechanism to Balance Answer Quality and Response Efficiency

COLLT places a budget on tool use: each dialogue round may trigger at most two tool calls (|τ| ≤ 2). The limit rests on three considerations:
- Latency control: Reduce response time and improve user experience
- Prevent excessive retrieval: Keep the model from falling into pointless retrieval loops
- Focus on key information: Force the model to select the most useful tools within a limited budget

Tool call results are incorporated into the final answer via the `<ER>` marker, completing the retrieval-augmented generation chain.
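
A minimal sketch of the budget check, under stated assumptions: the model's planned calls arrive as an ordered list of `(tool_name, query)` pairs, any calls beyond the budget are simply dropped, and retrieved evidence is appended after an `<ER>` marker. The source does not specify the exact `<ER>` layout, so the format in `build_context` is a guess, and `run_tools` and `build_context` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

MAX_TOOL_CALLS = 2  # the |τ| ≤ 2 budget per dialogue round


def run_tools(planned: List[Tuple[str, str]],
              backends: Dict[str, Callable[[str], str]],
              budget: int = MAX_TOOL_CALLS) -> List[Tuple[str, str]]:
    """Execute at most `budget` planned calls, in the order given."""
    results = []
    for name, query in planned[:budget]:
        results.append((name, backends[name](query)))
    return results


def build_context(question: str, results: List[Tuple[str, str]]) -> str:
    """Attach tool evidence after an <ER> marker (assumed layout)."""
    evidence = "\n".join(f"[{name}] {output}" for name, output in results)
    return f"{question}\n<ER>\n{evidence}"
```

Truncating the plan is the simplest policy; a stricter variant could reject an over-budget plan instead of silently trimming it.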

## Multi-Model Adaptation and Training Process

The research team used 4-bit QLoRA (via the unsloth framework) to adapt five mainstream large models: ChatGLM3-6B→COLLT-GLM, LLaMa-3-8B→COLLT-LLaMa, InternLM3-8B→COLLT-InternLM, Qwen2.5-7B→COLLT-Qwen, Baichuan2-7B→COLLT-Baichuan.

Training data construction is divided into three stages:
1. Extract 11,533 real legal consultation seed data from DISC-Law-SFT
2. Perform ambiguity annotation using the DeepSeek model to generate annot_ambig.jsonl
3. Annotate tool usage to build the collt_sft.jsonl training corpus
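
The three stages above can be sketched as a small pipeline. Everything about the record layout here is an assumption: the field names (`question`, `ambiguous`, `action`, `tools`) are hypothetical, the DeepSeek annotator is replaced by a caller-supplied function, and the actual schemas of annot_ambig.jsonl and collt_sft.jsonl may differ.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def annotate_ambiguity(seeds: Iterable[Dict],
                       is_ambiguous: Callable[[str], bool]) -> Iterator[Dict]:
    """Stage 2: flag each seed question as ambiguous or not.

    `is_ambiguous` stands in for the LLM-based annotator.
    """
    for seed in seeds:
        yield {**seed, "ambiguous": bool(is_ambiguous(seed["question"]))}


def annotate_tools(records: Iterable[Dict],
                   pick_tools: Callable[[str], List[str]]) -> Iterator[Dict]:
    """Stage 3: attach an action marker and at most two tool names."""
    for rec in records:
        if rec["ambiguous"]:
            yield {**rec, "action": "<CLR>", "tools": []}
        else:
            yield {**rec, "action": "<DRT>",
                   "tools": list(pick_tools(rec["question"]))[:2]}


def to_jsonl(records: Iterable[Dict]) -> str:
    """Serialize records one JSON object per line, keeping Chinese text."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```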

## Evaluation System and Data Open-Sourcing: Verifying the Framework's Effectiveness

COLLT builds a complete evaluation system:
- **AmbigLegalQA Evaluation Set**: 5,181 test samples covering 0-4 rounds of clarification dialogues, evaluating trigger accuracy (trigger-F1), coverage, and ROUGE-L metrics
- **LawBench Zero-Shot Evaluation**: Tests the model's comprehensive capabilities across 9 legal tasks (legal article prediction, charge prediction, sentence prediction, etc.)

The project open-sources all of its resources: the training corpus collt_sft.jsonl (11,528 entries), the evaluation benchmark ambiglegalqa.jsonl (5,181 entries), the tool training data, and the end-to-end code.
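
Two of the AmbigLegalQA metrics can be illustrated with a short sketch. It makes two assumptions: trigger-F1 is taken to be the F1 score of the clarify/respond decision with `<CLR>` as the positive class, and ROUGE-L is computed at character level (a common choice for Chinese text) as the F-measure over the longest common subsequence. The project's own evaluation scripts may define either metric differently.

```python
from typing import Sequence


def trigger_f1(gold: Sequence[str], pred: Sequence[str],
               positive: str = "<CLR>") -> float:
    """F1 of the trigger decision, treating `positive` as the target class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, using one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> float:
    """Character-level ROUGE-L F-measure."""
    if not reference or not candidate:
        return 0.0
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```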

## Practical Significance and Future Outlook: The Implementation Paradigm of Legal AI

COLLT offers a useful template for deploying legal AI in practice:
1. **Clarification Priority**: Balances clarification against direct response, so the model does not deliver confident-sounding nonsense when key facts are missing
2. **Tool Collaboration**: The six tools form a complementary knowledge retrieval network, covering legal information from statutes to cases
3. **Budget Constraint**: Balances answer quality and response latency by capping the number of tool calls per round

The framework is applicable to scenarios such as online legal consultation, contract review, and compliance checks. An open challenge for future work is balancing 'answering as much as possible' against 'ensuring answer accuracy'.
