Zing Forum


COLLT: A Clarification-Driven Tool Learning Framework for Legal Large Language Models

COLLT is a clarification-oriented tool learning framework designed specifically for Chinese online legal services. It counters the loss of answer quality caused by incomplete information in users' legal questions by combining a clarification mechanism with six professional legal tools.

Tags: Legal AI, Large Language Models, Tool Learning, Clarification Mechanism, Legal Retrieval, COLLT, Lawformer, Chinese Law
Published 2026-05-22 00:40 | Recent activity 2026-05-22 00:49 | Estimated read 8 min

Section 01

[Introduction] COLLT Framework: An Intelligent Tool Learning Solution to Address Information Gaps in Legal Consultations

COLLT is a clarification-oriented tool learning framework designed specifically for Chinese online legal services. It targets the drop in answer quality that occurs when a user's legal question leaves out key facts. The framework pairs a clarification mechanism (the <CLR> marker for asking a clarifying question and the <DRT> marker for answering directly) with six professional legal tools, so that the model can spot information gaps and ask about them before answering. It has been adapted to five mainstream open large models, including ChatGLM3-6B, and all datasets and code are open-sourced.


Section 02

Background: Information Gaps in Legal Consultations and the Birth of COLLT

In Chinese online legal services, users often phrase questions vaguely or incompletely (e.g., asking only 'What should I do if I'm beaten?' without key facts such as the extent of the injury or where it happened), which makes it hard for general-purpose large models to give accurate, legally grounded answers. COLLT is built around the characteristics of Chinese legal scenarios and aims to teach large models when to 'ask clearly before answering'.


Section 03

Core Mechanism: Decision Logic for Intelligent Clarification and Direct Response

COLLT introduces two key action markers:

  • <CLR> (Clarification): Proactively initiates a clarification dialogue when key information is missing
  • <DRT> (Direct Response): Directly enters the tool retrieval and response process when information is sufficient

This decision mechanism is learned through supervised fine-tuning, which requires the model to understand that different legal domains (criminal, civil, labor, etc.) need different information before a question counts as complete.
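In COLLT the choice between <CLR> and <DRT> is learned from data, not hand-coded. Purely to illustrate the input/output contract, the sketch below substitutes a rule-based stand-in: it checks a query against hypothetical per-domain lists of required facts (the `REQUIRED_FACTS` table, its keyword cues, and the `decide` function are all invented for this example) and emits <CLR> with a follow-up question when something seems missing, or <DRT> otherwise.

```python
from dataclasses import dataclass

# Hypothetical per-domain "must-know" facts, each with keyword cues.
# COLLT learns this judgment via SFT; this table is illustrative only.
REQUIRED_FACTS: dict[str, dict[str, tuple[str, ...]]] = {
    "criminal": {
        "injury status": ("injur", "hospital", "bruise", "fracture"),
        "location": ("street", "home", "office", "bar", "school"),
    },
    "labor": {
        "contract status": ("contract", "agreement"),
        "length of employment": ("year", "month"),
    },
}


@dataclass
class Action:
    marker: str  # "<CLR>" or "<DRT>"
    text: str    # clarification question, or "" for a direct response


def decide(domain: str, query: str) -> Action:
    """Return <CLR> plus a question if any required fact looks absent, else <DRT>."""
    lowered = query.lower()
    missing = [
        fact
        for fact, cues in REQUIRED_FACTS.get(domain, {}).items()
        if not any(cue in lowered for cue in cues)
    ]
    if missing:
        question = "Could you tell me more about: " + ", ".join(missing) + "?"
        return Action("<CLR>", question)
    return Action("<DRT>", "")


if __name__ == "__main__":
    print(decide("criminal", "What should I do if I'm beaten?"))
    print(decide("criminal", "I was beaten on the street and went to hospital with a fracture."))
```

A learned policy replaces the brittle keyword test, but the contract stays the same: one marker per turn, plus a question only in the <CLR> case.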

Section 04

Six Legal Tools Matrix: Covering Full-Spectrum Legal Information Retrieval

COLLT integrates six professional legal tools, built on Lawformer except for Internet Search, which calls the Bing API:

  1. Legal Article Retrieval (T_LAS): Automatically retrieves applicable legal articles; training data comes from a subset of DISC-Law-SFT (excluding CAIL2018 to prevent data leakage)
  2. Legal Charge Prediction (T_LCP): Predicts criminal charges from case details; uses relevant data from DISC-Law-SFT
  3. Similar Case Retrieval (T_SCR): Retrieves similar cases from a case database; uses CAIL2019-SCM data
  4. Legal Element Recognition (T_LER): Extracts key legal elements (e.g., 'emotional breakdown' in divorce cases); based on the CAIL2019 element extraction dataset (62 labels)
  5. Legal Event Detection (T_LED): Identifies key legal events and their sequence; uses the LEVEN dataset
  6. Internet Search (T_NET): Calls the Bing API to obtain up-to-date legal information

Together, these tools form a complete legal knowledge retrieval and reasoning system.
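The tool matrix maps naturally onto a name-to-tool registry. In the sketch below the tool names and data sources come from the list above, but the `LegalTool` class, the `call_tool` dispatcher, and the echoing stub backends are hypothetical placeholders; a real deployment would plug in the Lawformer-based models and a Bing API client instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LegalTool:
    name: str                # identifier used in the tool-call trace
    purpose: str             # what the tool returns
    source: str              # training data or backend named in the post
    run: Callable[[str], str]


def _stub(name: str) -> Callable[[str], str]:
    # Placeholder backend: just tags the query with the tool name.
    return lambda query: f"{name} result for: {query}"


TOOLS = {t.name: t for t in [
    LegalTool("T_LAS", "applicable legal articles", "DISC-Law-SFT subset (no CAIL2018)", _stub("T_LAS")),
    LegalTool("T_LCP", "predicted criminal charges", "DISC-Law-SFT", _stub("T_LCP")),
    LegalTool("T_SCR", "similar cases", "CAIL2019-SCM", _stub("T_SCR")),
    LegalTool("T_LER", "key legal elements", "CAIL2019 element extraction (62 labels)", _stub("T_LER")),
    LegalTool("T_LED", "legal events and their order", "LEVEN", _stub("T_LED")),
    LegalTool("T_NET", "up-to-date web results", "Bing API", _stub("T_NET")),
]}


def call_tool(name: str, query: str) -> str:
    """Dispatch a query to a registered tool by name."""
    try:
        tool = TOOLS[name]
    except KeyError:
        raise ValueError(f"unknown tool: {name}") from None
    return tool.run(query)


if __name__ == "__main__":
    print(call_tool("T_SCR", "assault on the street"))
```

Keeping the registry keyed by the same identifiers the model emits means the model's output can be dispatched without any extra mapping layer.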

Section 05

Budget Control: A Constraint Mechanism to Balance Answer Quality and Response Efficiency

COLLT adds a budget control mechanism: each dialogue round may trigger at most two tool calls (|τ| ≤ 2, where τ denotes the tool calls made in that round). The limit serves three purposes:

  • Latency control: Reduce response time and improve user experience
  • Prevent excessive retrieval: Avoid the model falling into meaningless retrieval loops
  • Focus on key information: Force the model to pick the most useful tools within the limited budget

Tool call results are fed into the final answer via the <ER> marker, completing the retrieval-augmented generation chain.
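A minimal sketch of the per-round budget, assuming the model emits its planned tool calls as an ordered list of (tool name, query) pairs with the most useful call first. The function name `run_with_budget` and the exact `<ER>...</ER>` wrapping are assumptions for illustration; the post only states that results enter the answer via the <ER> marker.

```python
from typing import Callable

MAX_TOOL_CALLS = 2  # |τ| ≤ 2 per dialogue round


def run_with_budget(
    planned: list[tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    budget: int = MAX_TOOL_CALLS,
) -> tuple[list[str], str]:
    """Execute at most `budget` planned calls; return the tools used and an <ER> block."""
    # Calls beyond the budget are dropped, not queued for a later round.
    executed = planned[:budget]
    results = [f"{name}: {tools[name](query)}" for name, query in executed]
    er_block = "<ER>" + "\n".join(results) + "</ER>"
    return [name for name, _ in executed], er_block


if __name__ == "__main__":
    demo_tools = {"T_LAS": lambda q: "Art. 234", "T_SCR": lambda q: "case #1", "T_NET": lambda q: "news"}
    used, block = run_with_budget(
        [("T_LAS", "assault"), ("T_SCR", "assault"), ("T_NET", "assault")], demo_tools
    )
    print(used)
    print(block)
```

Truncating the plan is the simplest enforcement; it only works well if the model has learned to put its most valuable call first, which is exactly the behavior the budget is meant to encourage.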

Section 06

Multi-Model Adaptation and Training Process

The research team used 4-bit QLoRA fine-tuning (via the Unsloth framework) to adapt five mainstream open large models: ChatGLM3-6B→COLLT-GLM, LLaMa-3-8B→COLLT-LLaMa, InternLM3-8B→COLLT-InternLM, Qwen2.5-7B→COLLT-Qwen, and Baichuan2-7B→COLLT-Baichuan. Training data is constructed in three stages:

  1. Extract 11,533 real legal consultations from DISC-Law-SFT as seed data
  2. Annotate ambiguity with a DeepSeek model, producing annot_ambig.jsonl
  3. Annotate tool usage to build the collt_sft.jsonl training corpus
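The three stages can be pictured as a small record-building loop. This is a hypothetical sketch: the DeepSeek ambiguity annotator and the tool annotator are passed in as plain functions, and the record fields (`input`, `output`, `tools`) are invented for illustration, not the actual schema of collt_sft.jsonl. It also applies the two-call budget when attaching tool annotations.

```python
import json
from typing import Callable, Iterable


def build_sft_records(
    seeds: Iterable[dict],
    annotate_ambiguity: Callable[[str], dict],
    annotate_tools: Callable[[str], list[str]],
) -> list[str]:
    """Turn seed consultations into JSONL lines (hypothetical schema)."""
    lines = []
    for seed in seeds:                            # stage 1: seed consultations
        query = seed["query"]
        ambig = annotate_ambiguity(query)         # stage 2: DeepSeek in COLLT
        if ambig["ambiguous"]:
            target = "<CLR> " + ambig["question"]
            tools: list[str] = []
        else:
            tools = annotate_tools(query)[:2]     # stage 3, respecting |τ| ≤ 2
            target = "<DRT> " + " ".join(f"[call {t}]" for t in tools)
        record = {"input": query, "output": target, "tools": tools}
        # ensure_ascii=False keeps Chinese text readable in the JSONL file
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines
```

Injecting the annotators as functions keeps the loop testable offline; in a real pipeline they would wrap calls to the annotation model.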

Section 07

Evaluation System and Data Open-Sourcing: Verifying the Framework's Effectiveness

COLLT builds a complete evaluation system:

  • AmbigLegalQA Evaluation Set: 5,181 test samples covering 0-4 rounds of clarification dialogues, evaluating trigger accuracy (trigger-F1), coverage, and ROUGE-L metrics
  • LawBench Zero-Shot Evaluation: Tests the model's comprehensive capabilities across 9 legal tasks (legal article prediction, charge prediction, sentence prediction, etc.)

The project open-sources all resources: the training corpus collt_sft.jsonl (11,528 entries), the evaluation benchmark ambiglegalqa.jsonl (5,181 entries), the tool training data, and the end-to-end code.
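Two of the AmbigLegalQA metrics have standard textbook forms, sketched below under stated assumptions: trigger-F1 is taken as F1 with <CLR> as the positive class, and ROUGE-L as the LCS-based F1 over characters (a common choice for Chinese text). COLLT's exact variants may differ, and the coverage metric is not defined in the post, so it is not sketched.

```python
def trigger_f1(gold: list[str], pred: list[str], positive: str = "<CLR>") -> float:
    """F1 of the clarification trigger, treating `positive` as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (DP with a rolling row)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> float:
    """Character-level ROUGE-L F1 between a reference and a candidate answer."""
    lcs = lcs_len(reference, candidate)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    print(trigger_f1(["<CLR>", "<CLR>", "<DRT>", "<DRT>"], ["<CLR>", "<DRT>", "<CLR>", "<DRT>"]))
    print(rouge_l("故意伤害罪", "伤害罪"))
```

Character-level matching avoids depending on a Chinese word segmenter, at the cost of rewarding partial-word overlaps.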

Section 08

Practical Significance and Future Outlook: The Implementation Paradigm of Legal AI

COLLT provides an important paradigm for the implementation of legal AI:

  1. Clarification Priority: Balances clarification against direct response, avoiding answers that sound authoritative but are made up
  2. Tool Collaboration: The six tools form a complementary knowledge retrieval network, covering full-spectrum legal information from legal articles to cases
  3. Budget Constraint: Balances answer quality and response latency by limiting the number of tool calls

The framework is applicable to scenarios such as online legal consultation, contract review, and compliance checks. Going forward, it still has to strike a balance between 'answering as much as possible' and 'ensuring answer accuracy'.