# COLLT: A Clarification-Oriented Tool Learning Framework for Legal Large Language Models

> COLLT is a clarification-oriented tool learning framework specifically designed for the legal domain. It addresses the common information gap in user legal consultations and improves the response quality of large models in complex legal scenarios through six types of professional legal tools and an intelligent clarification mechanism.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-05-21T16:40:20.000Z
- 最近活动: 2026-05-21T16:47:52.257Z
- 热度: 150.9
- 关键词: 法律大模型, 工具学习, 澄清机制, Lawformer, QLoRA, 法律咨询, 多任务优化, 中文 NLP
- 页面链接: https://www.zingnex.cn/en/forum/thread/collt
- Canonical: https://www.zingnex.cn/forum/thread/collt
- Markdown 来源: floors_fallback

---

## Introduction: COLLT Framework—An Innovative Solution to the Information Gap Problem in Legal Large Models

COLLT (Clarification-Oriented Legal Language with Tool augmentation) is a clarification-oriented tool learning framework specifically designed for the legal domain. Its core goal is to address the common information gap in user legal consultations and improve the response quality of large models in complex legal scenarios. The framework features an intelligent clarification mechanism and six professional legal tools. It simulates real lawyers' workflow through a dual-track decision-making mechanism, uses budget control to prevent tool abuse, supports low-resource training and multi-model adaptation, and has verified its effectiveness through rigorous evaluations and open-sourced related resources.

## Background: Information Dilemma in Legal Consultations

On Chinese online legal service platforms, user consultations often suffer from severe information insufficiency (e.g., only asking "Can I get a divorce?" without specifying key information like marriage duration or property status). When faced with ambiguous queries, traditional large models either give vague answers or make inferences based on assumptions, leading to untargeted or even misleading suggestions. COLLT is proposed to address this pain point, empowering models with the ability to intelligently judge when to clarify and call professional tools to obtain authoritative evidence.

## Core Methods: Dual-Track Decision-Making Mechanism and Professional Tool System

The core innovation of COLLT is its action tagging system: <CLR> indicates the need to clarify information, and <DRT> indicates that it can directly enter the tool retrieval and response phase. The framework has six built-in professional tools trained on Lawformer: T_LAS (Legal Article Search), T_LCP (Crime Prediction), T_SCR (Similar Case Retrieval), T_LER (Element Recognition), T_LED (Event Detection), and T_NET (Web Search). Additionally, Proposition 1 restricts each dialogue round to a maximum of 2 tool calls to avoid excessive retrieval and achieve precise tool combinations (e.g., T_LER and T_LAS are prioritized for divorce property disputes).

## Low-Resource Training and Multi-Model Adaptation Details

COLLT uses 4-bit QLoRA technology to successfully fine-tune five mainstream Chinese large models such as ChatGLM3-6B and LLaMA-3-8B on a single NVIDIA RTX 4090 (24GB VRAM). The low-resource solution is easy for small and medium-sized teams to reproduce. The training data comes from 11,533 real consultation seeds from DISC-Law-SFT. After two-stage annotation by DeepSeek (judging clarification needs and tool calls), the COLLT-SFT dataset (in OpenAI message format) containing 11,528 multi-turn dialogues was generated.

## Comprehensive Evaluation: Verifying the Framework's Effectiveness

COLLT's evaluation covers nine regular LawBench tasks and the specially constructed AmbigLegalQA evaluation set (5,181 samples covering 0-4 rounds of clarification). Evaluation metrics include Trigger-F1 (accuracy of clarification judgment), Clarification Coverage (whether clarification questions cover key gaps), and Multi-turn ROUGE-L (response matching degree). Ablation experiments show that the clarification mechanism and tool system can only exert maximum effectiveness when working in synergy.

## Open-Source Value: Boosting the Development of the Legal NLP Community

The COLLT project open-sources training code, evaluation scripts, dataset construction processes, and 11,528 training samples (under CC BY-NC 4.0 license), filling the gap of scarce high-quality Chinese legal dialogue datasets. Its explicit protocol tagging system (e.g., <CLR>, <DRT>) provides a reference paradigm for the tool learning field, facilitating debugging, interpretability analysis, and subsequent research interventions.

## Practical Insights and Future Outlook

Insights from COLLT for legal AI developers: 1. Domain-specific tool design is crucial—general tools cannot meet the precision requirements of law; 2. The clarification mechanism improves user experience and trust; 3. The budget control strategy can be applied to other tool learning scenarios. COLLT's open-source resources lay a solid foundation for the further development of legal AI.
