# Olaverse Legal: An Open-Source Large Language Model Family for Legal Use Cases and Its Training Methodology

> Olaverse Legal is a family of open-source large language models for the legal domain. The models are built on the Mistral architecture and trained on legal case datasets with SFT and DPO, and are reported to show professional-level capability in tasks such as contract analysis, evidence evaluation, and legal reasoning.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-27T19:02:07.000Z
- Last activity: 2026-05-27T19:19:57.595Z
- Popularity: 150.7
- Keywords: legal AI, Mistral, fine-tuning, SFT, DPO, contract analysis, open source, LLM
- Page link: https://www.zingnex.cn/en/forum/thread/olaverse-legal
- Canonical: https://www.zingnex.cn/forum/thread/olaverse-legal
- Markdown source: floors_fallback

---

## Olaverse Legal Open-Source Legal Large Model Family: Core Overview and Value

The Olaverse Legal model family is planned as a full range from lightweight to enterprise-grade: Peace-7B (released; 7B parameters, suited to general legal tasks), Nkem-34B (upcoming; complex legal reasoning), and Moyin-72B (upcoming; enterprise-level automation). The project is released under the Apache License 2.0, which permits commercial use, modification, and distribution.

## Background: Specialized Needs of Legal AI and Project Origin

The legal domain places special demands on AI: a model must understand complex legal texts, cite precedents, identify risky clauses, and reason logically. General-purpose large language models often lack the accuracy and verifiability that professional legal tasks require.

The Olaverse Legal project was started to build an open-source model family that understands legal language and assists lawyers in their work.

## Model Family Matrix: Complete Layout from Lightweight to Enterprise-Grade

Olaverse Legal uses a unified versioning scheme across the family. The planned lineup is:

| Model | Parameters | Version | Status | Applicable Scenarios |
|------|------|------|------|----------|
| Peace | 7B | v1.0 | Released | General legal tasks, fast inference |
| Nkem | 34B | v1.0 | Coming soon | Complex legal reasoning, high-precision requirements |
| Moyin | 72B | v1.0 | Coming soon | Enterprise legal automation, maximum performance |

Peace-7B, the only model released so far, is built on Mistral-7B-v0.3 and fine-tuned on legal datasets, balancing model size against domain capability.

## Training Methodology: Two-Stage Transformation of General Models via SFT+DPO

The Peace model uses a two-stage training strategy:

### First Stage: Supervised Fine-Tuning (SFT)
The model is fine-tuned on the Cold Cases dataset from the Harvard Library Innovation Lab (4,800 real cases, each with a case name, syllabus, judicial opinion, and judgment result) to learn legal writing style, argument structure, and judgment logic.
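
As a rough illustration of how one such case could become a supervised training example, here is a minimal sketch. The field names, the prompt wording, and the character-based truncation are all assumptions made for this example; the project's actual preprocessing and the dataset's real schema are not described in the source.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    # Hypothetical field names mirroring the four parts listed above;
    # the real dataset schema may differ.
    case_name: str
    syllabus: str
    opinion: str
    judgment: str


def to_sft_example(case: CaseRecord, max_opinion_chars: int = 4000) -> dict:
    """Turn one case into a prompt/completion pair for supervised fine-tuning."""
    prompt = (
        f"Case: {case.case_name}\n"
        f"Syllabus: {case.syllabus}\n"
        "Write the court's reasoning and state the outcome.\n"
    )
    # Character truncation is a crude stand-in for a token-based length limit.
    completion = (
        f"Opinion: {case.opinion[:max_opinion_chars]}\n"
        f"Judgment: {case.judgment}"
    )
    return {"prompt": prompt, "completion": completion}


example = to_sft_example(
    CaseRecord(
        case_name="Doe v. Roe",  # placeholder, not a real record
        syllabus="Dispute over breach of a supply contract.",
        opinion="The court finds that the delivery terms were not met.",
        judgment="Affirmed",
    )
)
print(example["prompt"])
print(example["completion"])
```

Keeping the judgment at the end of the completion means the model sees the reasoning before the outcome, which matches the order in which a judicial opinion is normally read.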

### Second Stage: Direct Preference Optimization (DPO)
419 preference pairs were built from LegalBench, covering 5 domains including contract Q&A, hearsay evidence rules, and trademark classification, to improve the professionalism and accuracy of the output.
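
For readers unfamiliar with DPO, the sketch below shows the shape of a preference pair and the standard DPO loss for a single pair, computed from sequence log-probabilities under the trained policy and a frozen reference model. The pair content, the log-probability values, and `beta = 0.1` are illustrative assumptions; the source does not state the project's data format or hyperparameters.

```python
import math

# Hypothetical preference pair: one prompt, a preferred and a dispreferred answer.
pair = {
    "prompt": "Is an out-of-court statement offered to prove its truth hearsay?",
    "chosen": "Yes. It is hearsay unless an exclusion or exception applies.",
    "rejected": "No, any statement can be admitted.",
}


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    # Log-ratio of policy to reference for each answer.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy has shifted probability toward the chosen answer: the loss is lower.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss only falls when the policy raises the chosen answer's likelihood relative to the rejected one by more than the reference model does, which is what pushes outputs toward the preferred style.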

Training configuration: sequence length of 2048 tokens, 4-bit quantization, and LoRA with rank 16 in the SFT stage and an adaptive rank in the DPO stage. Training ran on an A100 GPU for approximately 17 minutes and reached a final loss of 1.08.
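
To give a sense of why a rank-16 LoRA adapter trains quickly, the following worked example counts the trainable parameters LoRA adds to a single weight matrix. It assumes a square 4096 × 4096 projection, which matches the hidden size of Mistral-7B; which layers the project actually adapted is not stated in the source.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # LoRA learns A (rank x d_in) and B (d_out x rank); the base weight stays frozen.
    return rank * (d_in + d_out)


HIDDEN = 4096  # hidden size of Mistral-7B
RANK = 16      # SFT-stage rank stated above

full = HIDDEN * HIDDEN
added = lora_params(HIDDEN, HIDDEN, RANK)

print(f"full matrix:  {full:,} parameters")      # 16,777,216
print(f"LoRA adapter: {added:,} parameters")     # 131,072
print(f"share of the matrix: {added / full:.2%}")  # 0.78%
```

Under this assumption, each adapted matrix needs well under 1% of its original parameter count to be trained, while the frozen base weights can stay in 4-bit form.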

## Performance Evaluation: Significant Improvements of Peace-7B on Legal Tasks

Response times of Peace-7B compared with the base Mistral-7B (in seconds; lower is better):

| Task | Mistral-7B Baseline | Peace-7B v1.0 | Change |
|------|-----------------|---------------|----------|
| Contract Analysis | 14.24s | 9.60s | **32.6% faster** |
| Evidence Analysis | 9.28s | 9.57s | Largely unchanged (3.1% slower) |
| Legal Reasoning | 9.36s | 9.55s | Largely unchanged (2.0% slower) |
| Trademark Classification | 9.40s | 9.55s | Largely unchanged (1.6% slower) |
| Case Analysis | 9.37s | 8.06s | **14.0% faster** |
| **Average** | **10.33s** | **9.27s** | **10.3% faster** |
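
The percentages in the table can be reproduced from the timings alone. The short script below recomputes them as relative reductions in time against the baseline; the function and variable names are chosen for this example.

```python
# (baseline seconds, Peace-7B seconds) per task, copied from the table above.
timings = {
    "Contract Analysis": (14.24, 9.60),
    "Evidence Analysis": (9.28, 9.57),
    "Legal Reasoning": (9.36, 9.55),
    "Trademark Classification": (9.40, 9.55),
    "Case Analysis": (9.37, 8.06),
}


def percent_faster(baseline: float, tuned: float) -> float:
    """Relative reduction in time; negative values mean the tuned model is slower."""
    return (baseline - tuned) / baseline * 100


for task, (baseline, tuned) in timings.items():
    print(f"{task}: {percent_faster(baseline, tuned):+.1f}%")

avg_baseline = sum(b for b, _ in timings.values()) / len(timings)
avg_tuned = sum(t for _, t in timings.values()) / len(timings)
print(f"Average: {avg_baseline:.2f}s -> {avg_tuned:.2f}s "
      f"({percent_faster(avg_baseline, avg_tuned):+.1f}%)")
```

Two tasks account for the whole average gain: contract analysis and case analysis are faster, while the other three come out between 1.6% and 3.1% slower.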

The project also reports qualitative improvements in output: structured professional responses, accurate legal citations, clear reasoning processes, and consistent quality across tasks.

## Core Application Scenarios: Covering Multiple Professional Legal Tasks

### Contract Analysis
Identify key obligations, risk points, and legal impacts of clauses (e.g., interpreting the meaning of Delaware arbitration clauses).

### Legal Research
Answer legal questions, explain precedents, provide regulatory interpretations, and cite relevant legal concepts.

### Document Review
Check documents for compliance, missing clauses, and potential issues; suitable for due diligence and M&A reviews.

### Case Outcome Prediction
Predict judgment results based on facts and precedents to assist in case strategy evaluation.

### Evidence Evaluation
Judge the admissibility, relevance, and probative value of evidence to assist trial preparation.

## Ethical Use and Open-Source License: Clear Boundaries and Permissive Authorization

### Limitations
- May generate plausible but incorrect legal information
- Not trained for specific jurisdictions
- Cannot provide personalized legal advice
- Intended only as a research and analysis tool

### Ethical Guidelines
- Legal professionals should verify all outputs
- Do not use the model for automated legal decisions
- Clearly disclose AI assistance
- Keep human supervision in all legal applications

### Open-Source License
The project is released under the Apache License 2.0, which permits commercial use, modification, and distribution. License and copyright notices must be retained, and the software is provided without warranty.
