Zing Forum

Olaverse Legal: Open-Source Large Model Family for Legal Scenarios and Professional Training Methodology

Olaverse Legal is a family of open-source large language models for the legal domain. The models are built on the Mistral architecture and trained on legal case datasets with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO), and they are presented as showing professional-level capability in tasks such as contract analysis, evidence evaluation, and legal reasoning.

Tags: legal AI, Mistral, fine-tuning, SFT, DPO, contract analysis, open source, LLM
Published 2026-05-28 03:02 | Recent activity 2026-05-28 03:19 | Estimated read 8 min

Section 01

Olaverse Legal, an Open-Source Legal Model Family: Core Overview and Value

The family is planned to span lightweight to enterprise-grade models: Peace-7B (released; 7B parameters, suited to general legal tasks), Nkem-34B (upcoming; complex legal reasoning), and Moyin-72B (upcoming; enterprise-level automation). The project is released under the Apache License 2.0, which permits commercial use, modification, and distribution.

Section 02

Background: Specialized Needs of Legal AI and Project Origin

The legal domain has special requirements for AI: it needs to understand complex legal texts, cite precedents, identify clause risks, and perform logical reasoning. General large language models often lack accuracy and verifiability in professional legal tasks.

The Olaverse Legal project emerged to build an open-source model family that truly understands legal language and assists lawyers in their work.

Section 03

Model Family Matrix: Complete Layout from Lightweight to Enterprise-Grade

Olaverse Legal adopts a unified version strategy, with the model matrix as follows:

Model | Scale | Version | Status | Applicable Scenarios
Peace | 7B | v1.0 | Released | General legal tasks, fast inference
Nkem | 34B | v1.0 | Coming soon | Complex legal reasoning, high-precision requirements
Moyin | 72B | v1.0 | Coming soon | Enterprise legal automation, extreme performance

The released Peace-7B is based on the Mistral-7B-v0.3 architecture and fine-tuned on legal datasets, aiming to balance model size against domain capability.

Section 04

Training Methodology: Two-Stage Transformation of General Models via SFT+DPO

The Peace model uses a two-stage training strategy:

First Stage: Supervised Fine-Tuning (SFT)

The model is fine-tuned on the Cold Cases dataset from the Harvard Library Innovation Lab: 4,800 real cases, each with a case name, syllabus, judicial opinion, and judgment result. From these it learns legal phrasing, argument structure, and judgment logic.
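
To make the SFT stage concrete, here is a minimal sketch of how one case record could be turned into a prompt/completion pair. The source does not describe the actual preprocessing, so the `CaseRecord` field names, the prompt wording, and the sample values are all hypothetical; only the four parts of a case (name, syllabus, opinion, result) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    # Hypothetical field names mirroring the four parts of a case listed above.
    case_name: str
    syllabus: str
    opinion: str
    outcome: str


def to_sft_example(case: CaseRecord) -> dict:
    """Turn one case into a prompt/completion pair for supervised fine-tuning."""
    prompt = (
        f"Case: {case.case_name}\n"
        f"Syllabus: {case.syllabus}\n"
        "Write the judicial opinion and state the outcome."
    )
    completion = f"{case.opinion}\nOutcome: {case.outcome}"
    return {"prompt": prompt, "completion": completion}


example = to_sft_example(
    CaseRecord(
        case_name="Doe v. Roe",  # placeholder, not a real record from the dataset
        syllabus="Dispute over whether a notice clause was satisfied.",
        opinion="The notice was sent after the contractual deadline.",
        outcome="Affirmed",
    )
)
print(example["prompt"])
print(example["completion"])
```

Keeping the opinion and the outcome in the completion means the model is trained to produce both the reasoning and the result, which matches the stated goal of learning argument structure and judgment logic.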

Second Stage: Direct Preference Optimization (DPO)

Drawing on LegalBench, the project built 419 preference pairs across 5 domains, including contract Q&A, hearsay evidence rules, and trademark classification, to improve the professionalism and accuracy of outputs.
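
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. It illustrates the general technique rather than the project's training code: the function name, the example log-probabilities, and `beta=0.1` are assumptions, since the source does not state the value of beta used.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities.

    beta=0.1 is an assumed value; the source does not specify it.
    """
    # How much more the policy favors each response than the frozen reference does.
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) computed as a numerically stable softplus(-logits).
    z = -logits
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet.
untrained = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted probability toward the chosen answer.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(untrained, improved)
```

When the policy matches the reference the loss equals log 2, and it falls as the policy raises the chosen response relative to the rejected one, which is the behavior each preference pair is meant to encourage.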

Training configuration: sequence length of 2048 tokens; 4-bit quantization; LoRA rank 16 in the SFT stage and adaptive in the DPO stage. Training ran on an A100 GPU for approximately 17 minutes and reached a final loss of 1.08.
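
The stated settings can be captured in a small configuration object, together with a back-of-the-envelope count of what LoRA rank 16 adds to one weight matrix. This is a sketch under assumptions: `TrainingConfig` and `lora_trainable_params` are illustrative names, the DPO rank is left unset because the source only calls it "adaptive", and the 4096 x 4096 layer is chosen as an example (4096 is Mistral-7B's hidden size). The source does not say which layers LoRA was applied to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingConfig:
    max_seq_length: int = 2048            # tokens, as stated in the configuration
    quantization_bits: int = 4            # 4-bit quantized base model
    sft_lora_rank: int = 16               # LoRA rank in the SFT stage
    dpo_lora_rank: Optional[int] = None   # "adaptive" in the source; no value given


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


config = TrainingConfig()
# Example layer only: a square projection at Mistral-7B's hidden size of 4096.
added = lora_trainable_params(4096, 4096, config.sft_lora_rank)
full = 4096 * 4096
print(added, full, added / full)
```

For this example layer, rank 16 adds 131,072 trainable parameters against 16,777,216 in the full matrix, under 1% of its size. Low trainable-parameter counts of this kind are consistent with the short training time reported.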

Section 05

Performance Evaluation: Significant Improvements of Peace-7B on Legal Tasks

The table compares Peace-7B with the base Mistral-7B on per-task times in seconds (lower is better); the percentages are relative reductions in time:

Task | Mistral-7B Baseline | Peace-7B v1.0 | Improvement
Contract Analysis | 14.24s | 9.60s | 32.6% faster
Evidence Analysis | 9.28s | 9.57s | Largely unchanged
Legal Reasoning | 9.36s | 9.55s | Largely unchanged
Trademark Classification | 9.40s | 9.55s | Largely unchanged
Case Analysis | 9.37s | 8.06s | 14.0% faster
Average | 10.33s | 9.27s | 10.3% faster
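
The percentages in the table can be reproduced from the timings. The snippet below is a small worked check, assuming "X% faster" means the relative reduction in time against the baseline; the `percent_faster` helper name is illustrative.

```python
# (baseline seconds, Peace-7B seconds) per task, copied from the table.
timings = {
    "Contract Analysis": (14.24, 9.60),
    "Evidence Analysis": (9.28, 9.57),
    "Legal Reasoning": (9.36, 9.55),
    "Trademark Classification": (9.40, 9.55),
    "Case Analysis": (9.37, 8.06),
}


def percent_faster(baseline: float, tuned: float) -> float:
    """Relative reduction in time; negative values mean the tuned model is slower."""
    return (baseline - tuned) / baseline * 100


for task, (baseline, tuned) in timings.items():
    print(f"{task}: {percent_faster(baseline, tuned):+.1f}%")

baseline_avg = sum(b for b, _ in timings.values()) / len(timings)
tuned_avg = sum(t for _, t in timings.values()) / len(timings)
print(f"Average: {baseline_avg:.2f}s -> {tuned_avg:.2f}s "
      f"({percent_faster(baseline_avg, tuned_avg):+.1f}%)")
```

Under that definition the contract, case, and average rows match the table. The three rows marked "Largely unchanged" come out slightly negative, between roughly 1.6% and 3.1% slower than the baseline.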

Reported improvements in output quality: more structured professional responses, accurate legal citations, clearer reasoning, and consistent quality across tasks.

Section 06

Core Application Scenarios: Covering Multiple Professional Legal Tasks

Contract Analysis

Identify key obligations, risk points, and legal impacts of clauses (e.g., interpreting the meaning of Delaware arbitration clauses).

Legal Research

Answer legal questions, explain precedents, provide regulatory interpretations, and cite relevant legal concepts.

Document Review

Check compliance, missing clauses, and potential issues; suitable for due diligence and M&A reviews.

Case Outcome Prediction

Predict judgment results based on facts and precedents to assist in case strategy evaluation.

Evidence Evaluation

Judge the admissibility, relevance, and probative value of evidence to assist trial preparation.

Section 07

Ethical Use and Open-Source License: Clear Boundaries and Permissive Authorization

Limitations

  • May generate plausible but incorrect legal information
  • Not trained for specific jurisdictions
  • Cannot provide personalized legal advice
  • Intended only as a research and analysis tool

Ethical Guidelines

  • Legal professionals should verify all outputs
  • Do not use the model for automated legal decisions
  • Clearly disclose AI assistance
  • Keep human supervision in place for all legal applications

Open-Source License

The project is released under the Apache License 2.0, which permits commercial use, modification, and distribution. License and copyright notices must be retained, and the software is provided without warranty.