Zing Forum


TenkiBench: An Open-Source Large Language Model Evaluation Benchmark for Norwegian Small and Medium Enterprises

The first large language model evaluation benchmark built specifically around the real business scenarios of Norwegian small and medium enterprises, covering 8 task categories including invoice parsing, contract analysis, tax calculation, and legal citation.

Large language models · Benchmarking · Norwegian · SMEs · Invoice parsing · Contract analysis · Tax calculation · LLM evaluation
Published 2026-05-06 20:14 · Recent activity 2026-05-06 20:28

Section 02

Project Background: Why Do We Need a Regional Business Benchmark?

Norway, a Nordic welfare state, has a distinctive business environment and regulatory framework:

  • Language Specificity: Norwegian has two official written standards, Bokmål and Nynorsk, and converting text between them is a routine business need
  • Strict Tax System: Value-added tax (merverdiavgift, MVA) calculation and tax reporting follow complex rules with many exceptions
  • Comprehensive Labor Law: Employment contracts, dismissal procedures, sick pay, and similar matters are governed by detailed statutory provisions
  • Transparent Business Registration System: The Brønnøysund Register Centre makes company registration data publicly searchable

As a result, a model that scores well on general English-language benchmarks may still perform poorly on Norwegian business tasks. TenkiBench was created to fill this evaluation gap.


Section 03

Evaluation Category Design: Covering Real Business Scenarios

TenkiBench defines 8 evaluation categories, each corresponding to an actual business need of Norwegian small and medium enterprises:


Section 04

1. Invoice Parsing (faktura)

Norwegian invoices follow specific formats and contain fields such as the total amount, value-added tax (MVA), KID (kundeidentifikasjonsnummer, a customer identification number used as a payment reference), due date, and issuer details. The model must extract these fields accurately as structured data.

Evaluation Method: Numerical matching + Regular expression verification
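
The article does not show TenkiBench's scorer. As a minimal sketch of what numerical matching plus regular expression verification could look like, the Python below compares amounts with a one-øre tolerance and checks the KID and due date with regexes. The field names (`total`, `mva`, `kid`, `due_date`), the ISO date format, and the assumption that the KID ends in a MOD10 (Luhn) check digit are all illustrative; some KID agreements use MOD11 instead.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumption: a KID is 2-25 digits ending in a MOD10 (Luhn) check digit.
KID_PATTERN = re.compile(r"^\d{2,25}$")
# Assumption: gold answers store due dates as ISO 8601 (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the MOD10 (Luhn) check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def amounts_match(pred, gold, tol=Decimal("0.01")) -> bool:
    """Compare two amounts numerically, allowing a one-øre rounding difference."""
    try:
        return abs(Decimal(str(pred)) - Decimal(str(gold))) <= tol
    except InvalidOperation:  # empty, missing or non-numeric prediction
        return False


def score_invoice(pred: dict, gold: dict) -> float:
    """Fraction of the hypothetical invoice fields the model extracted correctly."""
    kid = str(pred.get("kid", ""))
    due = str(pred.get("due_date", ""))
    checks = [
        amounts_match(pred.get("total", ""), gold["total"]),
        amounts_match(pred.get("mva", ""), gold["mva"]),
        bool(KID_PATTERN.match(kid)) and luhn_valid(kid) and kid == gold["kid"],
        bool(DATE_PATTERN.match(due)) and due == gold["due_date"],
    ]
    return sum(checks) / len(checks)
```

Scoring each field separately gives partial credit, so a model that gets the amounts right but garbles the KID is distinguishable from one that fails outright.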


Section 05

2. Contract Analysis (kontrakt)

Tests the model's ability to identify risky clauses in non-disclosure agreements (NDAs), delivery agreements, and employment contracts, which requires an understanding of the nuances of legal language.

Evaluation Method: LLM judge + Rating scale
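
In an LLM-judge setup, the harness typically sends the contract, the model's answer, and a rubric to a separate judge model, then parses a score from the reply. The sketch below assumes a hypothetical 1 to 5 rubric, a `Score: N` reply format, and a caller-supplied `judge` function standing in for whatever model API is used; none of these details come from TenkiBench.

```python
import re
from statistics import mean
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical rubric; the article does not publish TenkiBench's judge prompt.
RUBRIC = (
    "Grade the answer from 1 to 5.\n"
    "5 = finds every risky clause and explains each correctly\n"
    "3 = finds some risky clauses, with minor errors\n"
    "1 = misses the main risks or misreads the contract\n"
    "End your reply with a line of the form 'Score: N'."
)

SCORE_RE = re.compile(r"Score:\s*([1-5])\b")


def build_judge_prompt(contract: str, answer: str) -> str:
    """Assemble the prompt sent to the judge model."""
    return f"{RUBRIC}\n\nContract:\n{contract}\n\nAnswer to grade:\n{answer}"


def parse_score(reply: str) -> Optional[int]:
    """Take the last 'Score: N' (N in 1-5) in the reply, or None if absent."""
    matches = SCORE_RE.findall(reply)
    return int(matches[-1]) if matches else None


def judge_items(items: Iterable[Tuple[str, str]], judge: Callable[[str], str]) -> float:
    """Average judge score over (contract, answer) pairs; malformed replies are skipped."""
    scores = []
    for contract, answer in items:
        score = parse_score(judge(build_judge_prompt(contract, answer)))
        if score is not None:
            scores.append(score)
    return mean(scores) if scores else 0.0
```

Passing the judge in as a plain function keeps the scoring logic testable with a stub before wiring it to a real model.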


Section 06

3. Tax Calculation (mva-skatt)

Covers Norwegian tax practice such as value-added tax calculation, deductions, and determining tax liability. These are among the most common financial questions small and medium enterprises face.

Evaluation Method: Numerical calculation + Regular expression verification
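
The underlying arithmetic is simple: MVA on a net amount is `net * rate`, and the MVA share of a VAT-inclusive amount is `gross * rate / (1 + rate)`. The sketch below pairs that with a regex that pulls the final amount out of a free-text answer written in Norwegian number format (space as thousands separator, comma as decimal mark). The rate table assumes the 25 %, 15 %, and 12 % bands; the "last amount wins" parsing rule and the one-øre tolerance are assumptions, not TenkiBench's documented behaviour.

```python
import re
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# Assumed Norwegian MVA bands: 25 % general, 15 % food, 12 % e.g. passenger
# transport and accommodation. Rates change over time; verify before use.
RATES = {"general": Decimal("0.25"), "food": Decimal("0.15"), "low": Decimal("0.12")}
ORE = Decimal("0.01")


def mva_from_net(net: Decimal, rate: Decimal) -> Decimal:
    """MVA owed on a VAT-exclusive amount, rounded to the nearest øre."""
    return (net * rate).quantize(ORE, rounding=ROUND_HALF_UP)


def mva_from_gross(gross: Decimal, rate: Decimal) -> Decimal:
    """MVA share of a VAT-inclusive amount: gross * rate / (1 + rate)."""
    return (gross * rate / (1 + rate)).quantize(ORE, rounding=ROUND_HALF_UP)


# Amounts such as "1 250,00" (space or no-break space as thousands separator),
# "250,5" or "1250.50". Dot thousands separators ("1.250,00") are not handled.
AMOUNT_RE = re.compile(r"\d{1,3}(?:[ \u00a0]\d{3})+(?:,\d{1,2})?|\d+(?:[.,]\d{1,2})?")


def parse_nok(text: str) -> Optional[Decimal]:
    """Return the last amount mentioned in a free-text answer, or None."""
    matches = AMOUNT_RE.findall(text)
    if not matches:
        return None
    raw = matches[-1].replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return Decimal(raw)


def check_answer(answer: str, expected: Decimal) -> bool:
    """Accept the answer if its final amount is within one øre of the gold value."""
    got = parse_nok(answer)
    return got is not None and abs(got - expected) <= ORE
```

`Decimal` with explicit half-up rounding avoids the binary floating-point drift that would make exact øre comparisons unreliable.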


Section 07

4. Legal Citation (lov-referanse)

Tests the model's ability to correctly cite Norwegian laws and regulations as published in the Lovdata database. Accurate legal citation is crucial for compliance advice.

Evaluation Method: Regular expression + Structural verification
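
Lovdata identifies statutes with IDs of the form `LOV-YYYY-MM-DD-NR` (regulations use `FOR-`), and sections are cited as, for example, `§ 15-7`. A minimal sketch of regex extraction followed by a structural check (a real calendar date that is not in the future, and a positive number) might look like this. The scoring rule, the expected-ID set, and the requirement that answers cite the ID rather than only the law's short name are assumptions for illustration.

```python
import re
from datetime import date
from typing import List, Set

# Assumed citation shape: a Lovdata ID (LOV for statutes, FOR for regulations)
# with enactment date and number, optionally followed by a section, e.g.
# "LOV-2005-06-17-62 § 15-7".
CITATION_RE = re.compile(
    r"\b(?P<kind>LOV|FOR)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})-(?P<nr>\d+)"
    r"(?:\s*§\s*(?P<section>\d+(?:-\d+)?[a-z]?))?"
)


def extract_citations(text: str) -> List[dict]:
    """Return every Lovdata-style citation found in the text as a dict of groups."""
    return [m.groupdict() for m in CITATION_RE.finditer(text)]


def structurally_valid(cite: dict) -> bool:
    """Check that the ID encodes a real, non-future date and a positive number."""
    try:
        enacted = date(int(cite["y"]), int(cite["m"]), int(cite["d"]))
    except ValueError:  # e.g. month 13 or 30 February
        return False
    return enacted <= date.today() and int(cite["nr"]) > 0


def law_id(cite: dict) -> str:
    """Rebuild the bare Lovdata ID, dropping any section reference."""
    return f"{cite['kind']}-{cite['y']}-{cite['m']}-{cite['d']}-{cite['nr']}"


def score_citations(answer: str, expected_ids: Set[str]) -> float:
    """Fraction of expected law IDs that the answer cites with a valid structure."""
    if not expected_ids:
        return 0.0
    found = {law_id(c) for c in extract_citations(answer) if structurally_valid(c)}
    return len(found & expected_ids) / len(expected_ids)
```

A structural check like this catches malformed or impossible identifiers, but it cannot confirm that a well-formed ID points at the right law; that still needs a lookup against Lovdata or a gold list.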


Section 08

5. Business Registration Query (brreg)

Using public data from the Brønnøysund Register Centre, tests the model's ability to look up company information, signing authority, and shareholder roles.

Evaluation Method: JSON Schema verification
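
A stdlib-only sketch of schema verification for this category is shown below. It implements only a small subset of JSON Schema (`type`, `required`, `properties`, `items`, `pattern`); a real harness would more likely use a full validator such as the `jsonschema` package. The answer schema and its field names are hypothetical. As an extra check, it verifies the MOD11 check digit that Norwegian 9-digit organisation numbers carry (weights 3, 2, 7, 6, 5, 4, 3, 2).

```python
import json
import re

# Subset of JSON Schema "type" values supported by this sketch.
TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}

# Hypothetical answer schema; field names are illustrative only.
ANSWER_SCHEMA = {
    "type": "object",
    "required": ["organisasjonsnummer", "navn", "roller"],
    "properties": {
        "organisasjonsnummer": {"type": "string", "pattern": r"^\d{9}$"},
        "navn": {"type": "string"},
        "roller": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["rolle", "navn"],
                "properties": {"rolle": {"type": "string"}, "navn": {"type": "string"}},
            },
        },
    },
}


def validate(instance, schema: dict, path: str = "$") -> list:
    """Return a list of error messages; an empty list means the instance conforms."""
    expected = TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(instance, expected):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    if isinstance(instance, str) and "pattern" in schema:
        if re.search(schema["pattern"], instance) is None:
            errors.append(f"{path}: does not match {schema['pattern']}")
    return errors


def orgnr_valid(orgnr: str) -> bool:
    """MOD11 check digit of a 9-digit Norwegian organisation number."""
    if len(orgnr) != 9 or not (orgnr.isascii() and orgnr.isdigit()):
        return False
    weights = (3, 2, 7, 6, 5, 4, 3, 2)
    remainder = sum(int(d) * w for d, w in zip(orgnr, weights)) % 11
    check = 0 if remainder == 0 else 11 - remainder  # 10 never matches a digit
    return check == int(orgnr[8])


def score_brreg(raw_answer: str) -> bool:
    """Pass only if the reply is valid JSON, fits the schema and has a valid orgnr."""
    try:
        answer = json.loads(raw_answer)
    except json.JSONDecodeError:
        return False
    return not validate(answer, ANSWER_SCHEMA) and orgnr_valid(answer["organisasjonsnummer"])
```

Returning a list of path-tagged errors, rather than a bare boolean, makes it easier to see which part of a model's JSON broke the schema.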