# TenkiBench: An Open-Source Large Language Model Evaluation Benchmark for Norwegian Small and Medium Enterprises

> The first large language model evaluation benchmark specifically targeting real business scenarios of Norwegian small and medium enterprises, covering 8 task categories including invoice parsing, contract analysis, tax calculation, and legal citation.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-06T12:14:23.000Z
- Last activity: 2026-05-06T12:28:00.470Z
- Popularity: 159.8
- Keywords: large language model, benchmark, Norwegian, small and medium enterprises, invoice parsing, contract analysis, tax calculation, LLM evaluation
- Page link: https://www.zingnex.cn/en/forum/thread/tenkibench-a093c999
- Canonical: https://www.zingnex.cn/forum/thread/tenkibench-a093c999

---

## Project Background: Why Do We Need a Regional Business Benchmark?

As a Nordic welfare state, Norway has a distinctive business environment and regulatory framework:

- **Language Specificity**: Norwegian has two official written standards, Bokmål and Nynorsk, and converting text between the two is a routine business need
- **Strict Tax System**: Value-added tax (MVA) calculation and tax reporting involve complex rules and exceptions
- **Comprehensive Labor Law**: Employment contracts, dismissal procedures, and sick pay are governed by detailed statutory rules
- **Transparent Business Registration System**: The Brønnøysund Register Centre provides public lookup of company information

These characteristics mean that a model that performs well on general English benchmarks may not perform well in local Norwegian business scenarios. TenkiBench was created to fill this evaluation gap.

## Evaluation Category Design: Covering Real Business Scenarios

TenkiBench defines 8 evaluation categories, each corresponding to a practical business need of Norwegian small and medium enterprises:

## 1. Invoice Parsing (faktura)

Norwegian invoices follow specific formats and contain fields such as total amount, value-added tax (MVA), KID (customer identification number), due date, and issuer information. The model needs to extract these fields accurately as structured data.

**Evaluation Method**: Numerical matching + Regular expression verification
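
The post does not describe the scoring code, so the following is only a minimal sketch of how numerical matching and regular expression verification might be combined for one parsed invoice. The field names (`total`, `mva`, `kid`, `due_date`), the amount format with space as thousands separator, the `DD.MM.YYYY` date format, the KID length bounds, and the tolerance are all assumptions made for illustration.

```python
import re
from decimal import Decimal

# Hypothetical formats; the benchmark's actual field schema is not given in the post.
KID_PATTERN = re.compile(r"\d{2,25}")            # KID assumed to be digits only, 2 to 25 long
DATE_PATTERN = re.compile(r"\d{2}\.\d{2}\.\d{4}")  # due date assumed to be DD.MM.YYYY


def parse_amount(text: str) -> Decimal:
    """Parse an amount such as '1 250,00' or '1250.00' into a Decimal."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return Decimal(cleaned)


def score_invoice(predicted: dict, expected: dict,
                  tolerance: Decimal = Decimal("0.01")) -> dict:
    """Return a per-field pass/fail map for one parsed invoice."""
    results = {}
    # Numerical matching: amounts pass if they agree within the tolerance.
    for field in ("total", "mva"):
        diff = abs(parse_amount(predicted[field]) - parse_amount(expected[field]))
        results[field] = diff <= tolerance
    # Regex verification: the value must have the right shape and equal the reference.
    results["kid"] = (
        KID_PATTERN.fullmatch(predicted["kid"]) is not None
        and predicted["kid"] == expected["kid"]
    )
    results["due_date"] = (
        DATE_PATTERN.fullmatch(predicted["due_date"]) is not None
        and predicted["due_date"] == expected["due_date"]
    )
    return results
```

Comparing amounts as `Decimal` values rather than strings lets `1250.00` and `1 250,00` count as the same answer, while the shape check still rejects a due date returned in a different format.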

## 2. Contract Analysis (kontrakt)

Tests the model's ability to identify risk clauses in non-disclosure agreements (NDA), delivery agreements, and employment contracts. This requires the model to understand the nuances of legal texts.

**Evaluation Method**: LLM judge + Scoring rubric

## 3. Tax Calculation (mva-skatt)

Covers Norwegian tax practice, including value-added tax calculation, deductions, and determining tax liability. These are among the most common financial questions small and medium enterprises face.

**Evaluation Method**: Numerical calculation + Regular expression verification
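
As a worked illustration of numerical calculation plus regular expression verification, the sketch below computes MVA on a net amount and then checks a free-text model answer by pulling out the last amount it mentions. The 25% figure is Norway's general MVA rate (reduced rates apply to some goods and services); the answer format, the function names, and the choice to grade the last number in the answer are assumptions, not details from the post.

```python
import re
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

STANDARD_MVA_RATE = Decimal("0.25")  # general rate; reduced rates are not handled here

# Matches amounts like "2500", "2 500,00" or "2500.00" (formats assumed for illustration).
AMOUNT_RE = re.compile(
    r"\d{1,3}(?:[ \u00a0]\d{3})+(?:[.,]\d{1,2})?|\d+(?:[.,]\d{1,2})?"
)


def mva_on_net(net: Decimal, rate: Decimal = STANDARD_MVA_RATE) -> Decimal:
    """MVA owed on a net amount, rounded to whole øre."""
    return (net * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def last_amount(answer: str) -> Optional[Decimal]:
    """Extract the last amount mentioned in a model answer, if any."""
    matches = AMOUNT_RE.findall(answer)
    if not matches:
        return None
    raw = matches[-1].replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return Decimal(raw)


def is_correct(answer: str, expected: Decimal,
               tolerance: Decimal = Decimal("0.01")) -> bool:
    """Pass if the last amount in the answer matches the expected value."""
    value = last_amount(answer)
    return value is not None and abs(value - expected) <= tolerance
```

For a net amount of 10 000 kr, `mva_on_net` gives 2 500,00 kr, so an answer ending in "2 500,00 kr" would pass. Grading the last number is a simple heuristic and would need tightening for answers that end with a rate or a date.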

## 4. Legal Citation (lov-referanse)

Tests the model's ability to cite Norwegian laws and regulations correctly, as published in the Lovdata database. Accurate legal citation is crucial for compliance consulting.

**Evaluation Method**: Regular expression + Structural verification
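
One way regular expression and structural verification could work for citations is sketched below. It assumes citations take the common form of a law's short title ending in `-loven` or `-lova` followed by `§` and a section number such as `15-3`; the pattern, the helper names, and the exact-match comparison are illustrative assumptions rather than the benchmark's actual rules.

```python
import re

# Assumed citation shape: "<short title ending in -loven/-lova> § <section>",
# where the section is "15-3", "6", or similar, optionally with a letter suffix.
CITATION_RE = re.compile(
    r"(?P<law>[a-zæøå]+(?:loven|lova))\s*§\s*(?P<section>\d+(?:-\d+)?[a-z]?)",
    re.IGNORECASE,
)


def extract_citations(text: str) -> list:
    """Return (law, section) pairs found in the text, with the law lowercased."""
    return [
        (m.group("law").lower(), m.group("section"))
        for m in CITATION_RE.finditer(text)
    ]


def cites(text: str, law: str, section: str) -> bool:
    """Structural check: does the text cite exactly this law and section?"""
    return (law.lower(), section) in extract_citations(text)
```

With this sketch, `cites("Se arbeidsmiljøloven § 15-3.", "arbeidsmiljøloven", "15-3")` is true, while a citation to the right law but a different section fails. Laws cited by date and number rather than short title would need an additional pattern.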

## 5. Business Registration Query (brreg)

Using public data from the Brønnøysund Register Centre, tests the model's ability to look up company information, signatory rights, and shareholder roles.

**Evaluation Method**: JSON Schema verification
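
To illustrate what JSON Schema verification of a model response could look like, the sketch below checks a response against a small schema using a hand-written validator that supports only a subset of JSON Schema keywords (`type`, `required`, `properties`, `items`, `pattern`). The schema itself, including the field names, is hypothetical; the only fact relied on is that Norwegian organization numbers have nine digits. A real harness would more likely use a full validator library.

```python
import json
import re

# Hypothetical schema for illustration; the benchmark's real schema is not shown in the post.
COMPANY_SCHEMA = {
    "type": "object",
    "required": ["organisasjonsnummer", "navn", "roller"],
    "properties": {
        "organisasjonsnummer": {"type": "string", "pattern": r"^\d{9}$"},
        "navn": {"type": "string"},
        "roller": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["type", "person"],
                "properties": {
                    "type": {"type": "string"},
                    "person": {"type": "string"},
                },
            },
        },
    },
}

_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance is valid."""
    expected = _TYPES[schema["type"]]
    if not isinstance(instance, expected):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if expected is dict:
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    elif expected is list and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    elif expected is str and "pattern" in schema:
        # JSON Schema patterns are unanchored, so search is the matching semantics.
        if not re.search(schema["pattern"], instance):
            errors.append(f"{path}: does not match {schema['pattern']}")
    return errors


def check_model_output(raw: str):
    """Parse the model's raw text as JSON and validate it against the schema."""
    try:
        return validate(json.loads(raw), COMPANY_SCHEMA)
    except json.JSONDecodeError as exc:
        return [f"$: invalid JSON ({exc.msg})"]
```

Returning a list of errors with JSON paths, rather than a single boolean, makes it easier to see whether a model fails on output structure or on specific fields.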
