BeDiscovER: A Discourse Understanding Benchmark for the Era of Reasoning Language Models

A comprehensive discourse understanding benchmark accepted at EACL 2026. It covers five tasks (conversational discourse parsing, discourse marker understanding, discourse relation recognition, sentence ordering, and temporal reasoning) and is designed to evaluate the discourse understanding capabilities of reasoning language models.

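Tags: discourse understanding, benchmark, EACL 2026, reasoning language models, conversational discourse parsing, discourse relation recognition, temporal reasoning, sentence ordering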
Published 2026-04-17 13:15 | Recent activity 2026-04-17 13:21 | Estimated read 7 min

Section 01

BeDiscovER: Introduction to the Discourse Understanding Benchmark for the Era of Reasoning Language Models

BeDiscovER is a comprehensive discourse understanding benchmark accepted at EACL 2026 and designed to assess the discourse understanding capabilities of reasoning language models. It covers five core discourse tasks: conversational discourse parsing, discourse marker understanding, discourse relation recognition, sentence ordering, and temporal reasoning. Its aim is to evaluate models' discourse-level capabilities systematically and to advance discourse understanding research in NLP.


Section 02

Discourse Understanding: Challenges and Needs in the NLP Field

In recent years, natural language processing has made great progress on word-level and sentence-level tasks, but discourse-level understanding remains an open challenge. Discourse understanding involves analyzing relationships between text units, identifying logical structure, and integrating information across sentences, all of which are central to understanding language beyond the single sentence. With the rise of reasoning language models, how to evaluate their discourse capabilities systematically has become a pressing question for the research community.


Section 03

Design of BeDiscovER's Five Core Tasks

BeDiscovER covers five core discourse tasks:

  1. Conversational Discourse Parsing: Identify discourse structure in conversations (unit segmentation and relation recognition), drawing on established datasets such as STAC and Molweni;
  2. Discourse Marker Understanding: Test understanding of the semantic functions of discourse markers such as "however" and "therefore", based on the Just and Otherwise datasets;
  3. Discourse Relation Recognition: Determine the logical relation (causal, contrastive, etc.) between discourse units, using data from the DISRPT 2025 shared task;
  4. Sentence Ordering: Restore the correct order of shuffled sentences, which reflects a model's grasp of coherence, with data from multiple domains such as academic abstracts and stories;
  5. Temporal Reasoning: Understand temporal relations between events (sequence, simultaneity, etc.), based on time-annotated datasets such as TimeBank-Dense.
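
To make the sentence ordering task concrete, here is a minimal sketch of how a predicted order could be scored against the gold order. The summary above does not say which metrics BeDiscovER reports, so the two used here (Kendall's tau and exact match, both common in the sentence ordering literature) are an assumption, and the function names `kendall_tau` and `perfect_match` are hypothetical.

```python
from itertools import combinations


def kendall_tau(predicted, gold):
    """Kendall's tau between two orderings of the same unique sentence ids.

    Returns 1.0 for identical orders and -1.0 for fully reversed orders.
    Assumption: each id appears exactly once in both lists.
    """
    if sorted(predicted) != sorted(gold):
        raise ValueError("both orderings must contain the same sentence ids")
    n = len(gold)
    if n < 2:
        return 1.0  # a single sentence is trivially in order
    position = {sid: i for i, sid in enumerate(predicted)}
    # combinations(gold, 2) yields pairs (a, b) with a before b in the gold order;
    # a pair is discordant when the prediction places b before a.
    discordant = sum(
        1 for a, b in combinations(gold, 2) if position[a] > position[b]
    )
    total_pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * discordant / total_pairs


def perfect_match(predicted, gold):
    """True only when the predicted order equals the gold order exactly."""
    return list(predicted) == list(gold)
```

For example, `kendall_tau([1, 0, 2, 3], [0, 1, 2, 3])` has one swapped pair out of six and evaluates to about 0.667, while `perfect_match` on the same input is `False`. Reporting both gives a graded and a strict view of the same prediction.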

Section 04

Dataset Organization and Usage of BeDiscovER

BeDiscovER gives each task its own directory and documentation. The project provides a unified data loading script that lets users select datasets and configure sampling ratios. Data formats follow each task's needs: conversational discourse parsing uses JSON, sentence ordering uses JSONL, and discourse relation recognition supports automatic expansion of the DISRPT test set. Together these make cross-task comparison experiments easier to run.
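
The following is a rough sketch of what loading JSON or JSONL task data with a sampling ratio might look like. It is not the project's actual loading script: the function names `load_records` and `sample_records`, the file layout, and the `seed` parameter are all hypothetical and chosen only for illustration.

```python
import json
import random
from pathlib import Path


def load_records(path):
    """Load a list of records from a .json or .jsonl file (hypothetical helper)."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # JSONL: one JSON object per non-empty line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".json":
        data = json.loads(text)
        # Wrap a single top-level object so callers always get a list
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported file type: {path.suffix}")


def sample_records(records, ratio, seed=0):
    """Return a reproducible random subset covering `ratio` of the records."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not records:
        return []
    k = max(1, round(len(records) * ratio))
    rng = random.Random(seed)  # local generator keeps sampling deterministic
    return rng.sample(records, k)
```

A call such as `sample_records(load_records("some_task/test.jsonl"), ratio=0.5)` (the path is made up) would return half of the records, and the same seed yields the same subset on every run, which helps when comparing models across tasks.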


Section 05

Why Does BeDiscovER Focus on Reasoning Language Models?

The name BeDiscovER points to its setting: the era of reasoning language models. Earlier models are often described as relying on surface pattern matching, whereas reasoning models show stronger logical reasoning through chain-of-thought. Discourse understanding additionally requires modeling long-distance dependencies, identifying implicit relations, and grasping global structure. BeDiscovER is designed to test how reasoning models perform on these higher-level discourse capabilities.


Section 06

Academic Value and Application Prospects of BeDiscovER

As a paper accepted at EACL 2026, BeDiscovER offers academic value in three ways: it provides a standardized evaluation platform, its multi-task design exposes connections between different dimensions of discourse understanding, and it helps researchers analyze models' strengths and weaknesses. For industry, it is relevant to applications such as dialogue systems, document understanding, and knowledge extraction, where developers can use the evaluation results to choose suitable models and training strategies.


Section 07

Summary and Outlook of BeDiscovER

BeDiscovER is a notable step toward more comprehensive, multi-dimensional evaluation of discourse understanding. It is a reminder that understanding language requires grasping the macro structure and logical relations of a text, not just lexical and syntactic knowledge. As reasoning language models continue to evolve, BeDiscovER is positioned to help drive progress in discourse understanding.