Zing Forum

Reading

BFMD: The First Full-Court Dense Badminton Dataset—Enabling AI to Understand the Tactical Intent of Every Shot

The research team from Nagoya Institute of Technology released BFMD, the first full-court dense badminton dataset, which includes 19 complete matches, 20 hours of video, and detailed annotations for 16,751 shot events. They also proposed a multi-modal shot description generation framework based on VideoMAE.

Sports video understanding · Badminton dataset · Multi-modal learning · Video caption generation · Computer vision · Action recognition · VideoMAE · Tactical analysis · Deep learning · Dataset construction
Published 2026-03-26 23:09 · Recent activity 2026-03-28 07:55 · Estimated read 6 min

Section 01

BFMD Dataset & Multi-modal Framework: Enabling AI to Understand Badminton Tactics

The teams from Nagoya Institute of Technology and Nagoya University released BFMD, the first full-court dense badminton dataset, which includes 19 complete professional matches (12 singles, 7 doubles), 20.32 hours of video, and detailed annotations for 16,751 shot events. They also proposed a multi-modal shot description generation framework based on VideoMAE, aiming to enable AI to generate accurate and tactically insightful shot descriptions from videos, thus advancing the field of sports video understanding.


Section 02

Background: Limitations of Existing Badminton Datasets

Existing badminton datasets have two major limitations:

1. Temporal fragmentation: only short clips are included, so match continuity is lost, context goes missing, and tactical analysis is limited;
2. Single modality: most provide only RGB video, lacking key information such as shuttlecock trajectory, player pose, and on-court positions.

In contrast, tennis and table tennis already have more comprehensive datasets (e.g., 3DTennisDS, THETIS, OpenTTGames), so the badminton field urgently needs a fully structured dataset.


Section 03

BFMD Dataset: Scale & Annotation System

BFMD's data is sourced from 19 top-tier events on the official BWF YouTube channel: 19 matches (12 singles, 7 doubles), 20.32 hours of video, 1,687 rallies, and 16,751 shots. A three-level annotation system is adopted:

1. Match segments (rallies, replays, Hawk-Eye replays);
2. Rally events (shots, shuttlecock landings, net touches);
3. Dense rally annotations (shot type, shuttlecock trajectory, player bounding boxes, pose keypoints, shot descriptions).

The annotation process is a human-machine collaboration: GPT-4.1 assists in generating initial drafts, then three annotators with over five years of experience each review and revise them, with iterative feedback between the two stages.
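The three annotation levels described above can be pictured as nested records. A minimal sketch in Python dataclasses follows; the field names and value types are illustrative assumptions, since the post does not reproduce the dataset's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema: field names are assumptions, not BFMD's real format.

@dataclass
class ShotAnnotation:
    """Level 3: dense per-shot annotation inside a rally."""
    shot_type: str                                  # e.g. "smash", "net shot", "lift"
    trajectory: List[Tuple[float, float]]           # shuttlecock (x, y) per frame
    player_boxes: List[Tuple[float, float, float, float]]  # bounding boxes
    pose_keypoints: List[List[Tuple[float, float]]]  # keypoints per player
    description: str                                # natural-language shot description

@dataclass
class RallyEvent:
    """Level 2: events within a rally."""
    kind: str    # "shot" | "landing" | "net_touch"
    frame: int   # frame index in the match video

@dataclass
class MatchSegment:
    """Level 1: match-level segmentation."""
    kind: str          # "rally" | "replay" | "hawkeye_replay"
    start_frame: int
    end_frame: int
    events: List[RallyEvent] = field(default_factory=list)
    shots: List[ShotAnnotation] = field(default_factory=list)
```

Nesting the levels this way mirrors the hierarchy in the post: a segment contains rally events, and dense shot annotations hang off the rally segments.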


Section 04

Multi-modal Shot Description Framework

The framework is built on VideoMAE, with a semantic feedback mechanism as its core innovation. It consists of four components:

1. VideoMAE visual encoder + Token refiner (enhances feature interaction);
2. Multi-modal fusion module (encodes and fuses player positions, pose keypoints, and shuttlecock trajectory);
3. Transformer description decoder (autoregressively generates text);
4. Semantic feedback module (predicts semantic attributes in parallel and feeds them back to enhance the representation).

Training is multi-task: a description generation loss (cross-entropy) plus a semantic feedback loss (multi-label binary cross-entropy) with a weight coefficient of 0.1.
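The multi-task objective above can be sketched in plain Python. The post gives only the loss structure and the 0.1 weight; the function names and the per-step probability inputs below are illustrative assumptions, not the paper's implementation:

```python
import math

def token_cross_entropy(probs, target_idx):
    # Negative log-likelihood of the ground-truth token at one decoding step.
    return -math.log(probs[target_idx])

def multilabel_bce(attr_probs, attr_labels):
    # Mean binary cross-entropy over predicted semantic attributes
    # (the semantic feedback head's multi-label loss).
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for p, y in zip(attr_probs, attr_labels)]
    return sum(terms) / len(terms)

def total_loss(step_probs, targets, attr_probs, attr_labels, lam=0.1):
    # Caption loss averaged over decoding steps, plus the semantic
    # feedback loss weighted by lam = 0.1, as reported in the post.
    caption_loss = sum(token_cross_entropy(p, t)
                       for p, t in zip(step_probs, targets)) / len(targets)
    return caption_loss + lam * multilabel_bce(attr_probs, attr_labels)
```

In a real training loop both terms would be computed from logits with a framework such as PyTorch (`CrossEntropyLoss` and `BCEWithLogitsLoss`); the sketch only makes the weighting between the two objectives concrete.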


Section 05

Experimental Results: Validating Multi-modal Value

Comparative experiments show that the framework outperforms traditional visual captioning models (e.g., SoccerNet-Caption), pre-trained VLMs (e.g., Vid2Seq), and zero-shot large VLMs (e.g., Qwen2.5-VL). Ablations indicate that both the Token refiner and the semantic feedback module improve performance; among the multi-modal inputs, shuttlecock trajectory brings the largest gain, and combining all modalities yields the best results. In qualitative analysis, the model correctly identifies smashes, net shots, and other strokes, but tends to confuse visually similar ones such as lifts and net shots.


Section 06

Limitations & Future Directions

Current limitations:

1. Only singles data is used; doubles matches are not yet processed;
2. Reliance on manually annotated shot events;
3. Fine-grained shot types are easily confused.

Future directions:

1. Expand to full-court video understanding;
2. Optimize for real-time applications;
3. Cross-sport transfer (tennis, table tennis);
4. Develop interactive analysis tools.