NewLuggageDataset: A Real-Time Abandoned Luggage Detection System Based on YOLOv12

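A real-time abandoned luggage detection framework that combines enhanced YOLOv12 models with spatiotemporal reasoning. Using multi-scale object detection and interpretable time-distance constraints, it aims to provide a highly reliable solution for public safety monitoring.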

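Tags: YOLOv12, object detection, abandoned luggage detection, multi-object tracking, public safety, computer vision, spatiotemporal reasoning, real-time monitoring, edge computing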

Published 2026/05/09 17:05 | Last activity 2026/05/09 17:22 | Estimated reading time: 7 minutes

Section 01

NewLuggageDataset: Real-Time Abandoned Luggage Detection with YOLOv12 & Spatiotemporal Reasoning

This project introduces a real-time abandoned luggage detection system combining enhanced YOLOv12 models and spatiotemporal reasoning. It addresses public safety challenges in crowded places (airports, stations, malls) by integrating multi-scale object detection and interpretable time-distance constraints, providing a reliable solution for security monitoring. Key features include dual YOLOv12 models for luggage detection and person tracking, a spatiotemporal reasoning module for abandonment judgment, and an open-source dataset with diverse scenarios.

Section 02

Background & Technical Challenges of Abandoned Luggage Detection

Abandoned luggage in crowded public areas poses safety risks. Traditional manual monitoring is inefficient and error-prone because operators suffer from fatigue and lapses in attention. The main technical challenges are:

Small target detection: Luggage often occupies a small part of the frame and is easily occluded or affected by lighting and background clutter.
Temporal association: Judging abandonment requires tracking a piece of luggage and its owner over time.
Real-time response: Public safety use calls for alerts with millisecond-level latency.
False alarm control: Frequent false alarms waste security resources and create a "boy who cried wolf" effect.

Section 03

Technical Architecture & Spatiotemporal Reasoning Mechanism

Dual YOLOv12 Models:

YOLOv12m: Specialized for luggage detection; balances accuracy and speed, and is optimized for small targets via an improved Feature Pyramid Network (FPN).
YOLOv12x: Focuses on person detection/tracking, adapts to dense/occluded scenes, maintains target identity despite posture changes or partial occlusion.

Spatiotemporal Reasoning:

Distance-time constraints: Triggers an alarm if the luggage-person distance exceeds a threshold (e.g., 2 m) for a set time (e.g., 30 s), with dynamic threshold adjustment for different scenes.
Trajectory association: Uses the Hungarian algorithm for frame-to-frame matching, a Kalman filter for position prediction, and feature re-identification to recover identities after long-term occlusion.
Event state machine: Manages luggage states (normal/warning/abandoned/released) for predictable and interpretable behavior.
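
The distance-time rule and the event state machine described above can be sketched together as follows. This is a minimal illustration, not the project's actual code: the names LuggageState, LuggageTrack, and update_state are hypothetical, positions are assumed to be ground-plane coordinates in metres, and the reading of "released" as "the owner returned after an alarm" is an assumption. The 2 m and 30 s defaults are the example values quoted in the text.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class LuggageState(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    ABANDONED = "abandoned"
    RELEASED = "released"


@dataclass
class LuggageTrack:
    # Hypothetical per-luggage record; field names are illustrative only.
    state: LuggageState = LuggageState.NORMAL
    unattended_since: Optional[float] = None  # time the owner first went out of range


def update_state(
    track: LuggageTrack,
    luggage_xy: Tuple[float, float],
    owner_xy: Optional[Tuple[float, float]],
    now: float,
    max_distance_m: float = 2.0,
    max_unattended_s: float = 30.0,
) -> LuggageState:
    """Advance one luggage track by one observation.

    owner_xy is None when the owner is not currently tracked, which is
    treated here (as an assumption) the same as being out of range.
    """
    separated = owner_xy is None or math.dist(luggage_xy, owner_xy) > max_distance_m

    if not separated:
        # Owner is within range: clear the timer. An active alarm becomes
        # RELEASED for one step; every other state returns to NORMAL.
        track.unattended_since = None
        if track.state is LuggageState.ABANDONED:
            track.state = LuggageState.RELEASED
        else:
            track.state = LuggageState.NORMAL
        return track.state

    # Owner is out of range: start the timer on the first such observation.
    if track.unattended_since is None:
        track.unattended_since = now
    elapsed = now - track.unattended_since
    if elapsed >= max_unattended_s:
        track.state = LuggageState.ABANDONED
    else:
        track.state = LuggageState.WARNING
    return track.state
```

Keeping the rule as an explicit state machine means every alarm can be traced back to a distance and a duration, which is the interpretability the text emphasizes. Scene-specific thresholds would be passed in through max_distance_m and max_unattended_s.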

Section 04

Dataset Construction & System Implementation

Dataset: Covers diverse scenes (airports, stations, malls), multi-view angles, time spans (day/night), and crowd densities. Annotations include bounding boxes, instance segmentation, attributes (luggage type, person posture), and trajectory associations. Data augmentation includes geometric transforms (crop/rotate/scale), lighting changes, noise injection, and occlusion simulation.
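
As an illustration of the occlusion simulation mentioned above, the sketch below blanks out a random rectangle inside a luggage bounding box. It is a simplified stand-in, not the dataset's actual augmentation code: the function name occlude_box and its parameters are hypothetical, and the image is a plain list of grayscale rows for brevity, whereas a real pipeline would typically operate on image arrays.

```python
import random


def occlude_box(image, box, max_fraction=0.5, fill=0, rng=None):
    """Return a copy of `image` with a random rectangle inside `box` filled.

    image: list of rows of pixel values (grayscale, for brevity).
    box:   (x1, y1, x2, y2) in pixels, x2/y2 exclusive, lying inside the image
           and at least one pixel wide and tall.
    max_fraction: upper limit on the patch size as a fraction of each box side.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = box
    box_w, box_h = x2 - x1, y2 - y1

    # Patch size: between 1 pixel and max_fraction of each box side.
    patch_w = rng.randint(1, max(1, int(box_w * max_fraction)))
    patch_h = rng.randint(1, max(1, int(box_h * max_fraction)))

    # Top-left corner chosen so the patch stays entirely inside the box.
    px = rng.randint(x1, x2 - patch_w)
    py = rng.randint(y1, y2 - patch_h)

    out = [row[:] for row in image]  # copy so the input is left untouched
    for y in range(py, py + patch_h):
        for x in range(px, px + patch_w):
            out[y][x] = fill
    return out, (px, py, px + patch_w, py + patch_h)
```

The bounding-box annotation itself is left unchanged, so the detector is trained to localize the full item even when part of it is hidden.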

Implementation:

Inference optimization: INT8 quantization, TensorRT acceleration, batch processing, multi-thread pipeline (video decoding/preprocessing/inference/postprocessing parallelization).
Edge deployment: Lightweight models for Jetson devices, model pruning/knowledge distillation, hardware acceleration (NPU/TPU).
Alarm & integration: Multi-level alerts, real-time visualization, ONVIF/RTSP support, Webhook for third-party system integration.
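
The multi-thread pipeline mentioned under inference optimization can be sketched with one thread per stage connected by bounded queues. This is a generic pattern under stated assumptions, not the project's implementation: run_stage and run_pipeline are hypothetical names, the stages are arbitrary callables standing in for decoding, preprocessing, inference, and postprocessing, and error handling is omitted.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def run_stage(func, inbox, outbox):
    """Apply `func` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown downstream
            return
        outbox.put(func(item))


def run_pipeline(frames, stages, queue_size=8):
    """Run each callable in `stages` on its own thread; return results in input order."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]), daemon=True)
        for i, func in enumerate(stages)
    ]
    results = []

    def collect():
        # Drain the last queue so upstream stages never block on a full output.
        while True:
            item = queues[-1].get()
            if item is _STOP:
                return
            results.append(item)

    collector = threading.Thread(target=collect, daemon=True)
    for t in workers + [collector]:
        t.start()

    for frame in frames:
        queues[0].put(frame)  # blocks when the pipeline is full (back-pressure)
    queues[0].put(_STOP)

    for t in workers + [collector]:
        t.join()
    return results


if __name__ == "__main__":
    # Stand-in stages; real ones would decode video, resize, run the detector,
    # and apply the abandonment logic.
    stages = [lambda p: {"id": p}, lambda f: {**f, "resized": True},
              lambda f: {**f, "boxes": []}, lambda f: f["id"]]
    print(run_pipeline(range(5), stages))
```

Because each stage has a single worker and the queues are FIFO, frame order is preserved. In Python, threads of this kind help mainly when the stages spend their time in I/O or in native code that releases the interpreter lock, which is typically the case for video decoding and GPU inference calls.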

Section 05

Performance Evaluation & Application Scenarios

Performance:

Accuracy: mAP@0.5 above 85% for luggage detection, recall above 95% for obvious abandonment cases, and fewer than 1 false alarm per camera per hour.
Real-time: under 20 ms per frame on an NVIDIA T4 GPU, supporting 50+ FPS and 16 1080p streams per server, with end-to-end delay under 2 s.
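
The latency and throughput figures above are linked by simple arithmetic, worked through in the sketch below. It assumes frames are processed one at a time on a single GPU; the source does not state the batch size, the number of GPUs per server, or the per-stream frame rate, so the per-stream figure is purely illustrative. The helper names are hypothetical.

```python
def max_fps(latency_ms: float) -> float:
    """Upper bound on frames per second when frames are processed one at a time."""
    return 1000.0 / latency_ms


def per_stream_fps(total_fps: float, n_streams: int) -> float:
    """Frame rate each stream gets if the total budget is shared evenly."""
    return total_fps / n_streams


total = max_fps(20.0)              # 20 ms per frame
share = per_stream_fps(total, 16)  # 16 streams sharing one budget
print(f"{total:.1f} FPS total, {share:.3f} FPS per stream")
# prints "50.0 FPS total, 3.125 FPS per stream"
```

Under these assumptions each stream would be sampled at roughly 3 FPS, which still yields many observations within a 30 s dwell window; batching or additional GPUs would raise the per-stream rate.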

Applications: Airport security areas, train platforms, shopping malls, large event venues (concerts/sports events) to detect abandoned luggage and enhance public safety.

Section 06

Limitations & Future Improvements

Limitations:

Extreme crowding: Luggage-person association is less accurate in highly crowded or heavily occluded scenes.
Similar luggage: Retaining identity is difficult for visually similar items.
Complex interactions: The logic for luggage handled by several people or placed down temporarily still needs refinement.

Future: Multimodal fusion (audio + vision), behavior pattern learning (reduce rule dependency), active interaction (voice reminders for passengers), cross-camera tracking (wider monitoring coverage).

Section 07

Open Source Contribution & Conclusion

Open Source: Provides annotated dataset, pre-trained YOLOv12 weights, full code implementation, and detailed deployment docs for research/development.

Conclusion: The system combines advanced object detection with interpretable spatiotemporal reasoning, offering strong technical support for public safety. Future development should balance technological progress with privacy protection and ethical norms to ensure tools serve public well-being.