PODS-AI: AI-Powered Programmatic Orca Detection System

This article introduces the PODS-AI project developed by Orcasound, an AI system that automatically detects orca sounds using deep learning technology. The project includes a complete training data preparation pipeline, multi-model support (FastAI, OrcaHello, PODS-AI), and an intelligent timestamp correction feature, providing an innovative technical solution for marine ecological monitoring and protection.

Tags: orca detection · marine ecology · deep learning · audio classification · FastAI · PyTorch · acoustic monitoring · wildlife protection
Published 2026-05-09 01:19 · Recent activity 2026-05-09 01:33 · Estimated read: 7 min

Section 01

PODS-AI: AI-Powered Programmatic Orca Sound Detection System Overview

PODS-AI (Programmatic Orca Detection System using AI) is an AI system developed by Orcasound to automatically detect orca sounds using deep learning. It features a complete training data preparation pipeline, multi-model support (FastAI, OrcaHello, PODS-AI), and intelligent timestamp correction. This system provides an innovative technical solution for marine ecological monitoring and protection.


Section 02

Project Background and Core Objectives

Background

Traditional acoustic monitoring of orcas relies on manual listening and analysis, which is labor-intensive and difficult to scale across large areas. Orcasound, a citizen science project, collects real-time marine sound data via a hydrophone network in the Pacific Northwest, but the volume of audio it gathers creates a processing bottleneck.

Core Objectives

  • Automate orca sound detection using AI models
  • Support real-time audio stream analysis
  • Integrate multiple detection models to improve accuracy
  • Correct detection timestamps via model inference

Section 03

System Architecture and Key Processing Pipeline

PODS-AI uses a modular pipeline with six main steps:

  1. Detection Data Management: Create CSV files with fields like Category (sound type), NodeName (hydrophone node), Timestamp, URI (audio resource), Description, Notes.
  2. Audio Processing: Split continuous audio into 3-second segments (default) for model inference.
  3. Training Sample Extraction:
    • For human-marked orca detections: download the 60 seconds of audio preceding the timestamp, run the model to score each segment, and shift the timestamp to the highest-scoring segment (see the sketch after this list).
    • Sample generation rules: at most 10 standard samples per category, 10 additional machine-detected resident samples, and at most 10 human-marked samples per negative category.
  4. Audio Download: Save training samples to output/wav and test samples to output/testing-wav.
  5. Spectrogram Generation: Convert WAV files to PNG spectrograms for model input.
  6. Model Training: Train the PODS-AI model on generated samples.
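
To make the timestamp-correction step concrete, here is a minimal sketch of the idea: slice the 60-second clip preceding a human mark into 3-second windows, score each window with the model, and move the timestamp to the best-scoring window. The `score_segment` callable and the Unix-seconds timestamp convention are illustrative assumptions, not the project's actual API.

```python
from pydub import AudioSegment  # listed project dependency

# Detection records (step 1) are CSV rows along the lines of:
# Category,NodeName,Timestamp,URI,Description,Notes
SEGMENT_SECONDS = 3    # default inference window (step 2)
LOOKBACK_SECONDS = 60  # audio fetched before the human mark (step 3)

def correct_timestamp(wav_path: str, marked_ts: float, score_segment):
    """Shift a human-marked Unix timestamp (seconds) to the 3 s window
    the model scores highest within the preceding 60 s of audio.

    score_segment: hypothetical callable, AudioSegment -> confidence in [0, 1].
    """
    audio = AudioSegment.from_wav(wav_path)  # the downloaded 60 s clip
    step_ms = SEGMENT_SECONDS * 1000

    best_score, best_offset_ms = -1.0, 0
    for offset_ms in range(0, len(audio) - step_ms + 1, step_ms):
        score = score_segment(audio[offset_ms:offset_ms + step_ms])
        if score > best_score:
            best_score, best_offset_ms = score, offset_ms

    # The clip starts LOOKBACK_SECONDS before the mark, so convert the
    # winning window's offset back into an absolute timestamp.
    corrected_ts = marked_ts - LOOKBACK_SECONDS + best_offset_ms / 1000
    return corrected_ts, best_score
```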

Section 04

Multi-Model Support and Performance Evaluation

Supported Models

  1. FastAI: ResNet-based binary classifier built on the FastAI framework; requires a compatibility patch for Python 3.11+.
  2. OrcaHello: binary classifier optimized for Southern Resident Killer Whale (SRKW) calls (resident vs. other), loaded from the Hugging Face Hub; no fastai_audio dependency.
  3. PODS-AI: Self-developed multi-class model supporting 7 categories (humpback, human, jingle, resident, transient, vessel, water).
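
Because the three backends differ in framework and label space, it is natural for the pipeline to hide them behind one interface. The sketch below shows one way that dispatch could look; all names (`Detection`, `BACKENDS`, the per-model stubs) are hypothetical, not the repository's actual structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Detection:
    label: str    # e.g. "resident", "transient", "vessel"
    score: float  # model confidence in [0, 1]

# Stubs: each backend would wrap its own framework (FastAI learner,
# OrcaHello binary head, PODS-AI 7-class net) behind the same
# spectrogram-path-in / Detection-out signature.
def fastai_predict(png: str) -> Detection:
    ...  # would run the FastAI ResNet binary classifier

def orcahello_predict(png: str) -> Detection:
    ...  # would run the SRKW-optimized OrcaHello model

def podsai_predict(png: str) -> Detection:
    ...  # would run the 7-category PODS-AI model

BACKENDS: Dict[str, Callable[[str], Detection]] = {
    "fastai": fastai_predict,
    "orcahello": orcahello_predict,
    "podsai": podsai_predict,
}

def detect(model: str, spectrogram_png: str) -> Detection:
    """Dispatch one spectrogram to the chosen backend."""
    return BACKENDS[model](spectrogram_png)
```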

Performance Comparison (71 test samples)

Model      Evaluations  Correct  Accuracy  FP  FP Rate  FN  FN Rate  Avg Time (s)
fastai     71           32       45.1%     30  42.3%     9  12.7%    1.00
orcahello  71           14       19.7%     49  69.0%     8  11.3%    0.24
podsai     71           38       53.5%     20  28.2%    13  18.3%    0.58

PODS-AI achieves the highest accuracy (53.5%), while OrcaHello is the fastest (0.24 s per evaluation) but has by far the highest false-positive rate.
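
The derived columns follow directly from the raw counts, with each rate computed against all 71 evaluations; a few lines of Python reproduce the published figures.

```python
# Raw counts from the 71-sample evaluation above.
results = {
    "fastai":    {"correct": 32, "fp": 30, "fn": 9},
    "orcahello": {"correct": 14, "fp": 49, "fn": 8},
    "podsai":    {"correct": 38, "fp": 20, "fn": 13},
}

N = 71
for model, r in results.items():
    acc = r["correct"] / N
    fp_rate = r["fp"] / N
    fn_rate = r["fn"] / N
    print(f"{model:10s} acc={acc:6.1%}  fp={fp_rate:6.1%}  fn={fn_rate:6.1%}")
# podsai prints acc= 53.5%, the best of the three; orcahello's edge
# is latency (0.24 s average), not accuracy.
```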


Section 05

Practical Applications and Ecological Significance

Applications

  1. Real-time Monitoring: 24/7 automatic detection with real-time alerts for researchers and enthusiasts (see the sketch after this list).
  2. Historical Data Analysis: Identify orca calls in archives, analyze activity patterns and migration.
  3. Citizen Science: Volunteers can upload recordings, which are auto-segmented and predicted; confirmed samples are added to training sets.
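
As an illustration of how the real-time path might poll a hydrophone feed, the sketch below combines the project's listed m3u8 and ffmpeg-python dependencies to fetch the newest HLS segment and transcode it for inference. The playlist URL and the `score_wav` wrapper are assumptions for illustration, not actual Orcasound endpoints.

```python
import m3u8
import ffmpeg  # ffmpeg-python, a listed dependency

# Hypothetical live playlist for one hydrophone node.
PLAYLIST_URL = "https://example.org/hls/node/live.m3u8"

def fetch_latest_segment(playlist_url: str, out_wav: str) -> str:
    """Download the newest HLS segment and transcode it to WAV."""
    playlist = m3u8.load(playlist_url)    # parse the .m3u8 manifest
    newest = playlist.segments[-1]        # last segment in the live window
    (
        ffmpeg
        .input(newest.absolute_uri)       # ffmpeg reads the URL directly
        .output(out_wav, ar=44100, ac=1)  # mono WAV for the model
        .overwrite_output()
        .run(quiet=True)
    )
    return out_wav

# Usage (score_wav is a hypothetical model wrapper):
# wav = fetch_latest_segment(PLAYLIST_URL, "latest.wav")
# label, score = score_wav(wav)
```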

Ecological Impact

  • Improves monitoring efficiency and reduces manual costs.
  • Supports multi-category sound detection for comprehensive marine soundscape monitoring.
  • Open-source code promotes global collaboration.

Future Directions

  • Integrate Transformer/EfficientNet architectures.
  • Develop edge computing version for real-time embedded deployment.
  • Expand to other marine mammals (dolphins, seals).
  • Combine with satellite data for multi-modal monitoring.

Section 06

Technical Stack and Core Dependencies

PODS-AI is built in Python with the following core dependencies:

  • boto3: Access S3 audio files.
  • ffmpeg-python: Audio processing.
  • librosa>=0.10.0: Audio analysis.
  • m3u8: HLS stream parsing.
  • pytz: Timezone handling.
  • fastai>=1.0.61: FastAI model support.
  • torch>=2.1.0: Deep learning framework.
  • torchvision>=0.16.0: Computer vision tools.
  • torchaudio>=2.1.0: Audio processing.
  • soundfile: Audio I/O.
  • fastai_audio: FastAI audio extension.
  • pandas, pydub: Tabular data handling and audio segment manipulation.
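
Several of these dependencies meet in the spectrogram step (pipeline step 5). Below is a minimal sketch assuming mel spectrograms with librosa defaults, and matplotlib for rendering; matplotlib is not in the list above, and the project's actual spectrogram parameters may differ.

```python
import librosa
import librosa.display
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for batch rendering
import matplotlib.pyplot as plt

def wav_to_spectrogram_png(wav_path: str, png_path: str) -> None:
    """Render a WAV clip as a mel-spectrogram PNG for model input."""
    y, sr = librosa.load(wav_path, sr=None)        # keep native sample rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    mel_db = librosa.power_to_db(mel, ref=np.max)  # convert power to dB

    fig, ax = plt.subplots(figsize=(4, 4))
    librosa.display.specshow(mel_db, sr=sr, ax=ax)
    ax.set_axis_off()                              # image only, no axes
    fig.savefig(png_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```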