audx-kmp: Cross-platform Real-time Audio Denoising and Voice Activity Detection Library Based on RNNoise

audx-kmp is a Kotlin Multiplatform audio denoising library that encapsulates a C core library based on the RNNoise algorithm. It supports five target platforms and provides a unified API for real-time audio denoising and voice activity detection.

Tags: audio denoising, voice activity detection, RNNoise, Kotlin Multiplatform, real-time processing, neural network, cross-platform, GRU

Published 2026-06-05 23:44 | Recent activity 2026-06-05 23:54 | Estimated read 10 min

Section 01

Introduction: audx-kmp: Cross-platform Real-time Audio Denoising and Voice Activity Detection Library Based on RNNoise

Section 02

Original Author and Source

Original Author/Maintainer: rizukirr
Source Platform: GitHub
Original Title: audx-kmp
Original Link: https://github.com/rizukirr/audx-kmp
Core Dependency: https://github.com/rizukirr/audx-realtime
Release Date: June 5, 2026

Section 03

Project Overview

audx-kmp is an audio processing library built on Kotlin Multiplatform (KMP). It gives developers a single API for real-time audio denoising and Voice Activity Detection (VAD). The project wraps the underlying C core library audx-realtime, which implements a neural network denoising engine based on the RNNoise algorithm. This split keeps the performance-critical processing in native code while the Kotlin Multiplatform layer is shared across platforms.

audx-kmp is more than a thin function wrapper: it is a complete multi-platform bridging layer that uses two mechanisms depending on the target. For Kotlin/Native targets, namely Linux x64, Android Native (ARM64/X64), and Windows (MinGW x64), it uses cinterop to link the static library directly. For the JVM target, it loads the native library dynamically at runtime via JNI. Each target therefore uses the binding mechanism native to it.

Section 04

Core Principles of the RNNoise Algorithm

The denoising capability of audx-kmp comes from the RNNoise algorithm, a hybrid speech enhancement approach proposed by Jean-Marc Valin of the Xiph.org Foundation in 2018. Rather than relying purely on traditional signal processing or purely on deep learning, RNNoise combines the two: signal processing handles feature extraction and spectral shaping, while a small neural network estimates how much to attenuate each frequency band.

Section 05

Neural Network Architecture

According to the project description, RNNoise uses a lightweight recurrent neural network architecture consisting of:

Three GRU (Gated Recurrent Unit) layers: Each layer contains 256 hidden units and models the temporal dependencies of the speech signal. A GRU has fewer parameters than an LSTM of the same size and is cheaper to compute, which suits real-time use.
Two convolutional layers: These extract frequency-domain features and capture local spectral patterns.
Model quantization: Weights are stored with 8-bit quantization, and the entire model is only about 85 KB, which makes it well suited to embedded and mobile devices.
Sparsification: Between 30% and 50% of the model weights are zero, which further reduces the computational load.
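To make the role of the GRU layers concrete, here is a minimal sketch of a single GRU time step in pure Python. It follows the common textbook formulation (update gate, reset gate, candidate state). The parameter names (`W_z`, `U_z`, `b_z`, and so on), the helper `zero_params`, and the tiny dimensions are illustrative assumptions, not the weight layout used by RNNoise or audx-realtime.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def affine(W, x, U, h, b):
    # Computes W @ x + U @ h + b, element by element.
    return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

def gru_step(x, h_prev, p):
    """One GRU time step; p is a dict of weights (names are hypothetical)."""
    # Update gate: how much of the previous state to keep.
    z = [sigmoid(v) for v in affine(p["W_z"], x, p["U_z"], h_prev, p["b_z"])]
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(v) for v in affine(p["W_r"], x, p["U_r"], h_prev, p["b_r"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate state computed from the input and the reset-gated history.
    n = [math.tanh(v) for v in affine(p["W_n"], x, p["U_n"], gated, p["b_n"])]
    # Blend candidate and previous state using the update gate.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h_prev)]

def zero_params(input_size, hidden_size):
    # Hypothetical helper: all-zero weights, handy for checking the wiring.
    p = {}
    for gate in "zrn":
        p["W_" + gate] = [[0.0] * input_size for _ in range(hidden_size)]
        p["U_" + gate] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p["b_" + gate] = [0.0] * hidden_size
    return p
```

With all weights set to zero, both gates evaluate to 0.5 and the candidate state to 0, so each step simply halves the previous state. Trained weights instead decide, per hidden unit, how much history to keep, which is what lets the network track speech over time.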

Section 06

Signal Processing Pipeline

The audx-realtime processing pipeline breaks audio processing into several stages:

First, input audio is resampled to 48 kHz if its original sampling rate differs, using the SpeexDSP library for high-quality sample rate conversion. The audio is then split into frames of 480 samples (10 milliseconds at 48 kHz), which is the basic unit of processing.
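The frame size follows directly from the sampling rate and the frame duration: 48,000 samples per second times 0.010 seconds is 480 samples. The sketch below shows that arithmetic and a simple way to cut a buffer into frames. The function name `split_into_frames` and the zero-padding of the final partial frame are assumptions made for illustration; a streaming implementation would more likely buffer the remainder until the next chunk arrives.

```python
SAMPLE_RATE_HZ = 48_000
FRAME_MS = 10
FRAME_SIZE = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 480 samples per frame

def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split a mono sample sequence into fixed-size frames.

    The last frame is zero-padded if the input length is not a
    multiple of frame_size (an assumption of this sketch).
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0.0] * (frame_size - len(frame)))
        frames.append(frame)
    return frames
```

For example, a buffer of 1,000 samples yields three frames: two full ones and a third holding the remaining 40 samples followed by 440 zeros.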

Next, the algorithm extracts 42 acoustic features from each frame. These features include spectral envelope, fundamental frequency estimation, spectral flatness, etc., which provide rich input information for the neural network. Based on these features, the neural network calculates gain values for 22 frequency bands and outputs the probability of voice activity detection.
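An application usually has to turn the per-frame voice probability into a speech or non-speech decision. The following is a minimal sketch of one common way to do that: a threshold plus a short "hangover" that keeps the decision active for a few frames after speech ends, so word endings are not clipped. The function name, the 0.5 threshold, and the hangover length are all assumptions for illustration and are not taken from audx-kmp.

```python
def vad_decisions(probabilities, threshold=0.5, hangover_frames=3):
    """Map per-frame speech probabilities to boolean speech decisions.

    A frame counts as speech if its probability reaches the threshold,
    or if it falls within hangover_frames after the last such frame.
    """
    decisions = []
    remaining = 0  # hangover frames still available
    for p in probabilities:
        if p >= threshold:
            remaining = hangover_frames
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

With 10 ms frames, a hangover of 3 frames extends each detected speech region by 30 ms; the right value depends on the application.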

Finally, these gains are applied to the original spectrum to achieve denoising. If needed, the processed audio is resampled back to the original sampling rate for output. The total delay of the entire processing pipeline is approximately 10 to 13 milliseconds, meeting the strict requirements of real-time communication.
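Applying per-band gains can be sketched as scaling every frequency bin by the gain of the band it belongs to. The version below is a deliberately simplified, piecewise-constant illustration: the function name and the `band_edges` layout are assumptions, and real implementations typically interpolate gains between neighbouring bands to avoid abrupt steps in the spectrum.

```python
def apply_band_gains(spectrum, band_edges, gains):
    """Scale each frequency bin by the gain of its band.

    spectrum:   list of bin values (real magnitudes or complex bins)
    band_edges: ascending bin indices, len(gains) + 1 entries;
                band i covers bins band_edges[i] .. band_edges[i+1] - 1
    gains:      one gain per band, typically in the range 0.0 to 1.0
    """
    if len(band_edges) != len(gains) + 1:
        raise ValueError("band_edges must have len(gains) + 1 entries")
    out = list(spectrum)
    for band, gain in enumerate(gains):
        stop = min(band_edges[band + 1], len(out))
        for k in range(band_edges[band], stop):
            out[k] = out[k] * gain
    return out
```

A gain of 1.0 leaves a band untouched, while a gain near 0.0 suppresses it, so bands dominated by noise are attenuated and bands dominated by speech pass through.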

Section 07

Cross-platform Architecture Design

audx-kmp supports five target platforms, each with its own binding strategy:

Target Platform	Bridging Mechanism	Native Library File	Binding Timing
linuxX64	cinterop	libaudx.a	At link time
androidNativeArm64	cinterop	libaudx.a	At link time
androidNativeX64	cinterop	libaudx.a	At link time
mingwX64	cinterop	libaudx.a	At link time
JVM	JNI	libaudx_jni.so/.dll/.dylib	At runtime

For Kotlin/Native targets (Linux, Android Native, Windows), the library uses the cinterop tool to generate bindings and links the static library at link time. No native library has to be located or loaded at runtime, and calls do not go through JNI. The static library includes all code from audx-realtime, RNNoise, and SpeexDSP, forming a self-contained binary.

The JVM target is more involved, because the JVM has to communicate with native code via JNI (Java Native Interface). audx-kmp loads the library in two steps: first, it tries to extract the precompiled native library from the JAR to the user cache directory (~/.cache/audx-kmp/), which ensures that the library version matches the Kotlin wrapper code exactly. If no suitable native library is found, it falls back to the standard System.loadLibrary() method, which lets users supply their own build via java.library.path.
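The lookup order described above can be sketched independently of the JVM. The following Python sketch models only the decision logic: the `bundled` dictionary stands in for resources packaged in the JAR, and the function name, the OS keys, and the return values are hypothetical. The actual loader is Kotlin/JVM code and may differ in its details.

```python
import tempfile
from pathlib import Path

# File extensions per OS, following the libaudx_jni.so/.dll/.dylib naming
# given in the platform table.
LIB_EXTENSIONS = {"linux": ".so", "windows": ".dll", "macos": ".dylib"}

def resolve_native_library(bundled, os_name, cache_dir=None):
    """Decide how the JNI library would be loaded.

    bundled:   dict of resource name -> bytes (stand-in for JAR contents)
    Returns ("extracted", path) when a bundled library was written to the
    cache directory, or ("system", None) when the caller should fall back
    to the platform lookup (System.loadLibrary() on the JVM).
    """
    ext = LIB_EXTENSIONS.get(os_name)
    if ext is None:
        return ("system", None)
    file_name = "libaudx_jni" + ext
    data = bundled.get(file_name)
    if data is None:
        return ("system", None)
    if cache_dir is None:
        # Cache location named in the text: ~/.cache/audx-kmp/
        cache_dir = Path.home() / ".cache" / "audx-kmp"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / file_name
    target.write_bytes(data)  # overwrite so the cached copy matches the bundle
    return ("extracted", target)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_native_library({"libaudx_jni.so": b"stub"}, "linux", tmp)[0])
        print(resolve_native_library({}, "linux", tmp)[0])
```

Overwriting the cached file on every extraction is one simple way to keep the cached library in step with the bundled one; whether audx-kmp does this or compares versions first is not stated in the source.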

Section 08

Android Integration Details

The Android platform is an important application scenario for audx-kmp. Since Android apps usually run on the ART virtual machine, JNI is needed for bridging. The project provides a complete Android sample app (sample-android/) that demonstrates how to record audio from the microphone and display the VAD status in real time.

Android developers need to place the libaudx_jni.so file corresponding to the ABI into the app's jniLibs directory. The project supports three mainstream ABIs: arm64-v8a, armeabi-v7a, and x86_64. To simplify the build process, the project also provides a GitHub Actions workflow that can automatically build all supported ABIs and generate directly usable artifacts.
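In a standard Android Gradle project, prebuilt libraries live under a jniLibs directory with one subdirectory per ABI. The short script below lists the files that layout implies for the three ABIs mentioned above and reports any that are missing. The function names are hypothetical, and the `app/src/main/jniLibs` location in the usage note is the conventional default, assumed here rather than taken from the project.

```python
from pathlib import Path

SUPPORTED_ABIS = ("arm64-v8a", "armeabi-v7a", "x86_64")
LIB_NAME = "libaudx_jni.so"

def expected_jni_libs(jni_libs_dir):
    """Return the per-ABI library paths expected under jni_libs_dir."""
    root = Path(jni_libs_dir)
    return [root / abi / LIB_NAME for abi in SUPPORTED_ABIS]

def missing_jni_libs(jni_libs_dir):
    """Return the expected library paths that do not exist on disk."""
    return [p for p in expected_jni_libs(jni_libs_dir) if not p.is_file()]
```

Calling `missing_jni_libs("app/src/main/jniLibs")` before packaging gives a quick check that every ABI has its library in place, for example after downloading the artifacts produced by the GitHub Actions workflow.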
