Zing Forum

Real-Time Azerbaijani Sign Language Recognition System: An Accessible AI Solution Combining MediaPipe and LSTM

Built on MediaPipe hand key point detection and an LSTM neural network, this system recognizes 100 Azerbaijani Sign Language words in real time. It is trained on more than 7,248 samples and is intended as an accessible communication technology.

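Tags: sign language recognition, Azerbaijani Sign Language, MediaPipe, LSTM, deep learning, computer vision, accessibility technology, real-time recognition, sequence modeling, human-computer interaction
Published 2026-06-12 07:15 | Recent activity 2026-06-12 07:22 | Estimated read 5 min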

Section 01

[Project Introduction] Real-Time Azerbaijani Sign Language Recognition System: An Accessible AI Solution Combining MediaPipe and LSTM

This project was developed by Kage-develop and released on GitHub on June 11, 2026 (link: https://github.com/Kage-develop/azerbaijani-sign-language). It combines MediaPipe hand key point detection with an LSTM neural network to recognize 100 common Azerbaijani Sign Language words in real time. The model is trained on more than 7,248 samples, and the project aims to lower the communication barrier between hearing-impaired people and hearing people.

Section 02

Project Background and Significance: A Technical Solution for Communication Barriers of Hearing-Impaired People

Sign language is the main means of communication for many hearing-impaired people, but very few hearing people know it, which creates a communication barrier. About 70 million hearing-impaired people worldwide use sign language, and Azerbaijani Sign Language (AzSL) has its own grammar and vocabulary. Sign language recognition is technically challenging because signs are expressed through hand shapes and movements in 3D space that change over time. Recent progress in computer vision and deep learning has made automatic recognition feasible, and camera-based systems have the advantage of being non-invasive and low cost because they need no gloves or other wearable sensors.

Section 03

Technical Architecture: End-to-End Recognition Solution with MediaPipe + LSTM

The project combines MediaPipe hand key point detection with LSTM temporal modeling. MediaPipe locates 21 key points per hand: the wrist plus four points along each of the five fingers (joints and fingertips). It is lightweight and, according to the project description, can exceed 30 FPS on a CPU. The LSTM takes a sequence of key point coordinates as input, one set per video frame, and learns the short-term patterns, long-term dependencies, and temporal dynamics of each gesture.
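To make the data flow concrete, here is a minimal pure-Python sketch; it is not the project's code. It assumes each frame is flattened into 21 × 3 = 63 values (x, y, z per key point) and then applies the textbook LSTM update with random, untrained weights. The names `frame_to_features`, `lstm_step`, and `run_lstm` are hypothetical, and a real implementation would more likely use the LSTM layers in TensorFlow or PyTorch.

```python
import math
import random

NUM_LANDMARKS = 21                  # key points per hand from MediaPipe
COORDS = 3                          # assumed x, y, z per key point
FEATURES = NUM_LANDMARKS * COORDS   # 63 values per frame


def frame_to_features(landmarks):
    """Flatten [(x, y, z), ...] for one hand into a flat list of 63 floats."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [value for point in landmarks for value in point]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM time step: returns the new hidden state and cell state.

    W[gate] holds one weight row per hidden unit, each of length
    len(x) + len(h); b[gate] is the matching bias vector.
    Gates: i (input), f (forget), g (candidate), o (output).
    """
    z = x + h  # concatenate current input and previous hidden state

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    i = [sigmoid(v) for v in affine("i")]
    f = [sigmoid(v) for v in affine("f")]
    g = [math.tanh(v) for v in affine("g")]
    o = [sigmoid(v) for v in affine("o")]
    # The forget gate scales the old memory; the input gate admits new content.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(sequence, hidden_size, seed=0):
    """Run an untrained LSTM over a sequence of frames; return the final hidden state."""
    rng = random.Random(seed)
    input_size = len(sequence[0])
    W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
         for gate in "ifgo"}
    b = {gate: [0.0] * hidden_size for gate in "ifgo"}
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    return h
```

In a full classifier, the final hidden state would feed a dense layer with 100 outputs, one per vocabulary word. The cell state `c` is what lets information from early frames survive until the end of the gesture.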

Section 04

Dataset and Training: Construction and Preprocessing of Over 7248 Samples

The project uses more than 7,248 training samples covering 100 Azerbaijani Sign Language words, an average of roughly 72 samples per word. Data collection may have involved multiple signers, multiple camera angles, a standardized recording environment, and a balanced number of samples per word. Preprocessing follows five steps: frame extraction → key point detection → coordinate normalization → sequence alignment → data augmentation.
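Two of these steps, coordinate normalization and sequence alignment, can be sketched in a few lines of plain Python. The source does not say which methods the project uses, so everything here is an assumption: key points are shifted so the wrist (index 0 in MediaPipe's ordering) is the origin, scaled by the largest wrist-to-point distance, and sequences are truncated or zero-padded to a fixed length. The function names are hypothetical.

```python
import math


def normalize_hand(landmarks):
    """Return a wrist-relative, scale-normalized copy of one hand's key points.

    Assumption: landmarks[0] is the wrist and each entry is an (x, y, z) tuple.
    """
    wx, wy, wz = landmarks[0]
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Scale by the farthest point from the wrist so hand size and
    # distance to the camera matter less.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in shifted)
    if scale == 0.0:
        return shifted  # degenerate frame: all points coincide with the wrist
    return [(x / scale, y / scale, z / scale) for x, y, z in shifted]


def align_sequence(frames, target_len, feature_size):
    """Truncate or zero-pad a list of per-frame feature vectors to target_len."""
    kept = frames[:target_len]
    padding = [[0.0] * feature_size for _ in range(target_len - len(kept))]
    return kept + padding
```

Truncation keeps only the first `target_len` frames, which is the simplest choice; resampling frames uniformly across the clip is a common alternative when gestures vary a lot in duration.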

Section 05

Deployment and Application: Real-Time Inference Scenarios for Multi-Device Adaptation

The system runs real-time inference from an ordinary camera and targets three deployment settings: 1. Desktop applications (educational scenarios, such as interaction between hearing-impaired students and computers); 2. Mobile devices (a potential port to Android/iOS); 3. Embedded systems (edge devices such as Raspberry Pi, suited to offline or privacy-sensitive use).
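One common way to turn a sequence classifier into a real-time recognizer is a sliding window over the most recent frames. The sketch below illustrates that pattern under stated assumptions; the source does not describe the project's actual inference loop. The `classify` callable stands in for the trained model, and the window size of 30 frames and confidence threshold of 0.8 are illustrative values, not figures from the project.

```python
from collections import deque


class SlidingWindowRecognizer:
    """Buffer the latest frames and classify once the window is full.

    classify: hypothetical callable that takes a list of per-frame feature
    vectors and returns a (label, confidence) pair.
    """

    def __init__(self, classify, window_size=30, threshold=0.8):
        self.classify = classify
        self.window = deque(maxlen=window_size)  # oldest frame drops out automatically
        self.threshold = threshold

    def push(self, features):
        """Add one frame; return a label if the prediction is confident, else None."""
        self.window.append(features)
        if len(self.window) < self.window.maxlen:
            return None  # not enough frames buffered yet
        label, confidence = self.classify(list(self.window))
        return label if confidence >= self.threshold else None
```

Calling `push` once per camera frame keeps memory use constant, which matters on edge devices. The threshold suppresses output while the hands are idle or between signs.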

Section 06

Technical Limitations and Future Directions: Key Issues to Be Addressed

The current system has four main limitations. It recognizes only isolated words, whereas natural signing is continuous. Its vocabulary of 100 words is a starting point and needs to grow. It uses hand key points only, with no multi-modal fusion of facial expressions or body posture. Its robustness to Azerbaijani Sign Language dialects and to differences between individual signers still needs improvement. Future work follows directly from these gaps: continuous recognition, a larger vocabulary, multi-modal input, and better generalization across signers.

Section 07

Summary and Outlook: The Future of AI-Powered Accessible Communication

This project applies AI to accessibility and offers a low-cost, easy-to-deploy solution that needs only an ordinary camera. With larger datasets and further technical progress, it is expected to become more accurate and more practical, with natural human-machine sign language interaction as the long-term goal. For developers, the technology stack is mature (Python, TensorFlow/PyTorch, MediaPipe) and well suited to rapid prototyping.