Zing Forum

AISL: Using Artificial Intelligence to Bridge the World of Sound and Silence

AISL is an open-source project that combines computer vision and speech recognition to recognize sign language in video and to convert speech into sign language images, offering a technical means of communication between the hearing-impaired and hearing communities.

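Artificial intelligence, sign language recognition, computer vision, speech recognition, accessibility technology, MediaPipe, OpenCV, machine learning, multimodal AI, STM32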
Published 2026-06-02 20:12 | Recent activity 2026-06-02 20:19 | Estimated read 6 min

Section 01

AISL Project Introduction

AISL is an open-source project maintained by teodorus12 (GitHub: https://github.com/teodorus12/AISL, released June 2, 2026). It combines computer vision and speech recognition to recognize sign language in video and to convert speech into sign language images, with the aim of building a two-way communication bridge between the hearing-impaired and hearing communities.

Section 02

Project Background and Social Significance

Communication barriers between the hearing-impaired and hearing communities have long existed worldwide. Traditional sign language translation relies on human interpreters, which is costly and limits coverage. AISL is a response to this: it uses AI to let machines 'read' sign language and to convert speech into sign language images. Beyond the technology itself, the project aims to promote equal access to information and to reduce communication barriers.

Section 03

Core Technical Architecture

AISL adopts a multi-modal AI technical approach, integrating three key areas:

  • Computer Vision: Uses MediaPipe and OpenCV to process video streams and recognize/analyze sign language movements;
  • Speech Processing: Uses Librosa for audio signal processing, combined with machine learning models to recognize 5 basic vocabulary words (kava, pivo, sok, vino, čaj);
  • Hardware Integration: Supports serial communication with STM32 microcontrollers, transmitting data via USB Micro/Mini cables.
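
As a rough illustration of the speech-processing bullet, the sketch below classifies a fixed-length feature vector into one of the five words with a nearest-centroid rule. The source does not say which model AISL actually uses, so the classifier itself, the names `train_centroids` and `predict_word`, and the toy two-dimensional features are all assumptions made for illustration. In practice the vectors might be something like time-averaged MFCCs computed with Librosa.

```python
import math
from collections import defaultdict

VOCABULARY = ("kava", "pivo", "sok", "vino", "čaj")  # the five words named above


def train_centroids(samples):
    """Average the feature vectors of each word (hypothetical training step).

    `samples` is an iterable of (word, feature_vector) pairs.
    """
    grouped = defaultdict(list)
    for word, features in samples:
        grouped[word].append(features)
    return {
        word: [sum(column) / len(vectors) for column in zip(*vectors)]
        for word, vectors in grouped.items()
    }


def predict_word(centroids, features):
    """Return the word whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda word: math.dist(centroids[word], features))


# Toy 2-D features, invented purely for illustration (not real audio features).
training = [
    ("kava", [0.0, 0.0]), ("kava", [0.2, 0.0]),
    ("pivo", [5.0, 5.0]),
    ("sok", [0.0, 5.0]),
    ("vino", [5.0, 0.0]),
    ("čaj", [2.5, 2.5]),
]
centroids = train_centroids(training)
predicted = predict_word(centroids, [0.1, 0.3])  # lies nearest the "kava" centroid
```

A nearest-centroid rule is chosen here only because it is short and easy to inspect; any classifier that maps a feature vector to one of the five labels would fit the same interface.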

Section 04

Function Implementation and Workflow

The project's main program covers the complete process:

  • Data Collection: Download raw data in BIN format, parse it into data packets, and convert to WAV audio;
  • Signal Visualization: Use Matplotlib to display audio waveforms, assisting in model debugging;
  • End-to-End Speech-to-Sign Language: Option 11 lets the user select a test WAV file. After the model predicts the word, the program plays the corresponding sign language videos letter by letter, in spelling order (e.g., "čaj" → Č → A → J).
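
The data collection step (BIN log, then packets, then WAV) can be sketched with the standard library alone. The article does not document the packet layout, so everything about it here is an assumption: a two-byte `SYNC` marker, a little-endian 16-bit payload length, 16-bit mono PCM payloads, and a 16 kHz sample rate. The function names `read_packets` and `packets_to_wav` are likewise hypothetical.

```python
import io
import struct
import wave

SYNC = b"\xAA\x55"    # hypothetical packet marker
SAMPLE_RATE = 16_000  # assumed sample rate in Hz


def read_packets(stream):
    """Yield payload bytes from a stream of [SYNC][uint16 length][payload] packets.

    `stream` only needs a read(n) method, so an open BIN file works.
    """
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of data
        if header[:2] != SYNC:
            raise ValueError("lost packet sync")
        (length,) = struct.unpack("<H", header[2:])
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated final packet: drop it
        yield payload


def packets_to_wav(stream, wav_file):
    """Concatenate packet payloads (16-bit mono PCM) into a WAV file."""
    pcm = b"".join(read_packets(stream))
    with wave.open(wav_file, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return len(pcm) // 2               # number of samples written


# Usage with an in-memory stand-in for a BIN log holding one 2-sample packet.
demo_bin = io.BytesIO(SYNC + struct.pack("<H", 4) + struct.pack("<2h", 100, -100))
demo_wav = io.BytesIO()
sample_count = packets_to_wav(demo_bin, demo_wav)
```

With real files, the same call would take an open log from bin_folder and a path under wav_out, for example `packets_to_wav(open(path, "rb"), "wav_out/sample.wav")`, where both file names are placeholders.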

Section 05

Technology Stack, Structure, and Application Scenarios

  • Technology Stack: Developed in Python, relying on NumPy, PySerial, Matplotlib, Librosa, OpenCV, MediaPipe, Tkinter/PIL, etc.;
  • Project Structure: Clearly divided into directories such as bin_folder (BIN logs), wav_out (WAV output), teaching_data (training audio), testing_data (test audio), signs_data (sign language videos), etc.;
  • Application Scenarios: Real-time sign language recognition, speech-to-sign language conversion, accessibility tools for public services/education/medical care, real-time audio input processing.
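
To show how the signs_data directory might be used in the speech-to-sign step, the sketch below turns a predicted word into an ordered list of per-letter clip paths. The one-clip-per-letter naming scheme, the `.mp4` extension, and the helper names `letters_of` and `clip_paths` are assumptions; the article only states that sign language videos live in signs_data and are played letter by letter.

```python
import unicodedata
from pathlib import Path

SIGNS_DIR = Path("signs_data")  # directory name taken from the project structure


def letters_of(word):
    """Split a word into uppercase letters, preserving spelling order."""
    # NFC keeps letters such as "č" as a single code point before splitting.
    normalized = unicodedata.normalize("NFC", word)
    return [ch.upper() for ch in normalized if ch.isalpha()]


def clip_paths(word, signs_dir=SIGNS_DIR, extension=".mp4"):
    """Map each letter to a clip path such as signs_data/A.mp4 (naming is assumed)."""
    return [signs_dir / f"{letter}{extension}" for letter in letters_of(word)]


playlist = clip_paths("čaj")  # one path per letter: Č, A, J
```

A player loop (for instance one built on OpenCV, which is part of the listed stack) could then open each path in `playlist` in turn.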

Section 06

Future Development Directions

Planned improvement directions for the project:

  • Expand the dataset to cover more common vocabulary and gestures;
  • Introduce advanced deep learning architectures to improve recognition accuracy;
  • Enhance the real-time feedback capability of the user interface;
  • Support sign language recognition for more languages.

Section 07

Social Value and Conclusion

AISL demonstrates the potential of AI in the field of social welfare, embodying the idea of 'technology for good' and promoting social inclusion. For developers, it is a useful resource for learning the complete process from hardware data collection to model inference. Although the project is at an early stage, its technical route is clear and it has a broad range of potential applications. We look forward to more developers joining in to advance accessible communication technology.