Section 01
[Introduction] WARDEN: Speech Recognition and Translation for Endangered Language Wardaman with 6 Hours of Data
Language diversity is an important part of human cultural heritage, but thousands of languages worldwide face extinction. Conventional speech recognition and translation systems depend on large amounts of labeled data, which is exactly what endangered languages lack. Recent research proposes WARDEN, a system built on a two-stage architecture: a speech-to-phoneme stage followed by phoneme-to-English translation. It combines cross-lingual transfer with dictionary-augmented large language model reasoning. With only 6 hours of labeled audio, it achieves high-quality transcription and translation for Wardaman, an endangered Indigenous language of Australia, opening up new possibilities for low-resource language processing.
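The two-stage flow described above can be sketched in a few lines of Python. This is only an illustrative outline, not the WARDEN implementation: the function names (`lookup_glosses`, `build_prompt`, `transcribe_and_translate`), the prompt wording, and the lexicon entries are all assumptions made for this example, and the two models are passed in as plain callables so that no particular library is implied.

```python
from typing import Callable, Dict, List, Tuple

# Toy placeholder lexicon. These entries are invented for illustration
# and are NOT real Wardaman words or glosses.
TOY_LEXICON: Dict[str, str] = {
    "tata": "gloss_A",
    "miku": "gloss_B",
}


def lookup_glosses(phoneme_words: List[str], lexicon: Dict[str, str]) -> Dict[str, str]:
    """Return lexicon entries for the words that appear in the lexicon."""
    return {w: lexicon[w] for w in phoneme_words if w in lexicon}


def build_prompt(phoneme_words: List[str], glosses: Dict[str, str]) -> str:
    """Assemble a translation prompt that carries dictionary hints (hypothetical format)."""
    gloss_lines = "\n".join(f"{w} = {g}" for w, g in glosses.items())
    return (
        "Translate the phoneme sequence into English.\n"
        f"Phonemes: {' '.join(phoneme_words)}\n"
        f"Dictionary hints:\n{gloss_lines}"
    )


def transcribe_and_translate(
    audio: bytes,
    speech_to_phonemes: Callable[[bytes], List[str]],  # stage 1: assumed acoustic model
    translate: Callable[[str], str],                   # stage 2: assumed language model
    lexicon: Dict[str, str],
) -> Tuple[List[str], str]:
    """Run stage 1, enrich the result with dictionary hints, then run stage 2."""
    phoneme_words = speech_to_phonemes(audio)
    glosses = lookup_glosses(phoneme_words, lexicon)
    prompt = build_prompt(phoneme_words, glosses)
    return phoneme_words, translate(prompt)
```

Keeping the phoneme sequence as an explicit intermediate result means the acoustic stage and the translation stage can be swapped or evaluated separately, and the dictionary hints only need to be matched against text rather than audio.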