Sign-to-Text: Putting the 'language' into Sign Language Recognition

Author: David Mansueto

Mansueto, David (2019). Sign-to-Text: Putting the 'language' into Sign Language Recognition. Flinders University, College of Science and Engineering.

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third-party copyright material and/or you believe that any material has been made available without permission of the copyright owner, please contact copyright@flinders.edu.au with the details.

Abstract

The Sign-to-Text project explores the challenge of sign language recognition (SLR) in the context of a system to recognise Auslan (Australian Sign Language) and translate it into English text on a computer. This project was born of a request from the Deaf Can:Do group. Workers converse with deaf community members in Auslan, their mutual first language, then must enter case notes in English – a second language with different vocabulary and grammar. SLR is ostensibly a well-studied field nearly as old as the first linguistic definition of Auslan and has two known commercial solutions, neither of which meets the needs of Deaf Can:Do.

A thorough grounding in Auslan linguistics allows the components that transform a series of gestures into a rich, expressive language to be defined. Auslan is a complex combination of visuo-temporo-spatial cues, including the well-accepted but poorly clarified phonemes – the irreducible, contrastive components of a lexeme – and the more elusive contextual elements, such as classifiers, modifiers, mime and the allocation of nouns to spatial locations for deictic reference. A new taxonomic linguistic structure for Auslan that includes all these elements is presented. The intrinsic challenges signed languages present for recognition and translation are defined, including several new challenges. Recognition begins by observing signing, such as via instrumentation of the signer or optical image capture. As non-manual elements are essential to the language, an optical input is currently required for true sign language recognition; however, instrumentation typically provides far higher fidelity, suited to the higher complexity of manual elements.

A framework for sign language recognition and translation is proposed. A modular approach encourages multi-modal input. Taking cues from speech recognition, SLR is divided into a visual model, which classifies individual phonemes and modifiers, and a language model, which considers the unfolding sentence as it combines these into lexemes. The integration of modifiers, spatial referents and deixis locations is facilitated by additional classifiers and a new memory block within the visual model. The output lexemes, including context and modification, would be glossed by the language model, completing the recognition stage and leaving a final translation stage into the target language. Using consumer-grade depth cameras, the framework is implemented up to phonemic recognition (handshape), providing insight into the technical challenges faced by optical sign language recognition systems. A verification of the system using 5 handshapes and classifying with a neural network achieved 87% accuracy.
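To make the final step concrete, the sketch below shows a per-frame handshape classifier of the kind the verification describes: a small neural network assigning one of 5 handshape classes to a feature vector derived from a depth camera. The abstract does not specify the features or architecture, so the feature layout (21 hand keypoints in 3D), the network size, the use of scikit-learn's MLPClassifier and the synthetic placeholder data are all illustrative assumptions, not details from the thesis.

```python
# Minimal sketch of the handshape classification stage, assuming hand-pose
# feature vectors (21 keypoints x 3 coordinates) extracted from a
# consumer-grade depth camera. Data here is random placeholder input.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

N_KEYPOINTS = 21               # assumed hand-skeleton keypoints per frame
N_FEATURES = N_KEYPOINTS * 3   # x, y, z per keypoint
N_HANDSHAPES = 5               # the thesis verification used 5 handshapes

# Placeholder data standing in for labelled depth-camera frames.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, N_FEATURES))
y = rng.integers(0, N_HANDSHAPES, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A single hidden layer suffices for a per-frame, per-phoneme classifier sketch.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"handshape accuracy: {clf.score(X_test, y_test):.2f}")
```

With real depth-camera features in place of the random arrays, the same loop of feature extraction, training and held-out scoring is how a figure like the reported 87% accuracy would be measured.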

Keywords: automatic sign language recognition, stereo photogrammetry

Subject: Engineering thesis

Thesis type: Masters
Completed: 2019
School: College of Science and Engineering
Supervisor: David Hobbs