Community Building Activities for Kiswahili Automatic Speech Recognition
About This Video
In this tech-vision talk, Kathleen Siminyu presents an exploratory analysis of the Kiswahili speech recognition dataset on the Mozilla Common Voice platform. The dataset has been built as part of Mozilla Common Voice’s Kiswahili work, an initiative to bring a vital language of East Africa online and to make voice technology accessible to Kiswahili speakers. By exploring the dataset and recounting the community-building activities undertaken to crowdsource it, she shows how the preferences and dynamics of the communities have shaped the characteristics of the existing dataset.
In This Video
Kiswahili Machine Learning Fellow, Mozilla
Kathleen Siminyu is an AI researcher focused on Natural Language Processing (NLP) for African languages. She works at the Mozilla Foundation as a Machine Learning Fellow, supporting the development of a Kiswahili speech recognition dataset and building transcription models for end-use cases in the agricultural and financial domains. In this role, Kathleen is keen to ensure that the dataset and the models built from it reflect the diversity of Kiswahili speakers in age, gender, accent, and language variant or dialect.