Making African Languages Visible: A Python-Based Guide to Low-Resource Language Detection
Speaker
Gift Ojeabulu
I’ve spent the last 6+ years at the intersection of AI/ML, software engineering, developer advocacy, and community building. Most recently, I worked as an AI developer advocate and content lead at Iterative.ai, the team behind the popular open-source AI tools DVC and CML. I’ve built and scaled thriving AI communities, notably as co-founder of Data Community Africa (DCA), now the largest data and AI community of Black professionals worldwide.
Gift is a data scientist whose work is transforming Africa's technological landscape. As the co-founder of Data Community Africa, an advisory board member at DevNetwork (Artificial Intelligence), and an AI Developer Advocate, he has emerged as a pivotal figure in democratizing data and AI across the continent.
His flagship initiative, the African Data Community Newsletter, has become a beacon of knowledge sharing, reaching over 2,500 subscribers across 45 countries and 8 U.S. states. It has also fueled his involvement in DatafestAfrica, which has run 4 conferences and 5+ hackathons in under 4 years and is now one of the continent's premier data and AI conferences, bringing together practitioners, researchers, and enthusiasts from across the globe.
In Lagos, Gift's leadership of the MLOps community has revolutionized how organizations approach machine learning operations. Under his guidance, the community has become a hub for innovation in practical MLOps and Large Language Models (LLMs), fostering collaboration between industry leaders and emerging talents. His emphasis on open-source AI development has created new pathways for African developers to contribute to global technological advancement.
Through strategic initiatives and unwavering dedication, Gift Ojeabulu continues to architect the future of Africa's data and AI ecosystem. His work exemplifies how individual leadership can catalyze continental transformation, making advanced technology accessible to communities that have historically been underserved in the global tech landscape.
Abstract
This talk introduces how Python and FastText can be used to detect low-resource African languages using the MasakhaNER dataset. We cover key preprocessing steps, evaluation methods, and challenges such as dialectal variation and sparse data. The session also compares FastText with African-focused NLP tools like AfroXLMR and Masakhane Models, offering clear guidance on when each tool works best.
Description
African languages remain heavily underrepresented in NLP, and building reliable language identification tools for them is still a major challenge. In this session, we explore how Python and FastText can be used to develop practical language detection systems for low-resource African languages, using insights drawn from the MasakhaNER dataset on Hugging Face, one of the most comprehensive open-source African language corpora.
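To give a flavor of the kind of preprocessing involved, here is a minimal text-normalization sketch (illustrative only, not the talk's exact code; the cleanup rules would be tuned to the actual corpus). A key point for African languages is preserving diacritics, which carry meaning in Yoruba, Igbo, and others, so we Unicode-normalize rather than strip accents:

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    """Minimal cleanup for language-ID input: NFC-normalize so diacritics
    compare consistently (important for tone-marked languages like Yoruba),
    drop URLs and digits, collapse whitespace, and lowercase."""
    text = unicodedata.normalize("NFC", text)   # keep diacritics, canonical form
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"\d+", " ", text)            # remove digits
    text = re.sub(r"\s+", " ", text)            # collapse whitespace
    return text.strip().lower()

print(normalize_text("Ẹ kú àárọ̀!  Visit https://example.com 2024"))
```

Note that, unlike many English-centric pipelines, this deliberately avoids ASCII-folding: stripping tone marks would collapse distinct words and hurt language identification.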
The talk begins with an overview of the unique characteristics of African languages that affect NLP performance, including dialect diversity, orthographic variation, code-switching, and limited labelled resources. We then outline a clear workflow for preparing multilingual datasets, selecting features, and evaluating language identification models, focusing on realistic constraints faced in low-resource environments.
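As one concrete step in that workflow, labeled examples must be converted into FastText's supervised training format, where each line starts with a `__label__<lang>` prefix. The sketch below uses invented example phrases and a plain shuffled split for brevity; in practice a per-language stratified split is safer for small corpora:

```python
import random

def to_fasttext_lines(examples):
    """Convert (text, lang) pairs into FastText's supervised format:
    each line reads '__label__<lang> <text>'."""
    return [f"__label__{lang} {text}" for text, lang in examples]

def train_test_split(lines, test_ratio=0.2, seed=42):
    """Simple shuffled split (not stratified) for illustration."""
    rng = random.Random(seed)
    shuffled = lines[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy examples; a real corpus would come from a dataset
# such as MasakhaNER's per-language text.
examples = [
    ("bawo ni", "yor"),         # Yoruba
    ("kedu ka i mere", "ibo"),  # Igbo
    ("sannu da zuwa", "hau"),   # Hausa
    ("habari yako", "swa"),     # Swahili
]
train, held_out = train_test_split(to_fasttext_lines(examples), test_ratio=0.25)
print(train[0])
```

Once the lines are written to a file, a model can be trained with the `fasttext` package via `fasttext.train_supervised(input="train.txt")`.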
A central part of the talk compares FastText with other African NLP tools such as AfroXLMR, Masakhane Models, and spaCy’s limited-language pipelines. This comparison highlights key differences in language coverage, model size, task flexibility, and production readiness. Attendees will gain practical guidance on when FastText is sufficient, when transformer-based models offer clear advantages, and how to navigate trade-offs around accuracy, speed, and resource usage.
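Comparing detectors fairly requires a common metric harness. A minimal evaluation function like the one below, computing accuracy and macro-F1 (which weights rare languages equally, a sensible default for skewed low-resource corpora), works for any model's predictions, whether they come from FastText, AfroXLMR, or anything else:

```python
from collections import defaultdict

def evaluate(predictions, gold):
    """Accuracy and macro-averaged F1 for language-ID predictions.
    Macro-F1 averages per-language F1, so rare languages count as
    much as common ones."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    correct = 0
    for pred, true in zip(predictions, gold):
        if pred == true:
            correct += 1
            tp[true] += 1
        else:
            fp[pred] += 1
            fn[true] += 1
    f1s = []
    for lang in set(gold) | set(predictions):
        p = tp[lang] / (tp[lang] + fp[lang]) if tp[lang] + fp[lang] else 0.0
        r = tp[lang] / (tp[lang] + fn[lang]) if tp[lang] + fn[lang] else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {"accuracy": correct / len(gold), "macro_f1": sum(f1s) / len(f1s)}

print(evaluate(["yor", "ibo", "yor"], ["yor", "ibo", "hau"]))
```

Running the same held-out set through each candidate model and comparing these numbers (alongside model size and inference speed) makes the trade-offs discussed above concrete.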
This session emphasizes conceptual clarity, reproducible steps, and real-world lessons from applying these tools to African language datasets. The audience will leave with a strong understanding of the challenges and opportunities in low-resource language identification, along with actionable strategies for designing more inclusive NLP systems.