Saving Languages with Language Models

On February 19, 2024, the article “@llegra: a chatbot for Vallader” by Oliver Bendel and Dalil Jabou was published in the International Journal of Information Technology. From the abstract: “Extinct and endangered languages have been preserved primarily through audio conservation and the collection and digitization of scripts and have been promoted through targeted language acquisition efforts. Another possibility would be to build conversational agents like chatbots or voice assistants that can master these languages. This would provide an artificial, active conversational partner which has knowledge of the vocabulary and grammar and allows one to learn with it in a different way. The chatbot, @llegra, with which one can communicate in the Rhaeto-Romanic idiom Vallader was developed in 2023 based on GPT-4. It can process and output text and has voice output. It was additionally equipped with a manually created knowledge base. After laying the conceptual groundwork, this paper presents the preparation and implementation of the project. In addition, it summarizes the tests that native speakers conducted with the chatbot. A critical discussion elaborates advantages and disadvantages. @llegra could be a new tool for teaching and learning Vallader in a memorable and entertaining way through dialog. It not only masters the idiom, but also has extensive knowledge about the Lower Engadine, that is, the area where Vallader is spoken. In conclusion, it is argued that conversational agents are an innovative approach to promoting and preserving languages.” Oliver Bendel has been increasingly focusing on dead, extinct and endangered languages for some time. He believes that conversational agents can help to strengthen and save them.

@llegra, a Chatbot for Vallader

Conversational agents have been a research subject of Prof. Dr. Oliver Bendel for a quarter of a century. He dedicated his doctoral thesis at the University of St. Gallen to them. At the School of Business FHNW, he developed them with changing teams from 2012 to 2022, primarily in the context of machine ethics and social robotics. The philosopher of technology now devotes himself increasingly to dead, extinct, and endangered languages. After @ve (2022), a chatbot for Latin based on GPT-3, another project started in March 2023. The chatbot @llegra is being developed by Dalil Jabou for the Rhaeto-Romanic idiom Vallader, which is spoken in the Lower Engadine between Martina in the northeast and Zernez in the southwest, as well as in Val Müstair. The user types text and receives text output. In addition, @llegra speaks with the help of a text-to-speech system from the company SlowSoft, which supports the project. The GPT-3 language model produced rather unsatisfactory results. The breakthrough came with the use of GPT-4. The knowledge base was supplemented with the help of four children’s books on Vallader. The project will be completed in August 2023. The results will be published thereafter.