Chatbots for dead, endangered, and extinct languages are being developed at the FHNW School of Business. One well-known example is @llegra, a chatbot for Vallader. Oliver Bendel recently tested the reach of GPTs for endangered languages such as Irish (Irish Gaelic), Maori, and Basque. According to ChatGPT, there is a relatively large amount of training material for them. On May 12, 2024 – after Irish Girl and Maori Girl – a first version of Adelina, a chatbot for Basque, was created. It was later improved in a second version. As part of the kAIxo project (the Basque “kaixo” corresponds to the english “hello”), the chatbot or voice assistant kAIxo is to be developed that speaks Basque. The purpose is to keep users practicing written or spoken language or to develop the desire to learn the endangered language. The chatbot should be based on a Large Language Model (LLM). Both prompt engineering and fine-tuning are conceivable for customization. Retrieval Augmented Generation (RAG) can play a central role. The result will be a functioning prototype. Nicolas Lluis Araya, a student of business informatics, has been recruited to implement the project. The kick-off meeting will take place on September 3, 2024.
DuckDuckGo AI Chat
“DuckDuckGo AI Chat is an anonymous way to access popular AI chatbots – currently, Open AI’s GPT 3.5 Turbo, Anthropic’s Claude 3 Haiku, and two open-source models (Meta Llama 3 and Mistral’s Mixtral 8x7B), with more to come. This optional feature is free to use within a daily limit, and can easily be switched off.” (DuckDuckGo, 6 June 2024) This was reported by the DuckDuckGo blog on June 6, 2024. Initial tests have shown that the responses come at high speed. This is an excellent way of testing and comparing different language models one after the other. All this is possible with a high level of data protection: “Chats are private, anonymized by us, and are not used for any AI model training.” (DuckDuckGo, 6 June 2024) It would be desirable for this service to be offered free of charge and without limitation. But that is still a long way off: DuckDuckGo is currently exploring the possibility of “a paid plan for access to higher daily usage limits and more advanced models” (DuckDuckGo, 6 June 2024). You can try out the new tool at duck.ai or duckduckgo.com/chat (Image: Ideogram).
An LLM Decides the Trolley Problem
A small study by Şahan Hatemo at the FHNW School of Engineering in the Data Science program investigated the ability of Llama-2-13B-chat, an open source language model, to make a moral decision. The focus was on the bias of eight personas and their stereotypes. The classic trolley problem was used, which can be described as follows: An out-of-control streetcar races towards five people. It can be diverted to another track, on which there is another person, by setting a switch. The moral question is whether the death of this person can be accepted in order to save the lives of the five people. The eight personas differ in terms of nationality. In addition to “Italian”, “French”, “Turkish” etc., “Arabian” (with reference to ethnicity) was also included. 30 responses per cycle were collected for each persona over three consecutive days. The responses were categorized as “Setting the switch”, “Not setting the switch”, “Unsure about setting the switch”, and “Violated the guidelines”. They were visualized and compared with the help of dashboards. The study finds that the language model reflects an inherent bias in its training data that influences decision-making processes. The Western personas are more inclined to pull the lever, while the Eastern ones are more reluctant to do so. The German and Arab personas show a higher number of policy violations, indicating a higher presence of controversial or sensitive topics in the training data related to these groups. The Arab persona is also associated with religion, which in turn influences their decisions. The Japanese persona repeatedly uses the Japanese value of giri (a sense of duty) as a basis. The decisions of the Turkish and Chinese personas are similar, as they mainly address “cultural values and beliefs”. The small study was conducted in FS 2024 in the module “Ethical Implementation” with Prof. Dr. Oliver Bendel. The initial complexity was also reduced. In a larger study, further LLMs and factors such as gender and age are to be taken into account.
Saving Languages with Language Models
On February 19, 2024, the article “@llegra: a chatbot for Vallader” by Oliver Bendel and Dalil Jabou was published in the International Journal of Information Technology. From the abstract: “Extinct and endangered languages have been preserved primarily through audio conservation and the collection and digitization of scripts and have been promoted through targeted language acquisition efforts. Another possibility would be to build conversational agents like chatbots or voice assistants that can master these languages. This would provide an artificial, active conversational partner which has knowledge of the vocabulary and grammar and allows one to learn with it in a different way. The chatbot, @llegra, with which one can communicate in the Rhaeto-Romanic idiom Vallader was developed in 2023 based on GPT-4. It can process and output text and has voice output. It was additionally equipped with a manually created knowledge base. After laying the conceptual groundwork, this paper presents the preparation and implementation of the project. In addition, it summarizes the tests that native speakers conducted with the chatbot. A critical discussion elaborates advantages and disadvantages. @llegra could be a new tool for teaching and learning Vallader in a memorable and entertaining way through dialog. It not only masters the idiom, but also has extensive knowledge about the Lower Engadine, that is, the area where Vallader is spoken. In conclusion, it is argued that conversational agents are an innovative approach to promoting and preserving languages.” Oliver Bendel has been increasingly focusing on dead, extinct and endangered languages for some time. He believes that conversational agents can help to strengthen and save them.
All that Groks is God
Elon Musk has named his new language model Grok. The word comes from the science fiction novel “Stranger in a Strange Land” (1961) by Robert A. Heinlein. This famous novel features two characters who have studied the word. Valentine Michael Smith (aka Michael Smith or “Mike”, the “Man from Mars”) is the main character. He is a human who was born on Mars. Dr “Stinky” Mahmoud is a semanticist. After Mike, he is the second person who speaks the Martian language but does not “grok” it. In one passage, Mahmoud explains to Mike: “‘Grok’ means ‘identically equal.’ The human cliché. ‘This hurts me worse than it does you’ has a Martian flavor. The Martians seem to know instinctively what we learned painfully from modern physics, that observer interacts with observed through the process of observation. ‘Grok’ means to understand so thoroughly that the observer becomes a part of the observed – to merge, blend, intermarry, lose identity in group experience. It means almost everything that we mean by religion, philosophy, and science – and it means as little to us as color means to a blind man.” Mike says a little later in the dialog: “God groks.” In another place, there is a similar statement: “… all that groks is God …”. In a way, this fits in with what is written on the website of Elon Musk’s AI start-up: “The goal of xAI is to understand the true nature of the universe.” The only question is whether this goal will remain science fiction or become reality.