On March 18, 2024, the kick-off meeting for the project “The Animal Whisperer” took place at the FHNW School of Business. It was initiated by Prof. Dr. Oliver Bendel, who has been working on animal-computer interaction and animal-machine interaction for many years. Nick Zbinden, a student of business information systems, has been recruited to work on the project. As part of his final thesis, he will develop three GPT-4-based applications that can be used to analyze the body language and environment of cows, horses, and dogs. The aim is to avert danger to humans and animals. For example, a hiker can receive a recommendation on their smartphone not to cross a pasture if a mother cow and her calves are present. All they have to do is call up the application and take a photo of the area. Nick Zbinden will evaluate the literature and conduct several expert interviews to find out more about the situation of farm and domestic animals and their behavior. He will demonstrate the possibilities, but also the limitations, of multimodal language models in this context. The results will be available in August 2024 (Image: DALL-E 3).
The Animal Whisperer
When humans come into contact with wildlife, farm animals, and pets, they sometimes run the risk of being injured or killed. They may be attacked by bears, wolves, cows, horses, or dogs. Experts can use an animal’s body language to determine whether or not danger is imminent. Context is also important, such as whether a mother cow is with her calves. The multimodality of large language models enables novel applications. For example, ChatGPT can evaluate images. This ability can be used to interpret the body language of animals, thereby drawing on, and partly replacing, expert knowledge. Prof. Dr. Oliver Bendel, who has been involved with animal-computer interaction and animal-machine interaction for many years, has initiated a project called “The Animal Whisperer” in this context. The goal is to create a prototype application based on GenAI that can be used to interpret the body language of an animal and avert danger for humans. GPT-4 or an open-source language model should be used to create the prototype. It should be augmented with appropriate material, taking into account animals such as bears, wolves, cows, horses, and dogs. Approaches may include fine-tuning or prompt engineering. The project will begin in March 2024 and the results will be available in the summer of the same year (Image: DALL-E 3).
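To illustrate how such a prototype might query a multimodal model with a hiker’s photo, the sketch below builds a vision request in the message format of OpenAI’s vision-capable chat API. The system prompt, the helper name `build_vision_request`, and the risk categories are illustrative assumptions, not specifications from the project; an actual call would pass the resulting payload to the SDK’s `client.chat.completions.create(...)`.

```python
import base64

# Hypothetical system prompt for the "Animal Whisperer" use case;
# wording and risk categories are illustrative only.
SYSTEM_PROMPT = (
    "You are an expert in animal body language. Given a photo, identify "
    "the animals, read their posture and context (e.g. a mother cow with "
    "calves), and rate the danger to a human on foot as LOW, MEDIUM, or "
    "HIGH, with a short recommendation."
)

def build_vision_request(image_bytes: bytes, question: str) -> dict:
    """Build a chat payload combining a text question with a photo.

    The photo is embedded as a base64 data URL, as accepted by
    OpenAI's vision-capable chat models. An actual call would be:
    client.chat.completions.create(model="gpt-4-turbo", **payload)
    """
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_b64}"
                        },
                    },
                ],
            },
        ]
    }

# Dummy JPEG header bytes stand in for a real smartphone photo.
payload = build_vision_request(b"\xff\xd8\xff", "Is it safe to cross this pasture?")
```

Whether the project uses this request format, fine-tuning, or a retrieval layer over curated material is an open design question in the source text; the sketch only shows the prompt-engineering route.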
Saving Languages with Language Models
On February 19, 2024, the article “@llegra: a chatbot for Vallader” by Oliver Bendel and Dalil Jabou was published in the International Journal of Information Technology. From the abstract: “Extinct and endangered languages have been preserved primarily through audio conservation and the collection and digitization of scripts and have been promoted through targeted language acquisition efforts. Another possibility would be to build conversational agents like chatbots or voice assistants that can master these languages. This would provide an artificial, active conversational partner which has knowledge of the vocabulary and grammar and allows one to learn with it in a different way. The chatbot, @llegra, with which one can communicate in the Rhaeto-Romanic idiom Vallader was developed in 2023 based on GPT-4. It can process and output text and has voice output. It was additionally equipped with a manually created knowledge base. After laying the conceptual groundwork, this paper presents the preparation and implementation of the project. In addition, it summarizes the tests that native speakers conducted with the chatbot. A critical discussion elaborates advantages and disadvantages. @llegra could be a new tool for teaching and learning Vallader in a memorable and entertaining way through dialog. It not only masters the idiom, but also has extensive knowledge about the Lower Engadine, that is, the area where Vallader is spoken. In conclusion, it is argued that conversational agents are an innovative approach to promoting and preserving languages.” Oliver Bendel has been increasingly focusing on dead, extinct, and endangered languages for some time. He believes that conversational agents can help to strengthen and save them.
GPT-4 as Multimodal Model
GPT-4 was launched by OpenAI on March 14, 2023. “GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.” (Website OpenAI) On its website, the company explains the multimodal options in more detail: “GPT-4 can accept a prompt of text and images, which – parallel to the text-only setting – lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images.” (Website OpenAI) The example that OpenAI gives is impressive. An image with multiple panels was uploaded. The prompt is: “What is funny about this image? Describe it panel by panel”. This is exactly what GPT-4 does and then comes to the conclusion: “The humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port.” (Website OpenAI) The technical report is available via cdn.openai.com/papers/gpt-4.pdf.
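The interspersed text-and-image input that OpenAI describes can be pictured as a single user message whose content list mixes text parts and image parts. The helper below is a hypothetical illustration of that structure, assuming the content format of OpenAI’s vision-capable chat models; the image URL is a placeholder, not OpenAI’s actual example image.

```python
def build_interspersed_message(*parts: str) -> dict:
    """Turn a mixed sequence of text and image URLs into one user message.

    Parts starting with "http" are treated as image references; all
    other parts become text segments. The result follows the content
    list format accepted by OpenAI's vision-capable chat models.
    """
    content = []
    for part in parts:
        if part.startswith("http"):
            content.append({"type": "image_url", "image_url": {"url": part}})
        else:
            content.append({"type": "text", "text": part})
    return {"role": "user", "content": content}

# Mirroring the prompt from OpenAI's example (placeholder URL):
message = build_interspersed_message(
    "What is funny about this image? Describe it panel by panel.",
    "https://example.com/vga-charger-panels.jpg",
)
```

Because text and image parts can alternate freely in the content list, the same mechanism supports prompts that reference several images in sequence, as in OpenAI’s panel-by-panel example.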