Digital Life
Open Innovation Campus
Resources
The project will mainly use public tools and datasets.
Are you interested?
If you are a professor or university student and you are interested in participating in the TUTORING program, register your information so that we can start the program.
This challenge is aimed at students with linguistic knowledge, since the focus is on speech.
Basic technical knowledge of Large Language Models, prompting, and Python is also recommended.
Understanding auditory information is essential for fostering natural human-machine interactions.
Large Audio Language Models (LALMs) have rapidly advanced, with models like LTU, SALMONN, GAMA, Audio Flamingo 2, Qwen2.5-Omni, Audio Reasoner, Kimi-Audio, and Audio Flamingo 3 showing major progress in processing audio inputs.
Evaluating these models is key to understanding their performance and limitations.
Existing benchmarks such as MMAU, MMAR, MMSU, and MMAU-Pro focus mainly on English. For Spanish, a clear gap remains, highlighting the need for a dedicated benchmark.
This challenge aims to create a Multiple-Choice Question Answering benchmark to assess LALMs’ speech capabilities in Spanish.
It will use real-world Spanish audio data and linguistic expertise to build questions and answer choices in Spanish.
The process includes:
- Defining target capabilities (comprehension, reasoning, context).
- Collecting diverse Spanish audio from various dialects and contexts.
- Creating balanced multiple-choice questions and plausible answers.
- Validating quality through human review.
- Evaluating selected models and analyzing results.
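The steps above could feed into a simple evaluation harness. A minimal sketch in Python: the `MCQAItem` fields, the capability labels, and the example data are illustrative assumptions for this challenge, not an existing benchmark format.

```python
# Sketch of a multiple-choice benchmark item and accuracy scoring.
# Field names and example data are hypothetical, chosen to mirror the
# process described above (capabilities, Spanish audio, balanced choices).
from dataclasses import dataclass

@dataclass
class MCQAItem:
    audio_path: str      # path to the Spanish audio clip
    question: str        # question in Spanish
    choices: list[str]   # answer options, exactly one correct
    answer_index: int    # index of the correct choice
    capability: str      # e.g. "comprehension", "reasoning", "context"

def accuracy(items: list[MCQAItem], predictions: list[int]) -> float:
    """Fraction of items where the model picked the correct choice."""
    correct = sum(p == it.answer_index for it, p in zip(items, predictions))
    return correct / len(items)

# Example: two hypothetical items; the model answers the first correctly.
items = [
    MCQAItem("clip_001.wav", "¿Qué dialecto habla el locutor?",
             ["rioplatense", "andaluz", "mexicano", "caribeño"], 0, "context"),
    MCQAItem("clip_002.wav", "¿Cuál es el tema de la conversación?",
             ["deportes", "clima", "política", "música"], 1, "comprehension"),
]
print(accuracy(items, [0, 2]))  # → 0.5
```

Storing the capability label on each item makes it easy to report per-capability accuracy later, which supports the analysis planned for the final report.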
The expected outcomes are a scientific report analyzing the benchmark, plus the benchmark itself, including the audio with the corresponding questions and answers.