Digital Life
Open Innovation Campus
Resources
The project will mainly use public tools and datasets.
Are you interested?
If you are a professor or university student and you are interested in participating in the TUTORING program, register your information so that we can start the program.
This challenge is aimed at students with linguistic knowledge, since the focus is on speech.
Basic technical knowledge of Large Language Models, prompting, and Python is also recommended.
Understanding auditory information is essential for fostering natural human-machine interactions.
Large Audio Language Models (LALMs) have rapidly advanced, with models like LTU, SALMONN, GAMA, Audio Flamingo 2, Qwen2.5-Omni, Audio Reasoner, Kimi-Audio, and Audio Flamingo 3 showing major progress in processing audio inputs.
Evaluating these models is key to understanding their performance and limitations.
Existing benchmarks such as MMAU, MMAR, MMSU, and MMAU-Pro focus mainly on English. For Spanish, a clear gap remains, highlighting the need for a dedicated benchmark.
This challenge aims to create a Multiple-Choice Question Answering benchmark to assess LALMs’ speech capabilities in Spanish.
It will use real-world Spanish audio data and linguistic expertise to build questions and answer choices in Spanish.
The process includes:
- Defining target capabilities (comprehension, reasoning, context).
- Collecting diverse Spanish audio from various dialects and contexts.
- Creating balanced multiple-choice questions and plausible answers.
- Validating quality through human review.
- Evaluating selected models and analyzing results.
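The steps above could feed into a simple evaluation harness. A minimal sketch in Python: the `MCQAItem` fields, the capability labels, and the example data are illustrative assumptions for this challenge, not an existing benchmark format.

```python
# Sketch of a multiple-choice benchmark item and accuracy scoring.
# Field names and example data are hypothetical, chosen to mirror the
# process described above (capabilities, Spanish audio, balanced choices).
from dataclasses import dataclass

@dataclass
class MCQAItem:
    audio_path: str      # path to the Spanish audio clip
    question: str        # question in Spanish
    choices: list[str]   # answer options, exactly one correct
    answer_index: int    # index of the correct choice
    capability: str      # e.g. "comprehension", "reasoning", "context"

def accuracy(items: list[MCQAItem], predictions: list[int]) -> float:
    """Fraction of items where the model picked the correct choice."""
    correct = sum(p == it.answer_index for it, p in zip(items, predictions))
    return correct / len(items)

# Example: two hypothetical items; the model answers the first correctly.
items = [
    MCQAItem("clip_001.wav", "¿Qué dialecto habla el locutor?",
             ["rioplatense", "andaluz", "mexicano", "caribeño"], 0, "context"),
    MCQAItem("clip_002.wav", "¿Cuál es el tema de la conversación?",
             ["deportes", "clima", "política", "música"], 1, "comprehension"),
]
print(accuracy(items, [0, 2]))  # → 0.5
```

Storing the capability label on each item makes it easy to report per-capability accuracy later, which supports the analysis planned for the final report.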
The expected outcomes are a scientific report analyzing the benchmark, plus the benchmark itself, including the audio with the corresponding questions and answers.