From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task

Title of the event From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task
Series CIDAS Colloquium
Organizer Campus-Institut Data Science (CIDAS)
Speaker Andreas Stephan
Speaker institution University of Vienna
Type of event Colloquium
Category Research
Registration required No
Details Abstract: A main reason for the current success of large language models (LLMs) is
their ability to perform zero-shot reasoning, that is, to solve tasks without explicit
training data. To reduce the need for human annotations, LLMs have been proposed as
evaluators, or judges, of the quality of other candidate LLMs. In this talk, we study
LLM judges on mathematical reasoning tasks. These tasks require multi-step reasoning,
and the correctness of solutions is verifiable, enabling an objective evaluation. In
summary, this talk presents 1) a detailed performance evaluation of LLM judges on
mathematical reasoning tasks, 2) an investigation of regularities, such as intriguing
correlations, in the judgement process, and 3) the use of textual features to analyze
these regularities.
Date Start: 24.10.2024, 14:15
End: 24.10.2024, 15:15
Location Other Location
1.130
Goldschmidtstraße 1
Contact Isabelle Matthias
imatthi@gwdg.de