
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task

Title of the event	From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task
Series	CIDAS Colloquium
Organizer	Campus-Institut Data Science (CIDAS)
Speaker	Andreas Stephan
Speaker institution	University of Vienna
Type of event	Colloquium
Category	Research
Registration required	No
Details	Abstract: A main reason for the current success of large language models (LLMs) is their ability to perform zero-shot reasoning, that is, to solve tasks without explicit training data. To reduce the need for human annotations, LLMs have been proposed as evaluators, or judges, of the quality of other candidate LLMs. In this talk, we study LLM judges on mathematical reasoning tasks. These tasks require multi-step reasoning, and the correctness of their solutions is verifiable, which enables an objective evaluation. In summary, this talk presents 1) a detailed performance evaluation of LLM judges on mathematical reasoning tasks, 2) an investigation of regularities in the judgement process, such as intriguing correlations, and 3) the use of textual features to analyze these regularities.
Date	Start: 24.10.2024, 14:15; End: 24.10.2024, 15:15
Location	Other location: 1.130, Goldschmidtstraße 1
Contact	Isabelle Matthias imatthi@gwdg.de