Events
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task
Title of the event | From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task |
Series | CIDAS Colloquium |
Organizer | Campus-Institut Data Science (CIDAS) |
Speaker | Andreas Stephan |
Speaker institution | University of Vienna |
Type of event | Colloquium |
Category | Research |
Registration required | No |
Details | Abstract: A main reason for the current success of large language models (LLMs) is their ability to perform zero-shot reasoning, that is, to solve tasks without explicit training data. To reduce the need for human annotations, LLMs have been proposed as evaluators, or judges, of the quality of other candidate LLMs. In this talk, we study LLM judges on mathematical reasoning tasks. These tasks require multi-step reasoning, and the correctness of solutions is verifiable, enabling an objective evaluation. In summary, this talk presents 1) a detailed performance evaluation of LLM judges on mathematical reasoning tasks, 2) an investigation of regularities, such as intriguing correlations, in the judgement process, and 3) the use of textual features to analyze these regularities. |
Date | Start: 24.10.2024, 14:15 End: 24.10.2024, 15:15 |
Location |
Other Location: 1.130, Goldschmidtstraße 1 |
Contact |
Isabelle Matthias imatthi@gwdg.de |