In publica commoda

Event


From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task

Event title: From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task
Series: CIDAS Colloquium
Organizer: Campus-Institut Data Science (CIDAS)
Speaker: Andreas Stephan
Speaker's institution: University of Vienna
Event type: Colloquium
Category: Research
Registration required: No
Description:
Abstract: A main reason for the current success of large language models (LLMs) is their ability to perform zero-shot reasoning, that is, to solve tasks without task-specific training data. To reduce the need for human annotations, LLMs have been proposed as evaluators, or judges, of the quality of other candidate LLMs. In this talk, we study LLM judges on mathematical reasoning tasks. These tasks require multi-step reasoning, and the correctness of their solutions is verifiable, which enables an objective evaluation of the judges. In summary, this talk presents 1) a detailed performance evaluation of LLM judges on mathematical reasoning tasks, 2) an investigation of regularities in the judgement process, such as intriguing correlations, and 3) the use of textual features to analyze these regularities.
Time: Start: 24.10.2024, 14:15
End: 24.10.2024, 15:15
Location: Other location
1.130
Goldschmidtstraße 1
Contact: Isabelle Matthias
imatthi@gwdg.de