Event

From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task

Event title	From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Task
Series	CIDAS Colloquium
Organizer	Campus-Institut Data Science (CIDAS)
Speaker	Andreas Stephan
Speaker's institution	University of Vienna
Event type	Colloquium
Category	Research
Registration required	No
Description	Abstract: A main reason for the current success of large language models (LLMs) is their ability to perform zero-shot reasoning, that is, to solve tasks without explicit training data. To reduce the need for human annotations, LLMs have been proposed as evaluators, or judges, of the quality of other candidate LLMs. In this talk, we study LLM judges on mathematical reasoning tasks. These tasks require multi-step reasoning, and the correctness of solutions is verifiable, enabling an objective evaluation. In summary, this talk presents 1) a detailed performance evaluation of LLM judges on mathematical reasoning tasks, 2) an investigation of regularities, such as intriguing correlations, in the judgement process, and 3) the use of textual features to analyze those regularities.
Time	Start: 24.10.2024, 14:15 End: 24.10.2024, 15:15
Location	Other Location 1.130 Goldschmidtstraße 1
Contact	Isabelle Matthias imatthi@gwdg.de