[1] Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code
Ethan S Hersch, Brando Miranda, Elyas Obbad, Srivatsava Daruru, Kirill Acharya, Zixiao Jolene Wang, Steven Dillmann, Yegor Denisov-Blanch, Sanmi Koyejo
Accepted at the ICML 2026 Workshop on Deep Learning for Code (DL4C) and the ICML 2026 AI for Math Workshop (AI4Math)
First-author paper on falsifiable properties for LLM-based evaluation of formal code.
