[1] Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code
Ethan S Hersch, Brando Miranda, Elyas Obbad, Srivatsava Daruru, Kirill Acharya, Zixiao Jolene Wang, Steven Dillmann, Yegor Denisov-Blanch, Sanmi Koyejo
Under review at the ICML 2026 Workshop on Deep Learning for Code, 2026
First-author paper on falsifiable properties for LLM-based evaluation of formal code.
Recommended citation: Ethan S Hersch, Brando Miranda, Elyas Obbad, Srivatsava Daruru, Kirill Acharya, Zixiao Jolene Wang, Steven Dillmann, Yegor Denisov-Blanch, Sanmi Koyejo. "Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code." Under review at the ICML 2026 Workshop on Deep Learning for Code.
