Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code

Under review at the ICML 2026 Workshop on Deep Learning for Code, 2026

First-author paper; under review at the ICML 2026 Workshop on Deep Learning for Code.

Recommended citation: Ethan S Hersch, Brando Miranda, Elyas Obbad, Srivatsava Daruru, Kirill Acharya, Zixiao Jolene Wang, Steven Dillmann, Yegor Denisov-Blanch, Sanmi Koyejo. "Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code." Under review at the ICML 2026 Workshop on Deep Learning for Code.