Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code

Published in the ICML 2026 Workshop on Deep Learning for Code (DL4C) and the ICML 2026 AI for Math Workshop (AI4Math), 2026

First-author paper; accepted to the ICML 2026 Workshop on Deep Learning for Code (DL4C) and the ICML 2026 AI for Math Workshop (AI4Math).