Publications

[2] VeriBench: End-to-End Formal Verification Benchmark for AI Coding Agents in Lean 4

Brando Miranda, Srivatsava Daruru, Ethan S Hersch, Zhanke Zhou, Allen Nie, Daneshvar Amrollahi, Leni Aniva, Iddah Mlauzi, Kirill Acharya, Elyas Obbad, Dilara Soylu, Weston Kirk, Zixiao Jolene Wang, Kai Fronsdal, Ying Li, Donald Poindexter Jr, Rakshit Kaushik, Shurui Liu, Yegor Denisov-Blanch, Steven Dillmann, Simon Obstbaum, Santiago Cuellar, John Sarracino, Rylan Schaeffer, Mo Tiwari, Donghyun Lee, Bo Han, Sanmi Koyejo

Accepted at the ICML 2026 Workshop on Deep Learning for Code (DL4C) and the ICML 2026 AI for Math Workshop (AI4Math), 2026

Third-author paper on an end-to-end formal verification benchmark for AI coding agents in Lean 4.

Ethan Hersch

Publications

Conference Papers

[1] Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code

[2] VeriBench: End-to-End Formal Verification Benchmark for AI Coding Agents in Lean 4