Welcome!

Hi, I’m Ethan! I’m a first-year M.S. student in Computer Science at Stanford. My research focuses on trustworthy AI, especially evaluation and oversight for language models in math and coding. I’m also interested in deep learning, reinforcement learning, and systems for scalable ML. I’m passionate about understanding how state-of-the-art AI systems work and why.

I am currently a researcher at the Stanford AI Lab, advised by Prof. Sanmi Koyejo and affiliated with the STAIR group. I previously earned a B.A. in computer science and mathematics from Cornell University, where I was fortunate to be advised by David Bindel.

Current Work

My current work focuses on trustworthy evaluation and scalable systems for modern AI models. Some areas I am actively exploring include:

  • Trustworthy AI evaluation: Developing methods to certify the reliability and consistency of LLM-generated code.
  • Efficient deep learning systems: Building and training transformer-based models with an emphasis on performance, scalability, and reproducibility.
  • Reinforcement learning and decision-making: Studying policy optimization and information-theoretic methods for sequential decision problems.

Papers

[1] Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code
Ethan S Hersch, Brando Miranda, Elyas Obbad, Srivatsava Daruru, Kirill Acharya, Zixiao Jolene Wang, Steven Dillmann, Yegor Denisov-Blanch, Sanmi Koyejo
Under review at the ICML 2026 Workshop on Deep Learning for Code.

[2] VeriBench: End-to-End Formal Verification Benchmark for AI Coding Agents in Lean 4
Brando Miranda, Srivatsava Daruru, Ethan S Hersch, Zhanke Zhou, Allen Nie, Daneshvar Amrollahi, Leni Aniva, Iddah Mlauzi, Kirill Acharya, Elyas Obbad, Dilara Soylu, Weston Kirk, Zixiao Jolene Wang, Kai Fronsdal, Ying Li, Donald Poindexter Jr, Rakshit Kaushik, Shurui Liu, Yegor Denisov-Blanch, Steven Dillmann, Simon Obstbaum, Santiago Cuellar, John Sarracino, Rylan Schaeffer, Mo Tiwari, Donghyun Lee, Bo Han, Sanmi Koyejo
Under review at NeurIPS 2026. Led the accompanying technical blog post.

Blogs

Teaching