Welcome!

Hi, I’m Ethan! I’m a first-year M.S. student in Computer Science at Stanford. My research focuses on trustworthy AI, especially evaluation and oversight for language models in math and coding. I’m also interested in deep learning, reinforcement learning, and systems for scalable ML. I am passionate about understanding state-of-the-art AI systems and why they work.

I am currently a researcher at the Stanford AI Lab, advised by Prof. Sanmi Koyejo and affiliated with the STAIR group. I previously earned a B.A. in computer science and mathematics at Cornell University, where I was fortunate to be advised by Prof. David Bindel.

Current Work

My current work focuses on trustworthy evaluation and scalable systems for modern AI models. Some areas I am actively exploring include:

Trustworthy AI evaluation: Developing methods to certify the reliability and consistency of LLM-generated code.
Efficient deep learning systems: Building and training transformer-based models with an emphasis on performance, scalability, and reproducibility.
Reinforcement learning and decision-making: Studying policy optimization and information-theoretic methods for sequential decision problems.

Papers

[1] Certifying the Judge: Falsifiable Properties for LLM-Based Evaluation of Formal Code
Ethan S Hersch, Brando Miranda, Elyas Obbad, Srivatsava Daruru, Kirill Acharya, Zixiao Jolene Wang, Steven Dillmann, Yegor Denisov-Blanch, Sanmi Koyejo
Accepted to the ICML 2026 Workshop on Deep Learning for Code (DL4C) and the ICML 2026 AI for Math Workshop (AI4Math).

[2] VeriBench: End-to-End Formal Verification Benchmark for AI Coding Agents in Lean 4
Brando Miranda, Srivatsava Daruru, Ethan S Hersch, Zhanke Zhou, Allen Nie, Daneshvar Amrollahi, Leni Aniva, Iddah Mlauzi, Kirill Acharya, Elyas Obbad, Dilara Soylu, Weston Kirk, Zixiao Jolene Wang, Kai Fronsdal, Ying Li, Donald Poindexter Jr, Rakshit Kaushik, Shurui Liu, Yegor Denisov-Blanch, Steven Dillmann, Simon Obstbaum, Santiago Cuellar, John Sarracino, Rylan Schaeffer, Mo Tiwari, Donghyun Lee, Bo Han, Sanmi Koyejo
Accepted to the ICML 2026 Workshop on Deep Learning for Code (DL4C) and the ICML 2026 AI for Math Workshop (AI4Math). In review at NeurIPS 2026. Led accompanying technical blog post. [Preprint]

Blogs / Projects

My favorite way to showcase my projects is through blogs! Check them out on the blogs page or on my GitHub.

I like to recreate things and understand how they work. Check out these blog posts on CLIP and Post-training!
VeriBench technical blog post
FrugalGPT
I occasionally write notes on AI, machine learning, and systems on Medium.

Teaching

I enjoy teaching and mentoring students, and I was a teaching assistant for Machine Learning at Cornell University.
Neural Networks from Scratch videos: part 1, part 2, part 3.
Diffusion models notes
MLE and Theoretical Deep Learning Guide

Ethan Hersch