FrugalGPT: Efficient GPT-2 Adaptation with LoRA, Quantization, and Synthetic Data

This post summarizes a Stanford CS224N project on making GPT-2 cheaper to adapt without giving up too much task performance. The work studies three levers: parameter efficiency, memory efficiency, and data efficiency.

Ryan D'Cunha, Ethan Hersch, Abhinav Chinta
Stanford University

Equal contribution

Why this project exists

Fine-tuning language models is usually expensive in exactly the places that matter for smaller teams: GPU memory, trainable parameters, and high-quality task data. This project uses GPT-2 as a controlled testbed and asks a practical question: how far can you push efficiency techniques before quality starts to degrade?

Setup and baseline

The team fine-tunes GPT-2 on sentiment classification, paraphrase detection, and Shakespearean sonnet generation. The first two tasks provide standard supervised baselines. Sonnet generation is the harder test because it requires longer-form, structured output and makes degradation easier to spot.

Task               Method            Metric    Dev     Test
SST sentiment      Full fine-tuning  Accuracy  0.513   0.546
CFIMDB sentiment   Full fine-tuning  Accuracy  0.971   -
Quora paraphrase   Full fine-tuning  Accuracy  0.911   0.891
Sonnet generation  Full fine-tuning  chrF      41.974  41.078
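
The sonnet row reports chrF, a character n-gram F-score. As a rough illustration of what that number measures, the sketch below averages character n-gram precision and recall over orders 1 to 6 and combines them with beta = 2, the commonly used defaults. It is a simplified stand-in, not the scorer the project used: the name `simple_chrf`, the whitespace stripping, and the single-reference setup are assumptions of this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after dropping whitespace (an assumption of this sketch)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF on a 0-100 scale for one hypothesis/reference pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return 100.0 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Under this sketch, an output identical to its reference scores 100 and an output sharing no characters with it scores 0.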

Three efficiency levers

1. LoRA for parameter efficiency

Instead of updating every model weight, LoRA injects low-rank adapters into the network and trains only those. The project varies rank, scaling factor, and module placement across attention and MLP blocks.

  • Best reported configuration: rank 256, alpha 16, learning rate 1e-2
  • Applying LoRA to attention and MLP layers outperforms attention-only tuning
  • Best dev chrF: 42.158, slightly above full fine-tuning
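
To make the adapter idea concrete, here is a minimal sketch of a single LoRA-augmented linear layer, using the standard formulation in which the frozen weight W is supplemented by a low-rank update scaled by alpha / rank. Plain Python lists stand in for tensors, the dimensions are toy values rather than the rank 256 configuration above, and the class and helper names are hypothetical, not taken from the project's code.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable update (alpha / rank) * B @ A."""

    def __init__(self, d_in, d_out, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.scale = alpha / rank
        # Frozen weight; random values stand in for pretrained GPT-2 weights.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable adapters: A starts random and B starts at zero,
        # so the low-rank update contributes nothing before training.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, low_rank)]

    def trainable_params(self):
        """Only A and B would receive gradients; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear(d_in=8, d_out=8, rank=2, alpha=16)
x = [0.1 * i for i in range(8)]
y = layer.forward(x)  # equals the frozen layer's output until B is updated
```

In this toy layer the adapters hold rank * (d_in + d_out) values while W holds d_in * d_out, which is where the parameter savings come from when the rank is small relative to the layer width.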

2. Quantization for memory efficiency

The project compares lower-precision inference and quantization-aware fine-tuning (QAFT) across FP16, BF16, INT8, and INT4 settings, measuring memory footprint, speed, and downstream quality.

  • BF16 and 8-bit precision preserve much of the generation quality
  • INT4 saves more memory but hurts performance without specialized training
  • QAFT recovers sonnet quality, but the paper notes signs of overfitting
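
One way to see why INT4 is harder to get right than INT8 is to round-trip some weights through each bit width and compare the error. The sketch below uses symmetric per-tensor absmax quantization, which is an assumption of this example; the source does not say which quantization scheme or library the project used, and the function names are hypothetical.

```python
import random


def quantize_dequantize(weights, bits):
    """Round weights to a signed integer grid of the given width, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


def mean_abs_error(original, restored):
    """Average absolute difference between original and restored weights."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Random values stand in for real model weights.
rng = random.Random(0)
weights = [rng.gauss(0, 0.02) for _ in range(1000)]

error_int8 = mean_abs_error(weights, quantize_dequantize(weights, 8))
error_int4 = mean_abs_error(weights, quantize_dequantize(weights, 4))
```

With only 15 representable levels at 4 bits versus 255 at 8 bits, the round-trip error is noticeably larger for INT4, which fits the observation above that INT4 needs specialized training to hold quality.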

3. Synthetic data for data efficiency

The original sonnet dataset is tiny. To expand it, the team prompts Gemini 2.5 Flash-Lite, Flash, and Pro to generate up to 1,000 Shakespearean sonnets, then uses those synthetic samples for distillation-style fine-tuning.

  • Gemini 2.5 Flash produces the strongest gains for GPT-2
  • Best dev chrF rises to 46.605
  • Best test chrF reaches 52.838

What matters in the results

LoRA is the cleanest efficiency win

The strongest LoRA setup matches or slightly exceeds the full fine-tuning sonnet baseline while training a much smaller slice of the model. That makes it the most practical efficiency technique in the study.

Quantization is useful, but brittle during adaptation

Low-precision inference works well for reducing memory footprint. Fine-tuning under quantization is less forgiving: task performance can be preserved, but general capability may erode if the model over-specializes.

Synthetic data helps until the student hits a ceiling

Better teacher models do not automatically produce better student outcomes. The project argues that GPT-2 eventually runs into capacity limits, especially when asked to absorb signals from stronger frontier models like Gemini 2.5 Pro.

Poster snapshot

The poster is useful as a compact summary of the project framing, metrics, and conclusions. For a first pass, start there, then move to the paper for the full experimental detail.

Open questions worth extending

Capacity

How much stronger would the synthetic-distillation gains be with GPT-2 Medium or Large?

Robustness

Do the quantized fine-tuned models retain broader zero-shot behavior across unrelated tasks?

Data quality

Would filtering synthetic sonnets by stricter rhyme and meter checks improve distillation efficiency?
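
As a starting point for that experiment, a structural filter might look like the sketch below. It keeps a sonnet only if it has 14 non-empty lines and enough line pairs agree with the Shakespearean ABAB CDCD EFEF GG scheme. Everything here is hypothetical: the function names, the threshold, and especially the rhyme test, which compares the last two letters of each line's final word and so checks spelling rather than sound. Meter is not checked at all.

```python
def rhyme_key(line, tail=2):
    """Crude rhyme proxy: last `tail` letters of the line's final word."""
    words = line.split()
    if not words:
        return ""
    last_word = "".join(ch for ch in words[-1].lower() if ch.isalpha())
    return last_word[-tail:]


# ABAB CDCD EFEF GG expressed as pairs of 0-based line indices.
RHYME_PAIRS = [(0, 2), (1, 3), (4, 6), (5, 7), (8, 10), (9, 11), (12, 13)]


def passes_structure_check(sonnet, min_rhymes=5):
    """Return True if the text has 14 lines and at least `min_rhymes` matching pairs."""
    lines = [ln.strip() for ln in sonnet.splitlines() if ln.strip()]
    if len(lines) != 14:
        return False
    hits = sum(
        1
        for i, j in RHYME_PAIRS
        if rhyme_key(lines[i]) and rhyme_key(lines[i]) == rhyme_key(lines[j])
    )
    return hits >= min_rhymes


def filter_sonnets(sonnets, min_rhymes=5):
    """Keep only the synthetic sonnets that pass the structural check."""
    return [s for s in sonnets if passes_structure_check(s, min_rhymes)]
```

A stricter version would swap the spelling heuristic for a pronunciation dictionary and add a syllable or stress check, at the cost of discarding more of the synthetic set.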