Fine-tuning takes a pre-trained base model and continues training it on a smaller, task-specific dataset. The result is a model that retains the broad knowledge of the base but is biased toward the patterns in the fine-tuning data. Common use cases: matching a brand voice, specializing in a domain (legal, medical), improving structured output formatting, and reducing prompt length by encoding behavior into weights.
Modern fine-tuning rarely updates all model weights (full fine-tuning), since that requires a massive GPU budget. Instead, parameter-efficient methods like LoRA (Low-Rank Adaptation) train a small adapter on top of frozen base weights. A LoRA adapter for a 7B model might be just 50MB and trainable on a single consumer GPU.
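The LoRA idea can be sketched in a few lines of NumPy: the base weight matrix stays frozen, and only two small low-rank matrices are trained, so the adapter holds a tiny fraction of the layer's parameters. The dimensions, rank, and scaling below are illustrative assumptions, not tied to any specific model.

```python
import numpy as np

# Toy illustration of LoRA: the frozen base weight W stays fixed, and only
# two small matrices A (r x d_in) and B (d_out x r) are trained. The
# effective weight is W + (alpha / r) * (B @ A). Sizes here are made up.
d_out, d_in, r = 4096, 4096, 8   # hypothetical layer size and LoRA rank
alpha = 16                        # common LoRA scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01     # trainable, rank r
B = np.zeros((d_out, r))                      # trainable, zero-initialized

def lora_forward(x):
    # Base path plus low-rank update; since B starts at zero, the adapted
    # model initially reproduces the base model's output exactly.
    return W @ x + (alpha / r) * (B @ (A @ x))

# Parameter accounting: the adapter is a small fraction of the base layer.
base_params = W.size                 # d_out * d_in
adapter_params = A.size + B.size     # r * (d_in + d_out)
print(f"adapter is {adapter_params / base_params:.2%} of base")  # ~0.39%
```

The same accounting explains the small adapter files: saving only A and B for each adapted layer of a 7B model yields tens of megabytes instead of tens of gigabytes.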
The pipeline: collect 100-10,000 high-quality examples in the target format, format them as input/output pairs, train for a few epochs, evaluate against a held-out set, deploy. Quality of examples matters far more than quantity: a few hundred carefully curated examples often beat thousands of mediocre ones.
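The data-prep steps above can be sketched as a short script: format raw pairs as JSONL chat messages and carve out a held-out evaluation split. The record shape is one common fine-tuning format, and the file names and split ratio are illustrative assumptions.

```python
import json
import random

# Hypothetical raw (input, output) pairs; in practice this would be the
# 100-10,000 curated examples described above.
examples = [
    {"input": "Summarize: The meeting moved to 3pm.", "output": "Meeting now at 3pm."},
    {"input": "Summarize: Invoice 42 is overdue.", "output": "Invoice 42 overdue."},
]

def to_record(ex):
    # One common fine-tuning format: a list of chat messages per example.
    return {"messages": [
        {"role": "user", "content": ex["input"]},
        {"role": "assistant", "content": ex["output"]},
    ]}

random.seed(0)
random.shuffle(examples)
split = max(1, int(0.9 * len(examples)))   # 90/10 train/eval split
train, eval_set = examples[:split], examples[split:]

# Write one JSON object per line (JSONL), the format most trainers expect.
with open("train.jsonl", "w") as f:
    for ex in train:
        f.write(json.dumps(to_record(ex)) + "\n")
with open("eval.jsonl", "w") as f:
    for ex in eval_set:
        f.write(json.dumps(to_record(ex)) + "\n")
```

Shuffling before the split matters: if the raw data is ordered by source or date, an unshuffled held-out set won't reflect the training distribution, and the evaluation numbers will mislead.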
Fine-tuning lets you bake behavior into a model that would otherwise require long prompts every time. For high-volume use cases (customer support, document processing), fine-tuning a smaller model often beats prompting a larger one on cost and latency. For frontier capability, base-model prompting still wins.
RAG and fine-tuning are complementary, not competing. The Free APIs article covers when to reach for which.