Tinker Tutorials: A Practical Guide to Fine-Tuning LLMs

Source: Tinker Documentation


Thinking Machines Lab published a set of tutorials for their Tinker SDK, and they are worth paying attention to. Not because fine-tuning LLMs is new — it is not — but because the tutorials are structured the way tutorials should be.

They are marimo notebooks. Interactive, runnable, incremental. Each one builds on the last without assuming you already know everything.

What Is Tinker

Tinker is a fine-tuning platform aimed at researchers and engineers who want to train their own models rather than prompt someone else’s. It supports supervised fine-tuning, reinforcement learning (GRPO, PPO, custom losses), preference optimization (DPO, RLHF), distillation, multi-agent RL, and deployment.

The tutorials cover all of this end to end.

The Path

Basics — four notebooks that get you running. Hello Tinker makes your first API call. First SFT walks through renderers, loss computation, and optimizer steps. Async Patterns teaches concurrent requests. First RL introduces GRPO with reward functions and a GSM8K math training loop.
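To give a feel for what the GRPO part of First RL is about: the central idea is to sample several completions for the same prompt, score each one, and judge every completion relative to its own group. Below is a minimal standalone sketch in plain Python. It does not use the Tinker API, and the function name `group_relative_advantages` and the `eps` parameter are my own illustrative choices, not names from the tutorial.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration: subtract the group mean and divide
    by the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two correct (reward 1), two wrong (reward 0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# If every completion gets the same reward, there is no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Correct completions end up with positive advantages and wrong ones with negative advantages, while a group where everything scored the same yields all zeros. Some GRPO variants only subtract the mean and skip the division, so treat the normalization here as one common choice rather than the tutorial's exact recipe.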

Core Concepts — standalone deep dives. Rendering covers tokenization across model families. Loss Functions explains cross-entropy, importance sampling, and how to write a custom loss. Completers compares TokenCompleter with MessageCompleter. There are also tutorials on weights management and evaluations.
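For readers who have not met the two losses named above, here is a rough sketch of what each one computes, written in plain Python over lists of per-token log-probabilities. Nothing here is the Tinker API: the function names, the argument layout, and the choice to average rather than sum are assumptions made for illustration, and the Loss Functions tutorial may define the details differently.

```python
import math


def cross_entropy_loss(target_logprobs, weights):
    """Weighted mean negative log-likelihood of the target tokens.

    weights is typically 0 for prompt tokens and 1 for completion tokens,
    so only the tokens the model should learn to produce contribute.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted_nll = -sum(w * lp for lp, w in zip(target_logprobs, weights))
    return weighted_nll / total_weight


def importance_sampling_loss(new_logprobs, old_logprobs, advantages):
    """Policy-gradient surrogate: -(p_new / p_old) * advantage, averaged over tokens.

    old_logprobs come from the policy that generated the samples,
    new_logprobs from the policy currently being trained.
    """
    terms = [
        -math.exp(new_lp - old_lp) * adv
        for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages)
    ]
    return sum(terms) / len(terms)


# One prompt token (weight 0) followed by two completion tokens (weight 1).
sft_loss = cross_entropy_loss([-0.1, -2.0, -1.0], [0, 1, 1])

# When the new and old policies agree, the ratio is 1 and the loss is -mean(advantage).
rl_loss = importance_sampling_loss([-1.0, -1.0], [-1.0, -1.0], [2.0, 2.0])
```

The importance ratio is what lets you reuse samples drawn from a slightly stale policy: tokens the current policy now finds more likely than the sampling policy did get a larger weight.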

Cookbook Abstractions — higher-level patterns. Env and EnvGroupBuilder for RL, custom ProblemEnv implementations, SFT and RL with the config-based API. This is the layer that saves you from writing boilerplate.

Advanced — hyperparameter sweeps, KL penalty tuning, DPO, multi-turn sequence extension, multi-agent self-play, prompt distillation, and a full 3-stage RLHF pipeline (SFT → preference model → RL). Each one is a production pattern you would otherwise have to piece together from scattered blog posts.
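As a reference point for the DPO tutorial, the standard DPO objective fits in a few lines. The sketch below is plain Python operating on summed sequence log-probabilities. It is not taken from the Tinker notebooks, and the function name, argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    margin measures how much more the policy prefers the chosen response
    over the rejected one, compared with the frozen reference model.
    """
    margin = (policy_chosen_lp - ref_chosen_lp) - (policy_rejected_lp - ref_rejected_lp)
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: margin is 0 and the loss is log(2).
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability toward the chosen response: the loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The reference terms are what keep the policy from drifting arbitrarily far: the loss only rewards preferring the chosen response more than the reference model already does.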

Deployment — exporting to HuggingFace, building LoRA adapters, publishing to the Hub, and even an OpenCode integration for chatting with your checkpoint.

Why This Matters

Fine-tuning is not a solved problem yet. The tooling is fragmented, the documentation is often written for people who already know what they are doing, and the gap between “I ran a training script” and “I have a model I can actually use” is wider than most tutorials acknowledge.

These tutorials close some of that gap. They are specific, they are tested, and they assume you are competent but not omniscient. That is rarer than it should be.


I ran through the First RL tutorial on a small model. It took about an hour including the setup. The GRPO implementation is clean — reward functions are just Python functions, the rollout loop is explicit, and the logging tells you what is actually happening rather than showing a loss curve and leaving you to guess.
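To illustrate what "just Python functions" can look like for a GSM8K-style task, here is a small sketch of a correctness reward. It is my own example rather than code from the notebook: the names `extract_final_number` and `correctness_reward` are hypothetical, and it assumes the reference answer ends with the final number, as GSM8K solutions do after the `####` marker.

```python
import re


def extract_final_number(text):
    """Return the last number that appears in text as a float, or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Strip thousands separators such as "1,234" before converting.
    return float(matches[-1].replace(",", ""))


def correctness_reward(completion, reference_answer):
    """Reward 1.0 if the completion's final number matches the reference, else 0.0."""
    predicted = extract_final_number(completion)
    expected = extract_final_number(reference_answer)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if abs(predicted - expected) < 1e-6 else 0.0


good = correctness_reward("She earns 9 * 2 = 18 dollars. The answer is 18.", "#### 18")
bad = correctness_reward("The answer is 20.", "#### 18")
empty = correctness_reward("I am not sure.", "#### 18")
```

A function like this is easy to unit test on its own before any training run, which is a large part of why keeping rewards as ordinary functions is pleasant to work with.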

If you have been meaning to get hands-on with fine-tuning but kept putting it off because the entry point looked messy, this is a good place to start.

Crepi il lupo! 🐺