How I Use LLMs: Andrej Karpathy's Practical Guide

Andrej Karpathy — Duration: 2+ hours

https://www.youtube.com/watch?v=EWvNQjAaOHw

Introduction

Andrej Karpathy, drawing on his perspective as a founding researcher at OpenAI and former Director of AI at Tesla, delivers a comprehensive, example-driven, practical walkthrough of how he integrates LLMs into his daily life and work. The video bridges the gap between understanding how LLMs work and actually using them effectively.

The LLM Ecosystem Landscape

Karpathy maps the current LLM ecosystem, identifying ChatGPT as the incumbent but acknowledging competitors including Claude, Gemini, Grok, DeepSeek, and Mistral. He introduces leaderboards like Chatbot Arena and Scale’s SEAL leaderboard as tools for tracking model performance, empowering viewers to make informed decisions rather than relying on marketing claims.

Understanding Model Tiers and Economics

He demonstrates how the same query yields different results across model variants (GPT-4o, GPT-4o mini, etc.) and explains the trade-offs between cost and capability. He subscribes to multiple services and consults an “LLM Council” for important decisions, which shows how serious users cross-check one model's answer against another's.
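The “LLM Council” idea can be sketched in a few lines. This is an illustration only, not Karpathy's setup: the function `consult_council` and the stub models `model_a`, `model_b`, and `model_c` are hypothetical names, and each "model" is just a function from a prompt to an answer. The sketch asks every member the same question and reports the most common answer together with the share of members that agree.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical stand-in: a "model" is any function mapping a prompt to an answer.
Model = Callable[[str], str]


def consult_council(prompt: str, council: Dict[str, Model]) -> Tuple[str, float, Dict[str, str]]:
    """Ask every council member and return (top answer, agreement ratio, raw answers)."""
    answers = {name: ask(prompt) for name, ask in council.items()}
    # Normalize lightly so trivial formatting differences do not split the vote.
    normalized = [text.strip().lower() for text in answers.values()]
    top_answer, votes = Counter(normalized).most_common(1)[0]
    return top_answer, votes / len(normalized), answers


# Stub members with canned replies; real use would call actual model clients here.
council = {
    "model_a": lambda prompt: "Paris",
    "model_b": lambda prompt: "paris ",
    "model_c": lambda prompt: "Lyon",
}

answer, agreement, raw = consult_council("What is the capital of France?", council)
print(answer, round(agreement, 2), raw)
```

A low agreement ratio is the useful signal here: it marks a question where the models disagree and a human should look more closely.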

The Revolution of Thinking Models

Karpathy covers “thinking models” trained with reinforcement learning to develop internal reasoning processes. His live comparison between GPT-4o and O1 Pro on a gradient checking problem illustrates the difference: the standard model offers generic debugging suggestions, while the thinking model identifies the specific parameter mismatch that causes the failure.
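For readers unfamiliar with gradient checking, here is a minimal sketch of the technique. It is not the code from the video: the linear model, the loss, and all function names are assumptions chosen for illustration. The check compares a hand-derived gradient with a central finite-difference estimate; a large relative error means the two disagree, which is how bugs such as mismatched parameters show up.

```python
import numpy as np


def loss(w, x, y):
    """Mean squared error of a linear model (an illustrative choice)."""
    return float(np.mean((x @ w - y) ** 2))


def analytic_grad(w, x, y):
    """Hand-derived gradient of the loss above with respect to w."""
    return 2.0 * x.T @ (x @ w - y) / len(y)


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return grad


def max_relative_error(a, b):
    """Largest elementwise relative difference between two gradients."""
    return float(np.max(np.abs(a - b) / np.maximum(1e-12, np.abs(a) + np.abs(b))))


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

numeric = numeric_grad(lambda v: loss(v, x, y), w)
error = max_relative_error(analytic_grad(w, x, y), numeric)
print(f"max relative error: {error:.2e}")
```

When the analytic gradient is correct the error is tiny; a persistent large error points at a bug in the derivation or in how parameters are wired together.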

Tool Use: Expanding LLM Capabilities

Internet Search: Models automatically search the web for recent information, with significant variations in implementation across platforms
Deep Research: Advanced models spend tens of minutes conducting comprehensive research on complex topics, though he cautions about potential hallucinations
File Uploads and Document Analysis: Practical examples of analyzing nutrition labels, blood test results, and academic papers

Programming and Development Integration

His demonstration of Cursor’s Composer feature shows how AI-assisted coding has evolved from simple code completion to autonomous development — what he terms “vibe coding.” The live building of a Tic-Tac-Toe game with confetti effects and sound demonstrates the current state of AI-assisted development.

Multimodal Capabilities

Audio Integration: Distinguishes “fake audio” (speech converted to text before it reaches the model) from “true audio” (models that process audio natively), and shows the advantages of true audio in features like Advanced Voice Mode
Image Processing: Analyzing nutrition labels, blood test results, and memes through screenshot-and-upload workflows
Video Understanding: Practical applications for education, entertainment, and documentation

Quality of Life Features

Memory Functionality: LLMs maintaining context across conversations and learning user preferences over time
Custom Instructions: Tailoring model responses for formality, educational approach, or language learning
Custom GPTs: Creating specialized tools for specific tasks like language learning

Key Quotes

“ChatGPT is the Original Gangster incumbent.”

“For important decisions, I consult my LLM Council — multiple models cross-verifying each other’s outputs.”

“Vibe coding is the new paradigm — you describe what you want and the AI builds it.”

“These tools are not replacements for human judgment. They’re accelerators for human intent.”

Key References

ChatGPT — The original mainstream LLM, now with competition from Claude, Gemini, Grok, DeepSeek, Mistral
Chatbot Arena — Leaderboard for tracking model performance
O1 Pro / DeepSeek R1 — Thinking models with internal reasoning processes
Cursor Composer — AI-assisted coding tool for autonomous development
Advanced Voice Mode — True audio processing for nuanced communication
Custom GPTs — Specialized tools for specific tasks
