How I Use LLMs: Andrej Karpathy's Practical Guide
Andrej Karpathy — Duration: 2+ hours
https://www.youtube.com/watch?v=EWvNQjAaOHw
Introduction
Andrej Karpathy, a founding member of OpenAI and former Director of AI at Tesla, delivers a comprehensive, example-driven walkthrough of how he integrates LLMs into his daily life and work. The video bridges the gap between understanding how LLMs work and using them effectively.
The LLM Ecosystem Landscape
Karpathy maps the current LLM ecosystem, identifying ChatGPT as the incumbent but acknowledging competitors including Claude, Gemini, Grok, DeepSeek, and Mistral. He introduces leaderboards like Chatbot Arena and Scale’s SEAL leaderboard as tools for tracking model performance, empowering viewers to make informed decisions rather than relying on marketing claims.
Understanding Model Tiers and Economics
He demonstrates how the same query yields different results across model variants (GPT-4o, GPT-4o mini, etc.) and explains the trade-offs between cost and capability: smaller models are cheaper and faster but less capable. He personally subscribes to multiple services and, for important decisions, asks several models the same question and compares their answers, a practice he calls his “LLM Council.”
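The “LLM Council” idea can be sketched in a few lines. This is a minimal illustration, not Karpathy's actual workflow (he describes doing this by hand across chat apps). The `consult_council` helper, the `Model` type, and the canned model functions are all hypothetical; real use would replace each lambda with a call to the corresponding provider's client.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical stand-in for a real model client: maps a prompt to an answer.
Model = Callable[[str], str]


def consult_council(prompt: str, council: Dict[str, Model]) -> dict:
    """Ask every model the same question and report how much they agree."""
    if not council:
        raise ValueError("council must contain at least one model")
    answers = {name: ask(prompt) for name, ask in council.items()}
    # Normalize lightly so case and whitespace differences do not count as disagreement.
    votes = Counter(answer.strip().lower() for answer in answers.values())
    top_answer, top_count = votes.most_common(1)[0]
    return {
        "answers": answers,                        # raw answer per model
        "majority": top_answer,                    # most common normalized answer
        "agreement": top_count / len(answers),     # fraction of models that gave it
    }


if __name__ == "__main__":
    # Canned responses keep the sketch runnable without network access.
    council = {
        "model_a": lambda p: "Paris",
        "model_b": lambda p: "paris ",
        "model_c": lambda p: "Lyon",
    }
    result = consult_council("What is the capital of France?", council)
    print(result["majority"], round(result["agreement"], 2))
```

Exact-match voting only works for short factual answers. For open-ended questions, low agreement is better read as a signal to compare the answers manually than as a verdict.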
The Revolution of Thinking Models
Karpathy covers “thinking models,” which are trained with reinforcement learning to develop internal reasoning processes. His live comparison between GPT-4o and o1 pro on a gradient checking problem illustrates the difference: the standard model offers generic debugging suggestions, while the thinking model identifies the specific parameter mismatch causing the failure.
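For readers unfamiliar with gradient checking, the technique compares a hand-written (analytic) gradient against a finite-difference estimate; a large discrepancy points to a bug. The sketch below is a generic illustration, not the code from the video. The `loss` function, the deliberately broken `buggy_grad`, and the tolerance values are assumptions chosen for the example.

```python
import math
from typing import Callable, List


def numerical_grad(f: Callable[[List[float]], float], x: List[float],
                   eps: float = 1e-6) -> List[float]:
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = x.copy(), x.copy()
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def max_rel_error(analytic: List[float], numeric: List[float]) -> float:
    """Largest relative difference between two gradients, guarded against 0/0."""
    return max(abs(a - n) / max(1e-12, abs(a) + abs(n))
               for a, n in zip(analytic, numeric))


def loss(x: List[float]) -> float:
    # Arbitrary test function: f(x0, x1) = x0^2 * x1 + sin(x1)
    return x[0] ** 2 * x[1] + math.sin(x[1])


def analytic_grad(x: List[float]) -> List[float]:
    # Correct partial derivatives of loss.
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]


def buggy_grad(x: List[float]) -> List[float]:
    # Deliberately wrong: the cos(x1) term is missing.
    return [2 * x[0] * x[1], x[0] ** 2]


if __name__ == "__main__":
    point = [1.5, -0.7]
    numeric = numerical_grad(loss, point)
    print("correct gradient passes:", max_rel_error(analytic_grad(point), numeric) < 1e-6)
    print("buggy gradient passes:", max_rel_error(buggy_grad(point), numeric) < 1e-6)
```

A check like this shows that a gradient is wrong but not why, which is where the video's comparison becomes relevant: locating the cause still takes reasoning about the specific code.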
Tool Use: Expanding LLM Capabilities
- Internet Search: Models automatically search the web for recent information, with significant variations in implementation across platforms
- Deep Research: Advanced models spend tens of minutes conducting comprehensive research on complex topics, though he cautions about potential hallucinations
- File Uploads and Document Analysis: Practical examples of analyzing nutrition labels, blood test results, and academic papers
Programming and Development Integration
His demonstration of Cursor’s Composer feature shows how AI-assisted coding has evolved from simple code completion to largely autonomous development, which he terms “vibe coding.” He builds a Tic-Tac-Toe game live, complete with confetti effects and sound, to show what these tools can currently do.
Multimodal Capabilities
- Audio Integration: Distinguishes “fake audio” (speech converted to text before the model sees it) from “true audio” (models that process audio natively), and covers the advantages of true-audio features such as Advanced Voice Mode
- Image Processing: Analyzing nutrition labels, blood test results, and memes through screenshot-and-upload workflows
- Video Understanding: Practical applications for education, entertainment, and documentation
Quality of Life Features
- Memory Functionality: LLMs maintaining context across conversations and learning user preferences over time
- Custom Instructions: Tailoring model responses for formality, educational approach, or language learning
- Custom GPTs: Creating specialized tools for specific tasks like language learning
Key Quotes
“ChatGPT is the Original Gangster incumbent.”
“For important decisions, I consult my LLM Council — multiple models cross-verifying each other’s outputs.”
“Vibe coding is the new paradigm — you describe what you want and the AI builds it.”
“These tools are not replacements for human judgment. They’re accelerators for human intent.”
Key References
- ChatGPT — The original mainstream LLM, now with competition from Claude, Gemini, Grok, DeepSeek, Mistral
- Chatbot Arena — Leaderboard for tracking model performance
- O1 Pro / DeepSeek R1 — Thinking models with internal reasoning processes
- Cursor Composer — AI-assisted coding tool for autonomous development
- Advanced Voice Mode — True audio processing for nuanced communication
- Custom GPTs — Specialized tools for specific tasks
Crepi il lupo! 🐺