Vanishing Gradients
Hugo Bowne-Anderson
A podcast for people who build with AI. Long-format conversations with people shaping the field about agents, evals, multimodal systems, data infrastructure, and the tools behind them. Guests include Jeremy Howard (fast.ai), Hamel Husain (Parlance Labs), Shreya Shankar (UC Berkeley), Wes McKinney (creator of pandas), Samuel Colvin (Pydantic) and more.
Episodes
-
The Future of Agentic Data Science
25.05.2026 1h 4min

> So I think we’re really at a historical moment, and the opportunity is massive. Almost 15 years ago, we were promised that data science was going to be this incredible thing and create all this value for people. And I think nowadays it’s mostly viewed as a cost center in most companies. I think we can really now fulfill that original promise with agentic data science.

Thomas Wiecki, Co-creator of PyMC and Founder at PyMC Labs, joins Hugo to talk about how agentic data science is finally fulfilling the promise of Decision Intelligence.

We Discuss:
* Decision Engines: Transform data science from a cost center providing cryptic answers into a real-time decision intelligence hub delivering actionable outcomes;
* Tame the “Garden of Forking Paths”: Overcome human shortcuts by running parallel analyses to provide an honesty check, revealing the true robustness of business conclusions;
* Multiplayer Data Science: Foster organizational learning by moving agents into team chats, democratizing “what-if” questions and reducing context-switching friction;
* The Full Agentic Data Science Stack: Beyond harness and skills, the full stack includes orchestration for parallel analyses and a causal eval layer to measure actual outcome improvement;
* Agentic Dashboards: Move beyond static BI; use chat interfaces to inquire into models and generate real-time, custom visualizations for specific follow-up questions;
* Encode Professional Judgment as Skills: Elevate agent performance by encoding expert domain standards and high-fidelity workflows into specific Agent Skills, rather than relying on LLM pre-training;
* Ground Decisions in Generative Processes: Prevent hallucinations by forcing agents to model underlying physical or behavioral processes, providing a programmatic guardrail aligned with market realities;
* Scripted Causal-Bayesian Workflows: Their methodologically structured nature, from prior elicitation to posterior predictive checks, makes Causal-Bayesian workflows inherently automatable for agents;
* Iterative Autonomy via Skills: Achieve autonomy iteratively: verify workflows with human oversight, then encode verifiable parts as skills to hand off trusted tasks.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript in NotebookLM. If you do, let us know what you find in the comments!

👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands-on exercises, capstones, reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off. Hit reply to enquire about group discounts. 👈

LINKS
* Thomas Wiecki on LinkedIn
* PyMC Labs
* Open-Sourcing Decision Lab: Scaling AI Judgment in Data Science (PyMC Labs blog)
* Decision AI Discord
* Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results (Sage Journals)
* The Agent Harness Reading List
* Show Us Your Agent Skills (GitHub)
* Agentic Data Science course with Hugo, Thomas, and Luca (10% off with code ADSVG10)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
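To make the “Garden of Forking Paths” point concrete, here is a minimal sketch of the idea, not anything shown in the episode: estimate the same effect under every combination of a few defensible analytic choices and check whether the conclusion survives all of them. The toy data, the two choices (dropping the largest value, analysing on the log scale), and all function names are hypothetical.

```python
import itertools
import math
import statistics

# Hypothetical toy data: outcome values for a control and a treated group.
control = [10.0, 12.0, 11.0, 13.0, 9.0, 40.0]
treated = [14.0, 15.0, 13.0, 16.0, 12.0, 15.0]

def trim(values, drop_max):
    """Optionally drop the single largest value (a crude outlier rule)."""
    return sorted(values)[:-1] if drop_max else list(values)

def transform(values, use_log):
    """Optionally analyse on the log scale."""
    return [math.log(v) for v in values] if use_log else list(values)

def effect(drop_max, use_log):
    """Difference in group means under one set of analytic choices."""
    c = transform(trim(control, drop_max), use_log)
    t = transform(trim(treated, drop_max), use_log)
    return statistics.mean(t) - statistics.mean(c)

# Every combination of choices is one "path" through the garden.
paths = list(itertools.product([False, True], repeat=2))
results = {path: effect(*path) for path in paths}

# The honesty check: does the sign of the effect survive every path?
signs = {value > 0 for value in results.values()}
robust = len(signs) == 1
```

In this toy data the sign of the effect flips depending on whether the single large control value is kept, so `robust` comes out `False`. That is the kind of fragility a set of parallel analyses is meant to surface before a decision is made.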
-
Agent-Harness.ipynb*
20.05.2026 1h 19min

> One thing that I don’t like about Claude is that you get into this weird mental state: oh, I think I trust the model. Let’s do the slot machine. Hit click, which puts you in an inactive mode of thinking. Maybe it’s better to use a worse model…

Vincent Warmerdam, senior data professional and prolific open-source maintainer (some packages with over a million downloads), now Engineer at marimo, joins Hugo to talk about how the Python notebook is evolving from a static scratchpad into a working agent harness, and what it takes to stay in the loop as a developer when agents are writing most of the code. This episode was originally a livestream Q&A with the Vanishing Gradients audience.

We Discuss:
* Shared Notebook Canvas: Notebooks act as a shared memory space where agents and humans co-exist, enabling real-time visual feedback by direct manipulation of global state and UI elements;
* Speed-of-Thought Models: Faster, open-weight models like Kimi K2 enhance exploratory flow by keeping humans more alert to the code, unlike frontier models that can induce passive thinking;
* Pi as a Harness: Vincent favors an agent harness where agents extend themselves rather than reach for MCP, and where hooks can rigidly constrain which files an agent is allowed to read or touch;
* Why PRDs Don’t Fit Notebooks: Notebook work is fundamentally exploratory, so the discipline that works for shipping web apps does not transfer cleanly; the one exception is reproducing a paper;
* Interactive Code Review: Interactive UIs (e.g., dragging integers) transform code into a physical object, incentivizing developers to actively review and understand agent logic;
* Modular “Lego” Components: Provide agents with high-level, well-tested components (“Lego” code) instead of raw boilerplate, creating systems that are easier to debug and modulate;
* Algorithm-Driven Visualization: Let the algorithm dictate the visualization needed, rather than choosing visualizations first, revealing the most interesting structures within the data;
* Don’t Outsource the Thinking: Pen-and-paper architectural planning, walks away from the keyboard, and protecting calm remain the most effective ways to keep producing good ideas in the age of AI-generated software;
* Agent Auto-Healing: A marimo-specific linter solved 60% of agent errors overnight by letting agents diagnose and fix their own “slop” without complex prompt engineering;
* Incremental Generation: Avoid monolithic LLM outputs; generate code one to two cells at a time to prevent laziness and ensure human oversight and learning.

Vincent closes on the idea that calm, not the latest frontier model, is the most underrated tool for building well, and that we should study LLM output the way chess players studied the engines that beat them.

Vincent gives several live demos toward the end of the episode. He describes them well enough to follow on audio, but the visuals are worth seeing, so check out the YouTube version here.

Also join us for Ep. 3 of Show Us Your Agent Skills, with Vincent, Paul Iusztin (Decoding AI), Eleanor Berger (Elite AI-Assisted Coding), Alan Nichol (Rasa), Nico Gerold (amp), and Matthew Honnibal (spaCy, Explosion). Register on lu.ma to join live, or catch the recording afterwards.

LINKS
* Vincent Warmerdam on LinkedIn
* Vincent’s website (koaning.io)
* Wiggly Stuff, Vincent’s widget library
* Marimo Gallery
* skills.sh
* Armin Ronacher on Pi (the minimal agent inside OpenClaw)
* Building Agents That Build Themselves, Hugo’s workshop write-up with Ivan Leo
* Data Science Fiction: Winning at Metrics, Losing at AI Evals, Hugo’s blog post based on Vincent’s talk
* Isaac Flath’s project (on X)
* Braid (video game)
* Hugo’s earlier podcast with Akshay (marimo)
* Elite AI Assisted Coding, Eleanor Berger’s course (Vanishing Gradients community gets 25% off with code “HUGO”)
* GameMakers Toolkit (YouTube)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Come build the future of Agentic Data Science with us in our upcoming course (10% off).

How You Can Support Vanishing Gradients

Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.

If you want to help keep it going:
* Become a paid subscriber, from $8/month
* Share this with a builder who’d find it useful
* Subscribe to our YouTube channel.
-
Agentic Engineering and the Lost Art of Verification
12.05.2026 1h 32min

> I almost don’t read code now. My approach with RoboRev is it’s like my code reader. The mantra is: RoboRev reads every line of code that is generated. It gets read multiple times. And so, whenever I push up a pull request, the branch gets re-reviewed. And so by the time I’m merging a pull request into a repository, the code has all been read by agents four or five times minimum. I look at the code in terms of structural detail: does it look right?
> Wes McKinney (creator of pandas, Posit)

Wes, Jeremiah Lowin (Prefect), and Randy Olson (Good Eye Labs) join Hugo and his cohost Thomas Wiecki (PyMC Labs) for the premiere of Show Us Your Agent Skills, a live session where guests walk us through the exact skills, workflows, and setups they use to work with agents every day.

We Discuss:
* Wes McKinney on why he barely writes, or even reads, code anymore, his “software factory” of parallel agents, and RoboRev, the background reviewer that reads every line four or five times before he merges;
* The shift from “vibe coding” to agentic engineering, and why verification, not reading, is the part that actually matters;
* Jeremiah Lowin on years of context engineering: trickling voice memos, recorded meetings, and morning briefs into his agent’s memory substrate as a true “second brain”;
* Why Jeremiah picked OpenCode specifically for how deeply he can customize its memory, and what he’s building with FastMCP, Prefab, and Cardboard;
* Randy Olson on encoding human judgment, like Tufte’s rules for data visualization, directly into agent skills, so the agents themselves perform the verification;
* The “digital twin” Randy loads into his agents as a thought partner that pushes back instead of agreeing;
* Skills as thin drivers, progressive disclosure, and managing context rot across extended sessions;
* The rise of ephemeral, “just for me” software that agents finally make viable.

Skills and workflows discussed and shown in the episode:
* Wes’s RoboRev background code reviewer, his “software factory” dashboard, and his agentic engineering setup built on the Superpowers skills framework;
* Jeremiah’s “explain” skill (which anchors every other skill he has), his voice memo memory pipeline, his FastMCP and Prefab projects, and Cardboard, his ephemeral presentation tool;
* Randy’s data visualization verifier skills, his digital twin thought partner prompt, his cron job reports for colleagues, and his reflect-and-improve skill design pattern.

Check out the GitHub repo where we’re starting to drop some of these skills and workflows for you to grab and try yourself.

Up next on Show Us Your Agent Skills: Hilary Mason (CEO, HiddenDoor), Bryan Bischof (Theory Ventures), Eric Ma (Research DS lead, Moderna Therapeutics), and Tomasz Tunguz (Theory Ventures). Register on lu.ma to join live, or catch the recording afterwards.

LINKS
* spicytakes.org, Wes McKinney’s website
* RoboRev, Wes’s background code reviewer
* Agents View, Wes’s agent session database
* Middleman, Wes’s local GitHub dashboard
* Superpowers, Jesse Vincent’s skills framework that Wes builds on
* An Open Source Maintainer’s Guide to Saying No, by Jeremiah Lowin
* FastMCP
* Prefab, Jeremiah’s Python DSL for generative UIs
* Beautiful Charts with AI, by Randy Olson
* The Coding Agent is Dead, by Amp
* Building Effective Agents, by the Anthropic team
* Show Us Your Agent Skills, the GitHub repo where we are dropping skills and workflows from the show
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Come build the future of Agentic Data Science with us in our upcoming course.
-
Next Level AI Evals for 2026
23.04.2026 53min

> There are a lot of reasons why we should do AI evals. For many companies doing AI evals is the way to build the feedback loop into the product development lifecycle. So it is like your compass. We’re using AI evals as a compass to guide product development and also product iteration. And also, many times we need evals to function as the pass or fail gate in release decisions. Whether this product is good enough for release or whether it is good enough for experiment, evals are also used in that.

Stella Wenxing Liu, Head of Applied Science at ASU, and Eddie Landesberg, Staff Data Scientist at Google, join Hugo to talk about why AI evaluation is evolving from “vibe checks” into a rigorous, multi-disciplinary science and how causal inference will take AI evals to the next level in 2026.

Vanishing Gradients is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

They Discuss:
* Team-Centric AI Evals, integrating product managers, data scientists, and SMEs under a “benevolent dictator” (or not!) to ensure comprehensive and effective evaluation;
* Custom Evaluation Metrics, moving beyond generic vendor metrics to analyze raw data and identify specific failure modes, avoiding generic product outcomes;
* AI as Policy Evaluation, framing AI evaluation as a causal inference problem to estimate counterfactual performance of new “policies” (prompts, models) and predict online A/B test outcomes;
* Clear Product Constraints, defining what an AI product should not do with strict guardrails to prevent misuse, control costs, and avoid brand dilution;
* Calibrated LLM Judges, statistically aligning LLM-as-a-judge with human experts using causal inference to ensure valid proxies for human welfare and business objectives;
* Essential Data Curiosity, fostering a culture of manual data inspection to build intuition before relying on automated error analysis or agents, ensuring effective system design;
* Statistical AI Evaluation, shifting from unit-test thinking to non-deterministic distributions, using confidence intervals and power analysis to discern genuine improvements from statistical noise;
* Proactive Regulatory Compliance, developing rigorous, defensible internal evaluation standards now to gain a competitive advantage as vague AI regulations move towards enforced compliance;
* Human-Centric Benchmarking, grounding AI systems in human judgment and user values, moving beyond automated scores to build resilient and differentiated AI.

👉 Stella is teaching a new cohort of her AI Evals and Analytics Playbook course, starting this week. She’s kindly giving listeners of Vanishing Gradients 30% off with this link. 👈

Our flagship course Building AI Applications just wrapped its final cohort but we’re cooking up something new. If you want to be first to hear about it (and help shape what we build), drop your thoughts here.

LINKS
* Stella Wenxing Liu on LinkedIn
* Eddie Landesberg on LinkedIn
* Stella’s AI Evals & Analytics Playbook course on Maven (30% community discount)
* CJE (Causal Judge Evaluation) package by Eddie
* Trillion Dollar Coach
* Goodhart’s Law
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube

Thanks for reading Vanishing Gradients! This post is public so feel free to share it.
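The “Statistical AI Evaluation” item is easier to picture with numbers. Below is a minimal sketch, using only the Python standard library, of a percentile bootstrap confidence interval for the difference in pass rate between two prompt or model variants. The pass/fail data, the sample sizes, and the function name are made up for illustration; this is not the CJE package or any method attributed to the guests.

```python
import random

def bootstrap_diff_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for (candidate pass rate - baseline pass rate).

    `baseline` and `candidate` are lists of 0/1 outcomes from independent eval runs.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each eval set with replacement, keeping its original size.
        b = [rng.choice(baseline) for _ in baseline]
        c = [rng.choice(candidate) for _ in candidate]
        diffs.append(sum(c) / len(c) - sum(b) / len(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical results: 1 = the judge marked the response as passing.
baseline = [1] * 70 + [0] * 30    # 70% pass rate
candidate = [1] * 74 + [0] * 26   # 74% pass rate

low, high = bootstrap_diff_ci(baseline, candidate)

# If the interval contains 0, the 4-point gain on 100 examples per arm
# is not distinguishable from noise.
inconclusive = low <= 0.0 <= high
```

With 100 examples per arm, the interval around a 4-point gain is wide enough to include zero, so the “improvement” would not justify a release decision by itself. Collecting more examples, or pairing the same examples across variants, is the usual way to narrow it.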
-
Privacy Theater Is Not Privacy Engineering: What It Actually Takes to Ship Safe AI
15.04.2026 1h 6min

Katharine Jarmul, Privacy in ML/AI Expert & Author of Practical Data Privacy, joins Hugo to unpack why most AI privacy advice is theater, and what technical privacy actually looks like when you’re shipping LLMs, agents, and multimodal systems into the real world.

In this episode, we dig into how to build defensible systems in an era of AI agents and multimodal models: why system prompts (and your entire agent harness!) should be considered public by default, and why “privacy observability” is as critical as data observability for anyone building with LLMs today. Multimodal is what changes the threat model: identifiers hide in images, audio, and metadata, not just text, and the old anonymization playbook doesn’t cover it.

We Discuss:
* No Convenience Tax, you don’t have to trade privacy for utility: high-utility AI products can be privacy-preserving through technical controls like privacy routing and input sanitization;
* Public Prompts and Harnesses, assume any instruction or secret in a system prompt or agent harness will be exfiltrated; don’t put sensitive info there in the first place;
* Privacy Observability, tag and track data flows so information is used only for its original intended purpose: catch design flaws before they become legal problems;
* Technical Privacy, implement mathematical and statistical constraints directly into ML systems and data flows so privacy is measurable and enforceable, not aspirational;
* Tiered Guardrails, a three-layer approach: deterministic filters for hard rules, algorithmic models for nuanced classification, and internal alignment training for behavioral baselines;
* Federated Learning Is Not Privacy, model updates in FL leak sensitive data on their own: you must layer differential privacy or encrypted computation on top, or you’re reverse-engineerable;
* Anonymization Spectrum, navigate the “grayscale” of privacy in multimodal AI, balancing data utility and individual risk as identifiers hide in non-obvious places;
* Privacy Champions, embed privacy accountability directly into development by training and incentivizing engineers inside product teams;
* Red Teaming as Ritual, your goal is to attack yourself: practice thinking like an attacker, and turn privacy testing into an organization-wide creative ritual rather than a siloed security task.

👉 Katharine is teaching her next cohort of Practical AI Privacy starting April 20. She’s kindly giving readers of Vanishing Gradients 10% off. Use this link. I’ll be taking it so hope to see you there! 👈

LINKS
* Practical AI Privacy course on Maven (10% off with code build-with-privacy)
* Katharine Jarmul on LinkedIn
* Probably Private, Katharine’s website & newsletter
* Practical Data Privacy (Katharine’s book)
* Let’s Build an AI Privacy Router (Lightning Lesson)
* Practical AI Privacy: Agents & Local LLMs (newsletter issue)
* A Deep Dive into Memorization in Deep Learning (kjamistan blog)
* Microsoft Presidio
* Llama Guard 3 8B on Hugging Face
* Nicholas Carlini
* From Magic to Malware: How OpenClaw’s Agent Skills Become an Attack Surface (1Password)
* Owning Ethics (Metcalf, Moss, boyd; Data & Society)
* Hugo on guardrails in LLM applications
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
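For a feel of what the first tier of “Tiered Guardrails” (deterministic filters for hard rules) and “input sanitization” can look like, here is a minimal sketch using only Python’s `re` module. The two patterns, the placeholder labels, and the `sanitize` function are hypothetical and deliberately crude. A production system would more likely lean on a dedicated tool such as Microsoft Presidio, listed above, plus the model-based tiers.

```python
import re

# Hypothetical hard rules; real deployments need patterns tuned to their own data.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def sanitize(text):
    """Replace each match with a typed placeholder and count what was removed."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts

clean, counts = sanitize("Mail ada@example.com or call +1 (555) 010-9999 today.")
# clean  -> "Mail [EMAIL] or call [PHONE] today."
# counts -> {"EMAIL": 1, "PHONE": 1}
```

The per-label counts are worth keeping. Logged alongside each request, they give a cheap signal of how much sensitive material is flowing toward the model, which is one small piece of what the episode calls privacy observability.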
-
LLM Architecture in 2026: What You Need to Know with Sebastian Raschka
13.04.2026 1h 18min

> If you take a model release as an anchor point, let’s say Nemotron 3 or Qwen 3.5, you can go in both directions: You can either plug them into an agent and play around with that, or you can look, okay, what does the model look like under the hood? What are the ingredients? What type of attention mechanism do they use? What are currently research techniques that could make that even better in the next generation of models? What can we swap out, basically? And I’m interested in both of these!

Sebastian Raschka, Independent AI Researcher and author of Build a Large Language Model from Scratch, joins Hugo to talk about what’s changed in AI architecture, from post-training to hybrid models, and why understanding what’s under the hood matters more than ever for developers building in the agentic era. Sebastian’s upcoming book, Build a Reasoning Model from Scratch, is currently available for pre-order on Amazon and in early access on Manning!

We Discuss:
* Ed Tech for Agents: should we design educational content specifically for agentic systems, or is there a better approach?
* Inference Scaling is the new frontier, driving “gold-level” performance during generation via parallel sampling and internal meta-judges;
* Hybrid Architectures from Qwen 3.5 and Nemotron 3 scale almost linearly, making long-context agentic workflows significantly more affordable and performant;
* Multi-head Latent Attention (MLA), developed by DeepSeek, wins the KV cache war by drastically reducing memory overhead without performance hits;
* Agent Harnesses need to be continuously simplified as frontier models are post-trained on agent trajectories. Teams that don’t strip back their scaffolding risk the harness getting in the way of a more capable model;
* “AI Psychosis”: the cognitive load of supervising self-supervising agents, and why we’re all conducting an orchestra we were never trained to conduct;
* Sebastian’s AI Stack: a surprisingly simple setup (Mac mini, Codex, Ollama) with a ~20-item QA checklist, delegating the boring work to preserve energy for creative development;
* Fine-tuning is now an economic decision, optimizing costs and latency for high-volume tasks where the recurring cost of long system prompts outweighs a one-time training run;
* Process Reward Models (PRMs) are the next frontier, verifying intermediate reasoning steps to solve “hallucination in the middle” for complex math and code tasks;
* “Implementation Does Not Lie”: Sebastian’s layer-by-layer verification philosophy, comparing from-scratch builds against HuggingFace references to catch details invisible in papers;
* Architecture Details dictate inference stack choices; nuances like RMSNorm stability or RoPE flavors are critical for optimal performance and troubleshooting;
* The Distillation Loop drives open-weight parity, enabling specialized, “frontier-class” models by “pre-digesting” frontier outputs without multi-million-dollar training risks.

Links and Resources
* Build a Reasoning Model (From Scratch): Sebastian’s new book, currently available for pre-order on Amazon and in early access on Manning. You’ll learn how reasoning LLMs actually work by starting with a pre-trained base LLM and adding reasoning capabilities step by step in code. A hands-on follow-up to Build a Large Language Model from Scratch.
* LLM Architecture Gallery: Sebastian’s collection of architecture figures and fact sheets from his blog posts, updated with each major model release. A go-to visual reference for comparing what’s changed under the hood across model generations.
* Sebastian Raschka on LinkedIn
* Sebastian’s website
* Ahead of AI (Sebastian’s Substack)
* Build a Large Language Model from Scratch
* PinchBench: OpenClaw Benchmark Leaderboard
* DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
* Gated Delta Networks: Improving Mamba2 with Delta Rule (ICLR 2025)
* DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
* Hugging Face Model Hub
* Upcoming Events on Luma
* Vanishing Gradients on YouTube

A Bit More on Agent Harnesses
* Components of A Coding Agent by Sebastian
* How To Build An Agent that Builds its own Harness by Hugo and Ivan Leo (DeepMind, ex-Manus)
* Build Your Own Deep Research Agent with Hugo & Ivan Leo (Google DeepMind, ex-Manus): In this livestream, you’ll learn how to build a production-grade agent harness from scratch in pure Python.
* AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited with Lance Martin (Anthropic), Duncan Gilchrist (Delphina), and Hugo
* The Post-Coding Era: What Happens When AI Writes the System? with Nicholas Moy (Google DeepMind), Duncan Gilchrist (Delphina), and Hugo
* What is an Agent Harness? from What 300+ Engineers from Netflix, Amazon, and Instacart Asked About AI Engineering.
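For a rough sense of why the KV cache matters so much for long-context work, here is a back-of-the-envelope sketch. The first function is the standard size of a cache that stores full keys and values in every layer. The second is a simplified stand-in for the latent-caching idea behind MLA, where each layer stores one compressed vector per token instead. All dimensions are illustrative and are not the configuration of any real model, and the simplification ignores any additional per-token state a real implementation may cache.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Per-sequence cache size when every layer stores full keys and values."""
    # The leading 2 accounts for storing both K and V.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

def latent_cache_bytes(layers, latent_dim, seq_len, bytes_per_value=2):
    """Per-sequence cache size when every layer stores one compressed vector per token."""
    return layers * latent_dim * seq_len * bytes_per_value

# Illustrative numbers only (16-bit values, 32K-token context).
full = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=32_768)
latent = latent_cache_bytes(layers=32, latent_dim=512, seq_len=32_768)

ratio = full / latent          # 16.0
full_gib = full / 2**30        # 16.0 GiB
latent_gib = latent / 2**30    # 1.0 GiB
```

Under these made-up dimensions a single 32K-token sequence needs 16 GiB of cache with full keys and values and 1 GiB with the compressed representation. The point is only the shape of the arithmetic: cache size grows linearly with context length, so the per-token footprint is what decides how many long agent sessions fit on one accelerator.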
-
Episode 72: Why Agents Solve the Wrong Problem (and What Data Scientists Do Instead)
20.03.2026 1h 33min

> I often see what I would consider to be b******t evals, especially in data, like write this dumb SQL. Almost every one of these dumb SQL questions that I’ve seen for benchmarks are just so either obviously easy or overwhelmingly adversarial. They just, they don’t feel valuable as a data scientist, it’s something that you probably would never ask a real data scientist to do. So I went out of my way to create real ones. Let me read one to you.

Bryan Bischof, Head of AI at Theory Ventures, joins Hugo to talk about what happened when 150 people spent six hours using AI agents to answer real data science questions across SQL tables, log files, and 750,000 PDFs.

They Discuss:
* Failure Funnels, pinpoint where agent reasoning breaks down using causal-chain binary evaluations instead of vague 1-5 scales;
* Median Score: 23 out of 65, what happened when world-class engineers turned agents loose on real data work, and why general-purpose coding agents with human prodding beat fancy frameworks;
* Zero-Cost Submissions Kill Trust, without a penalty for wrong answers, agents hill-climb to correct submissions through brute force instead of building confidence;
* Data Science is “Zooming”, moving beyond binary decisions to iterative problem framing, refining “does our inventory suck?” into a tractable hypothesis;
* MCP as Semantic Layer, model your organization’s proprietary knowledge once and distribute it to whatever LLM interface your team prefers;
* The Subagent vs. Tool Debate, a distinction that adds cognitive load without hiding complexity;
* Self-Orchestration Gap, agents don’t yet realize they should trigger specialized extraction frameworks like DocETL instead of reading 750K PDFs one by one;
* The Future of Evals, from vibe checks to objective functions and continuous user feedback that lets systems converge on reliability.

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort has started. Registration is still open. All sessions are recorded so don’t worry about having missed any. Here is a 25% discount code for readers. 👈

LINKS
* Bryan Bischof on Twitter/X
* Bryan Bischof on LinkedIn
* Theory Ventures
* The Hunt for a Trustworthy Data Agent (blog post)
* America’s Next Top Modeler GitHub repo
* Hamel’s evals FAQ: How do I evaluate agentic workflows?
* DocETL
* LLM Judges and AI Agents at Scale (Hugo’s podcast with Shreya Shankar)
* When Your Metrics Are Lying (Cimo Labs)
* Lessons from a Year of Building with LLMs (livestream on YouTube)
* Bryan Bischof: The Map is Not the Territory (YouTube)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
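One way to read “Failure Funnels” is as an ordered chain of yes/no checks, where each trace is charged to the first check it fails. The sketch below shows that bookkeeping in plain Python. The stage names and the example traces are hypothetical and are not taken from Bryan’s benchmark.

```python
from collections import Counter

# Hypothetical ordered stages; a later stage only makes sense if earlier ones passed.
STAGES = [
    "found_right_table",
    "wrote_valid_query",
    "query_answers_question",
    "final_answer_correct",
]

def make_trace(*flags):
    """Build one trace record from binary check results, in stage order."""
    return dict(zip(STAGES, flags))

def first_failure(trace):
    """Return the first stage whose binary check failed, or None if all passed."""
    for stage in STAGES:
        if not trace[stage]:
            return stage
    return None

def funnel(traces):
    """Count how many traces survive each stage, in order."""
    failures = Counter(first_failure(t) for t in traces)
    remaining = len(traces)
    survivors = []
    for stage in STAGES:
        remaining -= failures[stage]
        survivors.append((stage, remaining))
    return survivors

traces = [
    make_trace(True, True, True, True),
    make_trace(True, True, False, False),
    make_trace(True, False, False, False),
    make_trace(False, False, False, False),
    make_trace(True, True, False, False),
]
report = funnel(traces)
# [('found_right_table', 4), ('wrote_valid_query', 3),
#  ('query_answers_question', 1), ('final_answer_correct', 1)]
```

Here the biggest drop is between writing a valid query and writing one that actually answers the question. A single 1-5 quality score would not tell you that, which is the argument for binary checks arranged as a causal chain.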
-
Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin 18.02.2026 51minOur thesis is that AI is still just engineering… those people who tell us for fun and profit, that somehow AI is so, so profound, so new, so different from anything that’s gone before that it somehow eclipses the need for good engineering practice are wrong. We need that good engineering practice still, and for the most part, most things are not new. But there are some things that have become more important with AI. One of those is durability.Samuel Colvin, Creator of Pydantic AI, joins Hugo to talk about applying battle-tested software engineering principles to build durable and reliable AI agents.They Discuss:* Production agents require engineering-grade reliability: Unlike messy coding agents, production agents need high constraint, reliability, and the ability to perform hundreds of tasks without drifting into unusual behavior;* Agents are the new “quantum” of AI software: Modern architecture uses discrete “agentlets”: small, specialized building blocks stitched together for sub-tasks within larger, durable systems;* Stop building “chocolate teapot” execution frameworks: Ditch rudimentary snapshotting; use battle-tested durable execution engines like Temporal for robust retry logic and state management;* AI observability will be a native feature: In five years, AI observability will be integrated, with token counts and prompt traces becoming standard features of all observability platforms;* Split agents into deterministic workflows and stochastic activities: Ensure true durability by isolating deterministic workflow logic from stochastic activities (IO, LLM calls) to cache results and prevent redundant model calls;* Type safety is essential for enterprise agents: Sacrificing type safety for flexible graphs leads to unmaintainable software; professional AI engineering demands strict type definitions for parallel node execution and state recovery;* 
Standardize on OpenTelemetry for portability: Use OpenTelemetry (OTel) to ensure agent traces and logs are portable, preventing vendor lock-in and integrating seamlessly into existing enterprise monitoring.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a 25% discount code for listeners. 👈LINKS* Samuel Colvin on LinkedIn* Pydantic* Pydantic Stack Demo repo* Deep research example code* Temporal* DBOS (Postgres alternative to Temporal)* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube👉Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort starts March 10, 2026. Here is a 25% discount code for listeners.👈https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
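A toy illustration of the workflow/activity split described in Episode 71, using only the Python standard library. It is not the Temporal or Pydantic AI API: `Journal`, `run_activity`, and `fake_llm_call` are hypothetical names, and a real durable execution engine does far more (retries, timeouts, versioning).

```python
import json
import os
import tempfile
from typing import Callable, Dict


class Journal:
    """Toy durable journal: persists each activity result under a step key."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.results: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.results = json.load(f)

    def run_activity(self, key: str, activity: Callable[[], str]) -> str:
        # Replay: a step recorded before a crash is returned without re-running.
        if key in self.results:
            return self.results[key]
        result = activity()
        self.results[key] = result
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.results, f)
        return result


calls = {"count": 0}


def fake_llm_call() -> str:
    """Hypothetical stand-in for a stochastic LLM or IO call."""
    calls["count"] += 1
    return f"summary-{calls['count']}"


def workflow(journal: Journal) -> str:
    # Deterministic orchestration: same steps, same order, on every run.
    draft = journal.run_activity("draft", fake_llm_call)
    review = journal.run_activity("review", fake_llm_call)
    return f"{draft}|{review}"


journal_path = os.path.join(tempfile.mkdtemp(), "journal.json")
first = workflow(Journal(journal_path))
# Simulate a restart: a fresh Journal reloads recorded results from disk.
second = workflow(Journal(journal_path))
```

Because the workflow itself is deterministic, replaying it against the journal reproduces the same result without paying for the model calls a second time.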
-
Episode 70: 1,400 Production AI Deployments 12.02.2026 1h 9minThere’s a company who spent almost $50,000 because an agent went into an infinite loop and they forgot about it for a month.It had no failures and I guess no one was monitoring these costs. It’s nice that people do write about that in the database as well. After it happened, they said: watch out for infinite loops. Watch out for cascading tool failures. Watch out for silent failures where the agent reports it has succeeded when it didn’t!We Discuss:* Why the most successful teams are ripping out and rebuilding their agent systems every few weeks as models improve, and why over-engineering now creates technical debt you can’t afford later;* The $50,000 infinite loop disaster and why “silent failures” are the biggest risk in production: agents confidently report success while spiraling into expensive mistakes;* How ELIOS built emergency voice agents with sub-400ms response times by aggressively throwing away context every few seconds, and why these extreme patterns are becoming standard practice;* Why DoorDash uses a three-tier agent architecture (manager, progress tracker, and specialists) with a persistent workspace that lets agents collaborate across hours or days;* Why simple text files and markdown are emerging as the best “continual learning” layer: human-readable memory that persists across sessions without fine-tuning models;* The 100-to-1 problem: for every useful output, tool-calling agents generate 100 tokens of noise, and the three tactics (reduce, offload, isolate) teams use to manage it;* Why companies are choosing Gemini Flash for document processing and Opus for long reasoning chains, and how to match models to your actual usage patterns;* The debate over vector databases versus simple grep and cat, and why giving agents standard command-line tools often beats complex APIs;* What “re-architect” as a job title reveals about the shift from 70% scaffolding / 30% model to 90% model / 10% 
scaffolding, and why knowing when to rip things out may be the most important skill today.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort starts March 10, 2026. Here is a 25% discount code for readers. 👈Show Notes Links* Alex Strick van Linschoten on LinkedIn* Alex Strick van Linschoten on Twitter/X* LLMOps Database* LLMOps Database Dataset on Hugging Face* Hugo’s MCP Server for LLMOps Database* Alex’s Blog: What 1,200+ Production Deployments Reveal About LLMOps in 2025* Previous Episode: Practical Lessons from 750 Real-World LLM Deployments* Previous Episode: Tales from 400 LLM Deployments* Context Rot Research by Chroma* Hugo’s Post: AI Agent Harness - 3 Principles for Context Engineering* Hugo’s Post: The Rise of Agentic Search* Episode with Nick Moy: The Post-Coding Era* Hugo’s Personal Podcast Prep Skill Gist* Claude Tool Search Documentation* Gastown on GitHub (Steve Yegge)* Welcome to Gastown by Steve Yegge* ZenML - Open Source MLOps & LLMOps Framework* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast livestream on YouTube* Join the final cohort of our Building AI Applications course in March, 2026 (25% off for listeners)👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort starts March 10, 2026. Here is a 25% discount code for readers. 👈 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
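One possible guard against the runaway-loop failure described in Episode 70 is a hard step and spend budget around the agent loop. This is a sketch under stated assumptions: `Budget`, `run_agent`, and `stuck_step` are hypothetical names, and the per-step cost is a made-up number rather than real pricing.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the loop passes its step or spend limit."""


@dataclass
class Budget:
    max_steps: int
    max_cost_usd: float
    steps: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        # Record the step first, then check both limits.
        self.steps += 1
        self.cost_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"spend limit ${self.max_cost_usd:.2f} exceeded")


def run_agent(step: Callable[[], Tuple[bool, float]], budget: Budget) -> int:
    """Call step() until it reports done; each call returns (done, cost_usd)."""
    while True:
        done, cost = step()
        budget.charge(cost)
        if done:
            return budget.steps


def stuck_step() -> Tuple[bool, float]:
    # Hypothetical agent step that never finishes and costs 25 cents per call.
    return False, 0.25


budget = Budget(max_steps=50, max_cost_usd=1.00)
try:
    run_agent(stuck_step, budget)
    stopped_reason = "finished"
except BudgetExceeded as exc:
    stopped_reason = str(exc)
```

The guard turns a silent month-long loop into a loud exception, which a monitoring system can then alert on.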
-
Episode 69: Python is Dead. Long Live Python! With the Creators of pandas & Parquet 03.02.2026 55min> It’s the agent writing the code. And it’s the development loop of writing the code, building testing, write the code, build test and iterating. And so I do think we’ll see for many types of software, a shift away from Python towards other programming languages. I think Go is probably the best language for those like other types of software projects. And like I said, I haven’t written a line of Go code in my life.– Wes McKinney (creator of pandas, Principal Architect at Posit). Wes McKinney, Marcel Kornacker, and Alison Hill join Hugo to talk about the architectural shift for multimodal AI, the rise of “agent ergonomics,” and the evolving role of developers in an AI-generated future.We Discuss:* Agent Ergonomics: Optimize for agent iteration speed, shifting from human coding to fast test environments, potentially favoring languages like Go;* Adversarial Code Review: Deploy diverse AI models to peer-review agent-generated code, catching subtle bugs humans miss;* Multimodal Data Verbs: Make operations like resizing and rotating native to your database to eliminate data-plumbing bottlenecks;* Taste as Differentiator: Value “taste”—the ability to curate and refine the best output from countless AI-generated options—over sheer execution speed;* 100x Software Volume: Embrace ephemeral, just-in-time software; prioritize aggressive generation and adversarial testing over careful planning for quality.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript of the workshop & fireside chat here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 
👈This was a fireside chat at the end of a livestreamed workshop we did on building multimodal AI systems with Pixeltable. Check out the full workshop below (all code here on Github):Links and Resources* Wes McKinney on LinkedIn* Marcel Kornacker on LinkedIn* Alison Hill on LinkedIn* Spicy Takes* Palmer Penguins* Pixeltable* Posit* Positron* Building Multimodal AI Systems Workshop Repository* Pixeltable Docs: LLM Tool Calling with MCP Servers* Pixeltable Docs: Working with Pydantic* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube* Join the final cohort of our Building AI Applications course in March, 2026 (25% off for listeners)https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfsWhat people said during the workshop“I think the interface looks amazing/simple. Strong work! 🦾” — @goldentribe“This is quite amazing. Watching this I felt the same way when I first leant pandas, NumPy and scikit and how well i was able to manipulate and wrangle data. PixelTable feels seamless and looks as good as those legendary frameworks but for Multimodal Data.” — @vinod7“This is all extremely cool to see, I love the API and the approach.” — @steveb4191“Thanks so much, Hugo! That was very insightful! Great work Alison and Marcel!” — @vinod7“Just wrapped up watching a replay of the Pixeltable workshop. So cool!! Love the notebooks and working examples. The important parts were covered and worked beautifully 🕺” — @therobbrennan👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 68: A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull & John Berryman 23.01.2026 1h 28minThe best way to build a horrible search product? Don’t ever measure anything against what a user wants.Search veterans Doug Turnbull (Led Search at Reddit + Shopify; Wrote Relevant Search + AI Powered Search) and John Berryman (Early Engineer on GitHub Copilot; Author of Relevant Search + Prompt Engineering for LLMs) join Hugo to talk about how to build Agentic Search Applications.We Discuss:* The evolution of information retrieval as it moves from traditional keyword search toward “agentic search” and what this means for builders.* John’s five-level maturity model (you can prototype today!) for AI adoption, moving from traditional search to conversational AI to asynchronous research assistants that reason about result quality.* The Agentic Search Builders Playbook, including why and how you should “hand-roll” your own agentic loops to maintain control;* The importance of “revealed preferences” that LLM-judges often miss (evaluations must use real clickstream data to capture “revealed preferences” that semantic relevance alone cannot infer)* Patterns and Anti-Patterns for Agentic Search Applications* Learning and teaching Search in the Age of AgentsYou can find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈Doug and Hugo are also doing a free lightning lesson on Feb 20 about How To Build Your First Agentic Search Application! You’ll walk away with a framework & code to build your first agentic search app. 
Register here to join live or get the recording after.Links and ResourcesGuests* Arcturus Labs (John’s website)* Software Doug (Doug’s website)* John Berryman on LinkedIn* Doug Turnbull on LinkedInBooks* Relevant Search by Doug Turnbull & John Berryman (Manning)* AI-Powered Search by Doug Turnbull (Manning)* Prompt Engineering for LLMs by John Berryman (O’Reilly)Blog Posts* Incremental AI Adoption for E-commerce by John Berryman* Roaming RAG – RAG without the Vector Database by John Berryman* Agents Turn Simple Keyword Search into Compelling Search Experiences by Doug Turnbull* A Simple Agentic Loop with Just Python Functions by Doug Turnbull* Agentic Code Generation to Optimize a Search Reranker by Doug Turnbull* LLM Judges Aren’t the Shortcut You Think by Doug Turnbull (Hugo’s 5 minute video below)* Malleable Software by Ink & Switch (inc. Geoffrey Litt)* Patterns and Anti-Patterns for Building with AI by Hugo Bowne-AndersonOther Resources* The Rise of Agentic Search, a recent VG Podcast with Jeff Huber* Karpathy on Cognitive Core LLMs* Cheat at Search with Agents course by Doug Turnbull (use code: vanishinggradients for $200 off)* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube* Join the final cohort of our Building AI Applications course in Q1, 2026 (25% off for listeners)Timestamps (for YouTube livestream)00:00 How to Build Agentic Search & Retrieval Systems02:48 Defining Search and AI03:26 Evolution of Search Technologies08:46 Search in E-commerce and Other Domains12:15 Combining Search and AI: RAG and LLMs23:50 User Intent and Search Optimization29:47 Levels of AI Integration in Search32:25 Exploring the Complexity of Search in Various Domains33:49 The Evolution and Impact of Agentic Search34:07 Defining Terms: RAG and Agentic Search34:52 The Research Loop and Tool Interaction35:55 Formal Protocols and Structured Outputs38:39 Building Agentic Search Experiences: Tips and Advice41:50 The Importance of Empathy in AI 
and Search Development54:30 The Role of UX in Search Applications01:01:15 Future of Search: Malleable User Interfaces01:02:38 Exploring Malleable Software01:04:20 The Coordination Challenge in Software Development01:05:23 The Impact of Claude Code & Claude Cowork01:06:22 The Future of Knowledge Work with AI01:12:39 Evaluating Search Algorithms with AI01:15:15 The Role of Agents in Search Optimization01:29:55 Teaching AI and Search Techniques01:34:25 Final Thoughts and Farewell👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgpod Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 67: Saving Hundreds of Hours of Dev Time with AI Agents That Learn 14.01.2026 1h 18minThis is continual learning, right? Everyone has been talking about continual learning as the next challenge in AI. Actually, it’s solved. Just tell it to keep some notes somewhere. Sure, it’s not, it’s not machine learning, but in some ways it is because when it will load this text file again, it will influence what it does … And it works so well: it’s easy to understand. It’s easy to inspect, it’s easy to evolve and modify!Eleanor Berger and Isaac Flaath, the minds behind Elite AI Assisted Coding, join Hugo to talk about how to redefine software development through effective AI-assisted coding, leveraging “specification-first” approaches and advanced agentic workflows.We Discuss:* Markdown learning loops: Use simple agents.md files for agents to self-update rules and persist context, creating inspectable, low-cost learning;* Intent-first development: As AI commoditizes syntax, defining clear specs and what makes a result “good” becomes the core, durable developer skill;* Effortless documentation: Leverage LLMs to distill messy “brain dumps” or walks-and-talks into structured project specifications, offloading context faster;* Modular agent skills: Transition from MCP servers to simple markdown-based “skills” with YAML and scripts, allowing progressive disclosure of tool details;* Scheduled async agents: Break the chat-based productivity ceiling by using GitHub Actions or Cron jobs for agents to work on issues, shifting humans to reviewers;* Automated tech debt audits: Deploy background agents to identify duplicate code, architectural drift, or missing test coverage, leveraging AI to police AI-induced messiness;* Explicit knowledge culture: AI agents eliminate “cafeteria chat” by forcing explicit, machine-readable documentation, solving the perennial problem of lost institutional knowledge;* Tiered model strategy: Optimize token spend by using high-tier “reasoning” models 
(e.g., Opus) for planning and low-cost, high-speed models (e.g., Flash) for execution;* Ephemeral software specs: With near-zero generation costs, software shifts from static products to dynamic, regenerated code based on a permanent, underlying specification.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Eleanor & Isaac are teaching their next cohort of their Elite AI Assisted Coding course starting this week. They’re kindly giving readers of Vanishing Gradients 25% off. Use this link.👈👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈Show Notes* Elite AI Assisted Coding Substack* Eleanor Berger on LinkedIn* Isaac Flaath on LinkedIn* Elite AI Assisted Coding Course (Use the code HUGO for 25% off)* How to Build an AI Agent with AI-Assisted Coding* Eleanor/Isaac’s blog post “The SpecFlow Process for AI Coding”* Eleanor’s growing list of (free) tutorials on Agent Skills* Eleanor’s YouTube playlist on agent skills* Eleanor’s blog post “Are (Agent) Skills the New Apps”* Simon Willison’s blog post on skills/general computer automation/data journalism agents* Eleanor/Isaac’s blog post about asynchronous client agents in GitHub actions* Eleanor/Isaac’s blog post on agentic coding workflows with Hang Yu, Product Lead for Qoder @ Alibaba* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube* Join the final cohort of our Building AI Applications course in Q1, 2026 (25% off for listeners)Timestamps (for YouTube livestream)00:00 Introduction to Elite AI Assisted Coding02:24 Starting a New AI Project: Best Practices03:19 The Importance of Context in AI Projects07:19 Specification-First Planning12:01 Sharing Intent and 
Documentation18:27 Living Documentation and Continual Learning24:36 Choosing the Right Tools and Models29:18 Managing Costs and Token Usage40:16 Using Different Models for Different Tasks43:41 Mastering One Model for Better Results44:54 The Rise of Agent Skills in 202645:34 Understanding the Importance of Skills47:18 Practical Applications of Agent Skills01:11:43 Security Concerns with AI Agents01:15:02 Collaborative AI-Assisted Coding01:18:59 Future of AI-Assisted Coding01:22:27 Key Takeaways for Effective AI-Assisted CodingLive workshop with Eleanor, Isaac, & HugoWe also recently did a 90-minute workshop on How to Build an AI Agent with AI-Assisted Coding.We wrote a blog post on it for those who don’t have 90 minutes right now. Check it out here.I then made a 4 min video about it all for those who don’t have time to read the blog post.👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vg-ei Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
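The "markdown learning loop" from Episode 67 can be sketched in a few lines: the agent appends lessons to a notes file, and the next session loads that file into its prompt. The helper names and the prompt layout here are assumptions for illustration, not the course's actual tooling.

```python
import os
import tempfile
from pathlib import Path


def load_notes(path: Path) -> str:
    """Return the notes file contents, or an empty string if it does not exist."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def add_note(path: Path, note: str) -> None:
    """Append a lesson as a markdown bullet, skipping exact duplicates."""
    line = f"- {note.strip()}"
    if line in load_notes(path).splitlines():
        return
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def build_prompt(path: Path, task: str) -> str:
    """Prepend the persisted notes to the task for the next session."""
    return f"Project notes:\n{load_notes(path)}\nTask: {task}"


notes_path = Path(tempfile.mkdtemp()) / "agents.md"
add_note(notes_path, "Run the test suite before committing.")
add_note(notes_path, "Use the staging database for migrations.")
add_note(notes_path, "Run the test suite before committing.")  # duplicate, ignored
prompt = build_prompt(notes_path, "Add a login endpoint")
```

Because the memory is a plain text file, a human can read, edit, or delete any rule the agent has written for itself.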
-
Episode 66: The Agent Paradox - Why Moderna's Most Productive AI Systems Aren't Agents 08.01.2026 42minSurprise. We don’t have agents. I actually went in and did an audit of all the LLM applications that we’ve developed internally. And if you were to take Anthropic’s definition of workflow versus agent, we don’t have agents. I would not classify any of our applications as agents. Eric Ma, who leads Research Data Science in the Data Science and AI group at Moderna, joins Hugo on moving past the hype of autonomous agents to build reliable, high-value workflows.We discuss:* Reliable Workflows: Prioritize rigid workflows over dynamic AI agents to ensure reliability and minimize stochasticity in production environments;* Permission Mapping: The true challenge in regulated environments is security, specifically mapping permissions across source documents, vector stores, and model weights;* Trace Log Risk: LLM execution traces pose a regulatory risk, inadvertently leaking restricted data like trade secrets or personal information;* High-Value Data Work: LLMs excel at transforming archived documents and freeform forms into required formats, offloading significant “janitorial” work from scientists;* “Non-LLM” First: Solve problems with simpler tools like Python or ML models before LLMs to ensure robustness and eliminate generative AI stochasticity;* Contextual Evaluation: Tailor evaluation rigor to consequences; low-stakes tools can be “vibe-checked,” while patient safety outputs demand exhaustive error characterization;* Serverless Biotech Backbone: Serverless infrastructure like Modal and reactive notebooks such as Marimo empowers biotech data scientists for rapid deployment without heavy infrastructure overhead.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? 
Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch👉 Eric & Hugo have a free upcoming livestream workshop: Building Tools for Thinking with AI (register to join live or get the recording afterwards) 👈Show notes* Eric’s website* Eric Ma on LinkedIn* Eric’s blog* Eric’s data science newsletter* Building Effective AI Agents by the Anthropic team* Wow, Marimo from Eric’s blog* Wow, Modal from Eric’s blog* Upcoming Events on Luma* Watch the podcast video on YouTube* Join the final cohort of our Building AI Applications course in Q1, 2026 (35% off for listeners)Timestamps00:00 Defining Agents and Workflows02:04 Challenges in Regulated Environments04:24 Eric Ma's Role at Moderna, Leading Research Data Science in the Data Science and AI Group12:37 Document Reformatting and Automation15:42 Data Security and Permission Mapping20:05 Choosing the Right Model for Production20:41 Evaluating Model Changes with Benchmarks23:10 Vibe-Based Evaluation vs. Formal Testing27:22 Security and Fine-Tuning in LLMs28:45 Challenges and Future of Fine-Tuning34:00 Security Layers and Information Leakage37:48 Wrap-Up and Final Remarks👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 65: The Rise of Agentic Search 19.12.2025 51minWe’re really moving from a world where humans are authoring search queries and humans are executing those queries and humans are digesting the results to a world where AI is doing that for us.Jeff Huber, CEO and co-founder of Chroma, joins Hugo to talk about how agentic search and retrieval are changing the very nature of search and software for builders and users alike.We Discuss:* “Context engineering”, the strategic design and engineering of what context gets fed to the LLM (data, tools, memory, and more), which is now essential for building reliable, agentic AI systems;* Why simply stuffing large context windows is no longer feasible due to “context rot” as AI applications become more goal-oriented and capable of multi-step tasks;* A framework for precisely curating and providing only the most relevant, high-precision information to ensure accurate and dependable AI systems;* The “agent harness”, the collection of tools and capabilities an agent can access, and how to construct these advanced systems;* Emerging best practices for builders, including hybrid search as a robust default, creating “golden datasets” for evaluation, and leveraging sub-agents to break down complex tasks;* The major unsolved challenge of agent evaluation, emphasizing a shift towards iterative, data-centric approaches.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈Oh! 
One more thing: we’ve just announced a Vanishing Gradients livestream for January 21 that you may dig:* A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull and John Berryman (register to join live or get the recording afterwards).Show notes* Jeff Huber on Twitter* Jeff Huber on LinkedIn* Try Chroma!* Context Rot: How Increasing Input Tokens Impacts LLM Performance by The Chroma Team* AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited* From Context Engineering to AI Agent Harnesses: The New Software Discipline* Generative Benchmarking by The Chroma Team* Effective context engineering for AI agents by The Anthropic Team* Making Sense of Millions of Conversations for AI Agents by Ivan Leo (Manus) and Hugo* How we built our multi-agent research system by The Anthropic Team* Upcoming Events on Luma* Watch the podcast video on YouTube👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python) 03.12.2025 1h 2minWe have been sold a story of complexity. Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI.In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows.We talk through:- Escaping complexity hell to reduce costs and gain autonomy- The specific software practices, like the "Docker Barrier", that matter most for data scientists- How to replace complex cloud services with a simple, robust $30/month stack- The shift from writing code to "systems thinking" in the age of Agentic AI- How to manage the people-pleasing psychology of AI agents to prevent broken code- Why struggle is still essential for learning, even when AI can do the work for youLINKSTalk Python In Production, the Book! 
(https://talkpython.fm/books/python-in-production)Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists)Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python)Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models)Python Bytes podcast (https://pythonbytes.fm/)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share)Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind) 22.11.2025 1hGemini 3 is a few days old and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out the functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely.Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, the nuance of preventing context rot in massive windows, and why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.They talk through:- The implications of models that can "self-heal" and fix their own code- The two cultures of agents: LLM workflows with a few tools versus when you should unleash high-agency, autonomous systems.- Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews- Why Needle in a Haystack benchmarks often fail to predict real-world performance- How to build agent harnesses that turn model capabilities into product velocity- The shift from measuring latency to managing time-to-compute for reasoning tasksLINKSFrom Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline)Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot)Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)Upcoming Events on Luma 
(https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs 31.10.2025 59min
Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.
In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.
They talk through:
- How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.
- The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.
- The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.
- Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.
- The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.
LINKS
Randy on LinkedIn (https://www.zenml.io/llmops-database)
Wyrd Studios (https://thewyrdstudios.com/)
Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
🎓 Learn more: In Hugo's course: Building AI Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20). Next cohort starts November 3: come build with us!
Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production 16.10.2025 28min
Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.
Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.
We talk through:
- Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos
- The essential MLOps hygiene (tracing and continuous evals) that most teams skip
- The optimal (and very low) limit for the number of tools an agent can reliably use
- How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains
- The principle of using simple Python/RegEx before resorting to costly LLM judges
LINKS
The LLMOps Database: 925 entries as of today... submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
🎓 Learn more: This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20). Next cohort starts November 3: come build with us!
Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 60: 10 Things I Hate About AI Evals with Hamel Husain 30.09.2025 1h 13min
Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.
Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.
We talk through:
- The 10(+1) critical mistakes that cause teams to waste time on evals
- Why "hallucination scores" are a waste of time (and what to measure instead)
- The manual review process that finds major issues in hours, not weeks
- A step-by-step method for building LLM judges you can actually trust
- How to use domain experts without getting stuck in endless review committees
- Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents
If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap.
LINKS
Hamel's website and blog (https://hamel.dev/)
Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51)
Hamel Husain on Lenny's podcast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill)
The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share)
Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME)
🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
-
Episode 59: Patterns and Anti-Patterns For Building with AI 23.09.2025 47min
John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the “seven deadly sins” of LLM development, and the practical fixes that keep projects from stalling. From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle.
We talk through:
- Why chasing perfect accuracy is a dead end
- How to use agents without losing control
- Context engineering: fitting the right information in the window
- Starting simple instead of over-orchestrating
- Separating retrieval from generation in RAG
- Splitting complex extractions into smaller checks
- Knowing when frameworks help, and when they slow you down
A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production.
LINKS:
Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents)
The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems)
Roaming RAG – RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/)
Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/)
Prompt Engineering for LLMs by John and Albert Ziegler (https://amzn.to/4gChsFf)
Relevant Search by John and Doug Turnbull (https://amzn.to/3TXmDHk)
Arcturus Labs (https://arcturus-labs.com/)
Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
🎓 Learn more: Hugo's course (this episode was a guest Q&A from the course): Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Popular in
This podcast also appears in the podcast charts of these countries.