Vanishing Gradients

Hugo Bowne-Anderson

Country United States

Genres Technology, Science

Language EN

Episodes 78

Latest 25.07.2026

A podcast for people who build with AI. Long-format conversations with people shaping the field about agents, evals, multimodal systems, data infrastructure, and the tools behind them. Guests include Jeremy Howard (fast.ai), Hamel Husain (Parlance Labs), Shreya Shankar (UC Berkeley), Wes McKinney (creator of pandas), Samuel Colvin (Pydantic) and more.

Episodes

Four Months Inside a Production AI Agent: What Real Users Changed 25.07.2026 1h 4min

When ML/AI Engineer William Horton last joined me, Maven Assistant had reached its first external users the day before. The healthcare AI agent was available to 20 percent of Maven Clinic’s users, and the team had deliberately withheld answers about benefits. A wrong response could shape a decision involving $15,000 of fertility coverage, and the evals had not earned the right to ship it.

Four months later, Maven Assistant is available to 100 percent of users, benefits answering is live, and weekly conversation volume has grown by roughly ten times. Real usage also overturned part of the roadmap. The team had invested heavily in provider search and appointment tools, but 50 to 60 percent of early conversations were basic health questions such as whether someone could eat tuna while pregnant.

Production changed the engineering system too. An emergency guardrail told someone already in the ER to go to the ER. Zendesk content told people already using the Maven app to open the app. A newer model failed an upcoming-appointments eval because it correctly noticed that the mocked appointments were in the past.

William explains how Maven turns those failures into deterministic tests, LLM judges, synthetic negatives, and manual review. He also walks through the move from Gemini Flash models toward newer OpenAI models, what GPT-5.6 and Fable mean for a production agent, why model upgrades can make old prompt instructions obsolete, and why open-weight models still have to justify their GPU, infrastructure, and engineering costs.

“If anybody tells you that they’ve got their evaluations so good that they can just swap a model and know, with no manual review, that it’s going to be better, that person is probably lying, or they work at one of three places in the world.”
William Horton, Staff Machine Learning Engineer, Maven Clinic

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

👉 Want to build agents from the ground up? Registration is open for Build AI Agents from First Principles, a live workshop on the loops, tools, context, harnesses, and engineering decisions behind useful AI agents. You’ll learn how to design agent systems from first principles, with enough structure to decide which harness patterns your product actually needs. Sign up today with code production10 for 10% off. 👈

In This Episode

* What changed between 20 percent and 100 percent rollout. Benefits answering cleared its release bar, Maven Assistant reached the remaining users, and weekly conversation volume grew by roughly ten times.
* Why real usage beat the original roadmap. The complex provider and appointment agents received less traffic than expected, while 50 to 60 percent of early conversations were basic health questions.
* How production failures enter the evaluation system. William estimates that the deterministic tool-use layer now involves more than 1,000 test scenarios, while clinical quality, empathy, and completeness still require judges, human calibration, and manual work.
* When the model is right and the eval is broken. GPT-5.6 Terra rejected an “upcoming” appointment that had already happened, exposing a bad test fixture that the previous model had accepted.
* What happens when GPT-5.6 or Fable arrives. A model swap can remove old prompt instructions, add new behavioral failures, change latency and cost, or reveal that yesterday’s harness is constraining a more capable model.
* Why Maven moved beyond Gemini Flash 2.5. William discusses adopting newer OpenAI models, keeping real-time chat on smaller models and low reasoning settings, and changing the model without simultaneously rewriting the prompt.
* The economics of open-weight models. A smaller self-hosted model still needs an always-on GPU, infrastructure, and engineering attention that could otherwise go into the product.
* What William would rebuild today. Provider search and appointment booking probably belong in one agent, and model experiments should begin before the original choice hardens into the architecture.

Start With the First Episode

William first joined Vanishing Gradients the day after Maven Assistant reached external users. In Building an Enterprise AI Agent for Healthcare, he explains the original architecture, how failures become regression cases, why deterministic checks should come before LLM judges, and how the consequence of a wrong answer sets the release bar.

Resources

* Maven Clinic
* Building an Enterprise AI Agent for Healthcare
* Stop Overengineering Your Agent Harness
* Build AI Agents from First Principles
* All Vanishing Gradients workshops

How You Can Support Vanishing Gradients

Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. More than 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.

If you want to help keep it going:

* Become a paid subscriber, from $8 per month
* Share this episode with a builder who would find it useful
* Subscribe to the Vanishing Gradients YouTube channel
* Join another Vanishing Gradients workshop

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
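The stale-fixture failure described in this episode (mocked "upcoming" appointments that had drifted into the past) can be illustrated with a small sketch. This is not Maven's code: `MockAppointment`, `make_upcoming_appointments`, and `stale_fixtures` are hypothetical names, and the example only assumes that fixture dates are generated relative to the time the eval runs rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MockAppointment:
    """Hypothetical fixture record used only for this illustration."""
    provider: str
    starts_at: datetime


def make_upcoming_appointments(now: datetime) -> list[MockAppointment]:
    """Build fixtures relative to `now` so they cannot drift into the past."""
    return [
        MockAppointment("Dr. Example A", now + timedelta(days=2)),
        MockAppointment("Dr. Example B", now + timedelta(days=9)),
    ]


def stale_fixtures(
    appointments: list[MockAppointment], now: datetime
) -> list[MockAppointment]:
    """Return any supposedly upcoming fixtures that have already happened."""
    return [a for a in appointments if a.starts_at <= now]


if __name__ == "__main__":
    run_time = datetime.now(timezone.utc)
    fixtures = make_upcoming_appointments(run_time)
    # A guard like this can run before the eval itself, so a broken fixture
    # is reported as a fixture problem rather than as a model failure.
    assert not stale_fixtures(fixtures, run_time)
```

Checking the fixture before scoring the model keeps the two kinds of failure separate: a stale date fails the guard, and only a valid scenario reaches the judge or the deterministic check.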
Building an Enterprise AI Agent for Healthcare 17.07.2026 1h 8min

Every capability in an agent needs its own evidence and release bar. A model-provider slip, an incorrect tool call, and a wrong fertility-benefits answer should not be held to the same pass rate.

William Horton, Staff AI Engineer at Maven Clinic, joined us the day after Maven Assistant reached its first external users. The agent helps members inside Maven Clinic’s women’s and family healthcare platform find providers, manage appointments, navigate Maven, and get basic health information. William had spent much of launch day reading chat traces and turning the surprises into product decisions and tests.

William shows how a production failure moves through Maven’s system: the trace becomes a regression case, code handles deterministic checks, and LLM judges cover behavior that cannot be reduced to exact outputs. Human labels calibrate those judges, while the consequence of a wrong answer determines whether the capability ships. You can apply the same release workflow to the agent you are building now.

“For a lot of our tool-call evaluation, I’ll accept that it runs ten times and passes nine times. Going for that ten out of ten is just not worth the effort.”
William Horton, Staff AI Engineer, Maven Clinic

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

👉 Want to build agents from the ground up? Registration is open for Build AI Agents from First Principles, a live workshop on the loops, tools, context, harnesses, and engineering decisions behind useful AI agents. You'll learn how to design agent systems from first principles, with enough structure to decide which harness patterns your product actually needs. Sign up today with vg-code for 10% off. 👈

In This Episode

* The architecture behind Maven Assistant. A stronger lead agent routes requests to four narrower specialists for appointments, provider search, health questions, and Maven support. Hard guardrails run before dynamic routing.
* Why an enterprise healthcare assistant needs only 15 to 20 tools. Maven divides a manageable toolset across its specialists instead of exposing one model to hundreds of choices. Existing APIs become safer agent tools, with user identity and other application state injected by code.
* Turn failures into the cheapest reliable eval. A response claiming the agent was “made by Google” became a string check, tool calls are verified deterministically, and LLM judges handle clinical accuracy and other qualitative behavior.
* Set release thresholds from the consequences. Nine passes in ten can be acceptable for a cheap failure. Maven withheld benefits answers that could influence tens of thousands of dollars and routes self-harm language directly to human support.
* Let production change the product and the test set. Early chats changed the roadmap, became regression cases, exposed weaknesses in the judges, and supplied realistic opening messages for simulated users.

Join the Four-Month Follow-Up

This episode was recorded live inside our Building AI Applications course the day after Maven Assistant reached its first external users. By the follow-up four months later, Maven will have a much larger body of real conversations. William will return to compare the launch assumptions with what members actually used, which evals changed, and how newer models altered the system.

Register to join the livestream or receive the recording afterwards.

Resources

* Maven Clinic
* Maven introduces Maven Intelligence
* Google Agent Development Kit
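Two ideas from this episode, the "made by Google" string check and the nine-in-ten release bar, can be sketched together. This is a minimal illustration rather than Maven's implementation: the phrase list, function names, and thresholds are all assumptions chosen for the example.

```python
from typing import Callable

# Hypothetical phrase list, seeded from a single production trace.
FORBIDDEN_PHRASES = ("made by google",)


def passes_string_check(response: str) -> bool:
    """Deterministic check: fail if any forbidden phrase appears."""
    lowered = response.lower()
    return not any(phrase in lowered for phrase in FORBIDDEN_PHRASES)


def pass_rate(
    run_once: Callable[[], str],
    check: Callable[[str], bool],
    runs: int = 10,
) -> float:
    """Run a non-deterministic scenario several times and score it."""
    passed = sum(1 for _ in range(runs) if check(run_once()))
    return passed / runs


def meets_release_bar(rate: float, threshold: float) -> bool:
    """The threshold is set per capability, from the cost of a wrong answer."""
    return rate >= threshold


if __name__ == "__main__":
    # Stand-in for an agent call: nine acceptable replies and one bad one.
    canned = iter(["I am Maven Assistant."] * 9 + ["I was made by Google."])
    rate = pass_rate(lambda: next(canned), passes_string_check, runs=10)
    print(rate, meets_release_bar(rate, threshold=0.9))
```

The same `rate` can clear a 0.9 bar for a cheap failure and miss a 1.0 bar for a costly one, which is the point of setting thresholds from consequences rather than using one pass rate for every capability.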
What Claude Fable Means for Coding Agents 08.07.2026 1h 2min

Nicolay Gerold works all day and night on AMP, one of the most interesting coding-agent harnesses out there.

If you’re building with coding agents, this conversation will help you understand:

* when to trust the model,
* when to build harnesses around it,
* which model is worth paying for,
* which programming languages give the agent better feedback, and
* when to take the keyboard back.

Coding-agent products are living inside a blender. The move from Opus 4.8 to Fable changes what the model can be trusted with, eats a workflow, and suddenly the best product decision is to delete code.

AMP had handoff because long agent threads used to get messy. Compaction would lose the plot, the model would make worse decisions, and the product needed a way to move the work somewhere cleaner. Then compaction got better. The model ate the feature. AMP killed it.

Builders inherit the annoying product test: does this harness code help inspect, verify, recover, or merge model work, or is it just babysitting yesterday’s model?

Nico and Hugo riff on why loop engineering is overrated (and when to use it), why Fable is the first model with real engineering taste, and why you should stop writing Python code today and start writing TypeScript and Rust for all your AI Engineering workflows.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

👉 Want to build agents from the ground up? Registration is open for Build AI Agents from First Principles, a live workshop on the loops, tools, context, harnesses, and engineering decisions behind useful AI agents. You'll learn how to design agent systems from first principles, with enough structure to decide which harness patterns your product actually needs. Sign up today with vg-code for 10% off. 👈

In This Episode

* Coding-agent harnesses today: compaction, sandboxes, review flows, and the features frontier models are starting to absorb.
* Why AMP keeps deleting its own features when models get better.
* The test for every harness feature: does it make the agent’s work easier to inspect, verify, or recover from?
* Local agents, cloud sandboxes, and where each fits when bugs, issues, logs, or customer feedback turn into code changes.
* Background agents without auto-merge fantasy: how useful work comes back as branches, checkouts, or review candidates.
* Loop engineering in practice: tight loops with clear objectives, broad loops that create review overload, and where builders should draw the line.
* When deterministic code beats an AI step, and when a single agent with the right tools can replace brittle orchestration.
* The TikTok problem for coding: hundreds of agent threads, fragmented attention, and why loop engineering can become a trap.

Resources

* AMP
* AMP Owner’s Manual
* Nicolay Gerold’s Show Us Your Agent Skills dossier
* Clio: Privacy-Preserving Insights into Real-World AI Use
* TigerBeetle TigerStyle
* How to Build A Coding Agent with Nico and Hugo
* Build AI Agents From First Principles
The Future of Agentic Data Science 25.05.2026 1h 4min

“So I think we’re really at a historical moment, and the opportunity is massive. Almost 15 years ago, we were promised that data science was going to be this incredible thing and create all this value for people. And I think nowadays it’s mostly viewed as a cost center in most companies. I think we can really now fulfill that original promise with agentic data science.”

Thomas Wiecki, Co-creator of PyMC and Founder at PyMC Labs, joins Hugo to talk about how agentic data science is finally fulfilling the promise of Decision Intelligence.

We Discuss:

* Decision Engines: Transform data science from a cost center providing cryptic answers into a real-time decision intelligence hub delivering actionable outcomes;
* Tame the “Garden of Forking Paths”: Overcome human shortcuts by running parallel analyses to provide an honesty check, revealing the true robustness of business conclusions;
* Multiplayer Data Science: Foster organizational learning by moving agents into team chats, democratizing “what-if” questions and reducing context-switching friction;
* The Full Agentic Data Science Stack: Beyond harness and skills, the full stack includes orchestration for parallel analyses and a causal eval layer to measure actual outcome improvement;
* Agentic Dashboards: Move beyond static BI; use chat interfaces to inquire into models and generate real-time, custom visualizations for specific follow-up questions;
* Encode Professional Judgment as Skills: Elevate agent performance by encoding expert domain standards and high-fidelity workflows into specific Agent Skills, rather than relying on LLM pre-training;
* Ground Decisions in Generative Processes: Prevent hallucinations by forcing agents to model underlying physical or behavioral processes, providing a programmatic guardrail aligned with market realities;
* Scripted Causal-Bayesian Workflows: Their methodologically structured nature, from prior elicitation to posterior predictive checks, makes Causal-Bayesian workflows inherently automatable for agents;
* Iterative Autonomy via Skills: Achieve autonomy iteratively: verify workflows with human oversight, then encode verifiable parts as skills to hand off trusted tasks.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands-on exercises, capstones, reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off. Hit reply to enquire about group discounts. 👈

LINKS

* Thomas Wiecki on LinkedIn
* PyMC Labs
* Open-Sourcing Decision Lab: Scaling AI Judgment in Data Science (PyMC Labs blog)
* Decision AI Discord
* Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results (Sage Journals)
* The Agent Harness Reading List
* Show Us Your Agent Skills (GitHub)
* Agentic Data Science course with Hugo, Thomas, and Luca (10% off with code ADSVG10)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Agent-Harness.ipynb* 20.05.2026 1h 19min

“One thing that I don’t like about Claude is that you get into this weird mental state: oh, I think I trust the model. Let’s do the slot machine. Hit click, which puts you in an inactive mode of thinking. Maybe it’s better to use a worse model…”

Vincent Warmerdam, senior data professional and prolific open-source maintainer (some packages with over a million downloads), now Engineer at marimo, joins Hugo to talk about how the Python notebook is evolving from a static scratchpad into a working agent harness, and what it takes to stay in the loop as a developer when agents are writing most of the code. This episode was originally a livestream Q&A with the Vanishing Gradients audience.

We Discuss:

* Shared Notebook Canvas: Notebooks act as a shared memory space where agents and humans co-exist, enabling real-time visual feedback by direct manipulation of global state and UI elements;
* Speed-of-Thought Models: Faster, open-weight models like Kimi K2 enhance exploratory flow by keeping humans more alert to the code, unlike frontier models that can induce passive thinking;
* Pi as a Harness: Vincent favors an agent harness where agents extend themselves rather than reach for MCP, and where hooks can rigidly constrain which files an agent is allowed to read or touch;
* Why PRDs Don’t Fit Notebooks: Notebook work is fundamentally exploratory, so the discipline that works for shipping web apps does not transfer cleanly; the one exception is reproducing a paper;
* Interactive Code Review: Interactive UIs (e.g., dragging integers) transform code into a physical object, incentivizing developers to actively review and understand agent logic;
* Modular “Lego” Components: Provide agents with high-level, well-tested components (“Lego” code) instead of raw boilerplate, creating systems that are easier to debug and modulate;
* Algorithm-Driven Visualization: Let the algorithm dictate the visualization needed, rather than choosing visualizations first, revealing the most interesting structures within the data;
* Don’t Outsource the Thinking: Pen and paper architectural planning, walks away from the keyboard, and protecting calm remain the most effective ways to keep producing good ideas in the age of AI-generated software;
* Agent Auto-Healing: A marimo-specific linter solved 60% of agent errors overnight by letting agents diagnose and fix their own “slop” without complex prompt engineering;
* Incremental Generation: Avoid monolithic LLM outputs; generate code one to two cells at a time to prevent laziness and ensure human oversight and learning.

Vincent closes on the idea that calm, not the latest frontier model, is the most underrated tool for building well, and that we should study LLM output the way chess players studied the engines that beat them.

Vincent gives several live demos toward the end of the episode. He describes them well enough to follow on audio, but the visuals are worth seeing, so check out the YouTube version here.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands-on exercises, capstones, reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off. 👈

Also join us for Ep. 3 of Show Us Your Agent Skills, with Vincent, Paul Iusztin (Decoding AI), Eleanor Berger (Elite AI-Assisted Coding), Alan Nichol (Rasa), Nico Gerold (amp), and Matthew Honnibal (spaCy, Explosion). Register on lu.ma to join live, or catch the recording afterwards.

LINKS

* Vincent Warmerdam on LinkedIn
* Vincent’s website (koaning.io)
* Wiggly Stuff (Vincent’s widget library)
* Marimo Gallery
* skills.sh
* Armin Ronacher on Pi (the minimal agent inside open claw)
* Building Agents That Build Themselves (Hugo’s workshop write-up with Ivan Leo)
* Data Science Fiction: Winning at Metrics, Losing at AI Evals (Hugo’s blog post based on Vincent’s talk)
* Isaac Flath’s project (on X)
* Braid (video game)
* Hugo’s earlier podcast with Akshay (marimo)
* Elite AI Assisted Coding, Eleanor Berger’s course (Vanishing Gradients community gets 25% off with code “HUGO”)
* GameMakers Toolkit (YouTube)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Come build the future of Agentic Data Science with us in our upcoming course (10% off).
Agentic Engineering and the Lost Art of Verification 12.05.2026 1h 32min

> I almost don’t read code now. My approach with RoboRev is it’s like my code reader. The mantra is: RoboRev reads every line of code that is generated. It gets read multiple times. And so, whenever I push up a pull request, the branch gets re-reviewed. And so by the time I’m merging a pull request into a repository, the code has all been read by agents four or five times minimum. I look at the code in terms of structural detail: does it look right?
> Wes McKinney (creator of pandas, Posit)

Wes, Jeremiah Lowin (Prefect), and Randy Olson (Good Eye Labs) join Hugo and his cohost Thomas Wiecki (PyMC Labs) for the premiere of Show Us Your Agent Skills, a live session where guests walk us through the exact skills, workflows, and setups they use to work with agents every day.

We Discuss:

* Wes McKinney on why he barely writes, or even reads, code anymore, his “software factory” of parallel agents, and RoboRev, the background reviewer that reads every line four or five times before he merges;
* The shift from “vibe coding” to agentic engineering, and why verification, not reading, is the part that actually matters;
* Jeremiah Lowin on years of context engineering: trickling voice memos, recorded meetings, and morning briefs into his agent’s memory substrate as a true “second brain”;
* Why Jeremiah picked OpenCode specifically for how deeply he can customize its memory, and what he’s building with FastMCP, Prefab, and Cardboard;
* Randy Olson on encoding human judgment, like Tufte’s rules for data visualization, directly into agent skills, so the agents themselves perform the verification;
* The “digital twin” Randy loads into his agents as a thought partner that pushes back instead of agreeing;
* Skills as thin drivers, progressive disclosure, and managing context rot across extended sessions;
* The rise of ephemeral, “just for me” software that agents finally make viable.

Skills and workflows discussed and shown in the episode:

* Wes’s RoboRev background code reviewer, his “software factory” dashboard, and his agentic engineering setup built on the Superpowers skills framework;
* Jeremiah’s “explain” skill (which anchors every other skill he has), his voice memo memory pipeline, his FastMCP and Prefab projects, and Cardboard, his ephemeral presentation tool;
* Randy’s data visualization verifier skills, his digital twin thought partner prompt, his cron job reports for colleagues, and his reflect-and-improve skill design pattern.

Check out the GitHub repo where we’re starting to drop some of these skills and workflows for you to grab and try yourself.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!

Up next on Show Us Your Agent Skills: Hilary Mason (CEO, HiddenDoor), Bryan Bischof (Theory Ventures), Eric Ma (Research DS lead, Moderna Therapeutics), and Tomasz Tunguz (Theory Ventures). Register on lu.ma to join live, or catch the recording afterwards.

👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands-on exercises, capstones, reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here. 👈

LINKS

* spicytakes.org, Wes McKinney’s website
* RoboRev, Wes’s background code reviewer
* Agents View, Wes’s agent session database
* Middleman, Wes’s local GitHub dashboard
* Superpowers, Jesse Vincent’s skills framework that Wes builds on
* An Open Source Maintainer’s Guide to Saying No, by Jeremiah Lowin
* FastMCP
* Prefab, Jeremiah’s Python DSL for generative UIs
* Beautiful Charts with AI, by Randy Olson
* The Coding Agent is Dead, by Amp
* Building Effective Agents, by the Anthropic team
* Show Us Your Agent Skills, the GitHub repo where we are dropping skills and workflows from the show
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Come build the future of Agentic Data Science with us in our upcoming course.
Next Level AI Evals for 2026 23.04.2026 53min

“There are a lot of reasons why we should do AI evals. For many companies doing AI evals is the way to build the feedback loop into the product development lifecycle. So it is like your compass. We’re using AI evals as a compass to guide product development and also product iteration. And also, many times we need evals to function as the pass or fail gate in release decisions. Whether this product is good enough for release or whether it is good enough for experiment, evals are also used in that.”

Stella Wenxing Liu, Head of Applied Science at ASU, and Eddie Landesberg, Staff Data Scientist at Google, join Hugo to talk about why AI evaluation is evolving from “vibe checks” into a rigorous, multi-disciplinary science and how causal inference will take AI evals to the next level in 2026.

Vanishing Gradients is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

They Discuss:

* Team-Centric AI Evals, integrating product managers, data scientists, and SMEs under a “benevolent dictator” (or not!) to ensure comprehensive and effective evaluation;
* Custom Evaluation Metrics, moving beyond generic vendor metrics to analyze raw data and identify specific failure modes, avoiding generic product outcomes;
* AI as Policy Evaluation, framing AI evaluation as a causal inference problem to estimate counterfactual performance of new “policies” (prompts, models) and predict online A/B test outcomes;
* Clear Product Constraints, defining what an AI product should not do with strict guardrails to prevent misuse, control costs, and avoid brand dilution;
* Calibrated LLM Judges, statistically aligning LLM-as-a-judge with human experts using causal inference to ensure valid proxies for human welfare and business objectives;
* Essential Data Curiosity, fostering a culture of manual data inspection to build intuition before relying on automated error analysis or agents, ensuring effective system design;
* Statistical AI Evaluation, shifting from unit-test thinking to non-deterministic distributions, using confidence intervals and power analysis to discern genuine improvements from statistical noise;
* Proactive Regulatory Compliance, developing rigorous, defensible internal evaluation standards now to gain a competitive advantage as vague AI regulations move towards enforced compliance;
* Human-Centric Benchmarking, grounding AI systems in human judgment and user values, moving beyond automated scores to build resilient and differentiated AI.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Stella starts teaching a new cohort of her AI Evals and Analytics Playbook course this week. She’s kindly giving listeners of Vanishing Gradients 30% off with this link. 👈

Our flagship course Building AI Applications just wrapped its final cohort, but we’re cooking up something new. If you want to be first to hear about it (and help shape what we build), drop your thoughts here.

LINKS

* Stella Wenxing Liu on LinkedIn
* Eddie Landesberg on LinkedIn
* Stella’s AI Evals & Analytics Playbook course on Maven (30% community discount)
* CJE (Causal Judge Evaluation) package by Eddie
* Trillion Dollar Coach
* Goodhart’s Law
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
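The "Statistical AI Evaluation" point above (confidence intervals instead of unit-test thinking) can be made concrete with a small sketch. It uses the Wilson score interval for a pass rate, which is a standard choice for small samples but is an assumption here: the episode notes do not say which interval the guests use. The function names are hypothetical, and comparing intervals for overlap is a rough heuristic, not a substitute for a proper hypothesis test or power analysis.

```python
import math


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = passes / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when the two intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Ten runs each: 9/10 versus 8/10 is well within noise.
    print(intervals_overlap(wilson_interval(9, 10), wilson_interval(8, 10)))
    # A hundred runs each: 90/100 versus 60/100 is clearly separated.
    print(intervals_overlap(wilson_interval(90, 100), wilson_interval(60, 100)))
```

With only ten runs, a single extra pass does not distinguish two prompts or models, which is why sample size matters before treating a score change as a genuine improvement.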
Privacy Theater Is Not Privacy Engineering: What It Actually Takes to Ship Safe AI 15.04.2026 1h 6min

Katharine Jarmul, Privacy in ML/AI Expert & Author of Practical Data Privacy, joins Hugo to unpack why most AI privacy advice is theater: and what technical privacy actually looks like when you’re shipping LLMs, agents, and multimodal systems into the real world.In this episode, we dig into how to build defensible systems in an era of AI agents and multimodal models: why system prompts (and your entire agent harness!) should be considered public by default, and why “privacy observability” is as critical as data observability for anyone building with LLMs today. Multimodal is what changes the threat model: identifiers hide in images, audio, and metadata, not just text, and the old anonymization playbook doesn’t cover it.Vanishing Gradients is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.We Discuss:* No Convenience Tax, you don’t have to trade privacy for utility: high-utility AI products can be privacy-preserving through technical controls like privacy routing and input sanitization;* Public Prompts and Harnesses: assume any instruction or secret in a system prompt or agent harness will be exfiltrated; don’t put sensitive info there in the first place;* Privacy Observability, tag and track data flows so information is used only for its original intended purpose: catch design flaws before they become legal problems;* Technical Privacy, implement mathematical and statistical constraints directly into ML systems and data flows so privacy is measurable and enforceable, not aspirational;* Tiered Guardrails, a three-layer approach: deterministic filters for hard rules, algorithmic models for nuanced classification, and internal alignment training for behavioral baselines;* Federated Learning Is Not Privacy, model updates in FL leak sensitive data on their own: you must layer differential privacy or encrypted computation on top, or you’re reverse-engineerable;* Anonymization Spectrum, navigate the 
“grayscale” of privacy in multimodal AI, balancing data utility and individual risk as identifiers hide in non-obvious places;* Privacy Champions, embed privacy accountability directly into development by training and incentivizing engineers inside product teams;* Red Teaming as Ritual, your goal is to attack yourself: practice thinking like an attacker, and turn privacy testing into an organization-wide creative ritual rather than a siloed security task.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Katharine is teaching her next cohort of Practical AI Privacy starting April 20. She’s kindly giving readers of Vanishing Gradients 10% off. Use this link. I’ll be taking it so hope to see you there!👈Our flagship course Building AI Applications just wrapped its final cohort but we’re cooking up something new. If you want to be first to hear about it (and help shape what we build), drop your thoughts here.LINKS* Practical AI Privacy course on Maven (10% off with code build-with-privacy)* Katharine Jarmul on LinkedIn* Probably Private — Katharine’s website & newsletter* Practical Data Privacy (Katharine’s book)* Let’s Build an AI Privacy Router — Lightning Lesson* Practical AI Privacy: Agents & Local LLMs (newsletter issue)* A Deep Dive into Memorization in Deep Learning (kjamistan blog)* Microsoft Presidio* Llama Guard 3 8B on Hugging Face* Nicholas Carlini* From Magic to Malware: How OpenClaws Agent Skills Become an Attack Surface (1Password)* Owning Ethics (Metcalf, Moss, boyd — Data & Society)* Hugo on guardrails in LLM applications* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTubeHow You Can Support Vanishing GradientsVanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. 
Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.If you want to help keep it going:* Become a paid subscriber, from $8/month* Share this with a builder who’d find it useful* Subscribe to our YouTube channel.Thanks for reading Vanishing Gradients! This post is public so feel free to share it. Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
LLM Architecture in 2026: What You Need to Know with Sebastian Raschka 13.04.2026 1h 18min

If you take a model release as an anchor point, let’s say Nemotron 3 or Qwen 3.5, you can go in both directions: You can either plug them into an agent and play around with that, or you can look, okay, what does the model look like under the hood? What are the ingredients? What type of attention mechanism do they use? What are currently research techniques that could make that even better in the next generation of models? What can we swap out, basically? And I’m interested in both of these!
Sebastian Raschka, Independent AI Researcher and author of Build a Large Language Model from Scratch, joins Hugo to talk about what’s changed in AI architecture, from post-training to hybrid models, and why understanding what’s under the hood matters more than ever for developers building in the agentic era. Sebastian’s upcoming book, Build a Reasoning Model from Scratch, is currently available for pre-order on Amazon and in early access on Manning! Vanishing Gradients is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. We Discuss: * Ed Tech for Agents: should we design educational content specifically for agentic systems, or is there a better approach? * Inference Scaling is the new frontier, driving “gold-level” performance during generation via parallel sampling and internal meta-judges; * Hybrid Architectures from Qwen 3.5 and Nemotron 3 scale almost linearly with context length, making long-context agentic workflows significantly more affordable and performant; * Multi-head Latent Attention (MLA), developed by DeepSeek, wins the KV cache war by drastically reducing memory overhead without performance hits; * Agent Harnesses need to be continuously simplified as frontier models are post-trained on agent trajectories. 
Teams that don’t strip back their scaffolding risk the harness getting in the way of a more capable model.* “AI Psychosis”: the cognitive load of supervising self-supervising agents, and why we’re all conducting an orchestra we were never trained to conduct;* Sebastian’s AI Stack: a surprisingly simple setup (Mac mini, Codex, Ollama) with a ~20-item QA checklist, delegating the boring work to preserve energy for creative development;* Fine-tuning is now an economic decision, optimizing costs and latency for high-volume tasks where long system prompts outweigh a one-time training run;* Process Reward Models (PRMs) are the next frontier, verifying intermediate reasoning steps to solve “hallucination in the middle” for complex math and code tasks;* “Implementation Does Not Lie”: Sebastian’s layer-by-layer verification philosophy, comparing from-scratch builds against HuggingFace references to catch details invisible in papers;* Architecture Details dictate inference stack choices; nuances like RMSNorm stability or RoPE flavors are critical for optimal performance and troubleshooting;* The Distillation Loop drives open-weight parity, enabling specialized, “frontier-class” models by “pre-digesting” frontier outputs without multi-million dollar training risks.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!Our flagship course Building AI Applications just wrapped its final cohort but we’re cooking up something new. If you want to be first to hear about it (and help shape what we build), drop your thoughts here.Links and Resources* Build a Reasoning Model (From Scratch): Sebastian’s new book, currently available for pre-order on Amazon and in early access on Manning. You’ll learn how reasoning LLMs actually work by starting with a pre-trained base LLM and adding reasoning capabilities step by step in code. 
A hands-on follow-up to Build a Large Language Model from Scratch.* LLM Architecture Gallery: Sebastian’s collection of architecture figures and fact sheets from his blog posts, updated with each major model release. A go-to visual reference for comparing what’s changed under the hood across model generations.* Sebastian Raschka on LinkedIn* Sebastian’s website* Ahead of AI (Sebastian’s Substack)* Build a Large Language Model from Scratch* PinchBench: OpenClaw Benchmark Leaderboard* DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning* Gated Delta Networks: Improving Mamba2 with Delta Rule (ICLR 2025)* DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning* Hugging Face Model Hub* Upcoming Events on Luma* Vanishing Gradients on YouTubeA Bit More on Agent Harnesses* Components of A Coding Agent by Sebastian* How To Build An Agent that Builds its own Harness by Hugo and Ivan Leo (DeepMind, ex-Manus)* Build Your Own Deep Research Agent with Hugo & Ivan Leo (Google DeepMind, ex-Manus): In this livestream, you’ll learn how to build a production-grade agent harness from scratch in pure Python;* AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited with Lance Martin (Anthropic), Duncan Gilchrist (Delphina), and Hugo* The Post-Coding Era: What Happens When AI Writes the System? with Nicholas Moy (Google DeepMind), Duncan Gilchrist (Delphina), and Hugo* What is an Agent Harness? from What 300+ Engineers from Netflix, Amazon, and Instacart Asked About AI Engineering.How You Can Support Vanishing GradientsVanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. 
All independent, all free.If you want to help keep it going:* Become a paid subscriber, from $8/month* Share this with a builder who’d find it useful* Subscribe to our YouTube channel.Thanks for reading Vanishing Gradients! This post is public so feel free to share it. Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Episode 72: Why Agents Solve the Wrong Problem (and What Data Scientists Do Instead) 20.03.2026 1h 33min

I often see what I would consider to be b******t evals, especially in data, like write this dumb SQL. Almost every one of these dumb SQL questions that I’ve seen for benchmarks are just so either obviously easy or overwhelmingly adversarial. They just, they don’t feel valuable as a data scientist, it’s something that you probably would never ask a real data scientist to do. So I went out my way to create real ones. Let me read one to you.Bryan Bischof, Head of AI at Theory Ventures, joins Hugo to talk about what happened when 150 people spent six hours using AI agents to answer real data science questions across SQL tables, log files, and 750,000 PDFs.They Discuss:* Failure Funnels, pinpoint where agent reasoning breaks down using causal-chain binary evaluations instead of vague 1-5 scales;* Median Score: 23 out of 65, what happened when world-class engineers turned agents loose on real data work, and why general-purpose coding agents with human prodding beat fancy frameworks;* Zero-Cost Submissions Kill Trust, without a penalty for wrong answers, agents hill-climb to correct submissions through brute force instead of building confidence;* Data Science is “Zooming”, moving beyond binary decisions to iterative problem framing, refining “does our inventory suck?” into a tractable hypothesis;* MCP as Semantic Layer, model your organization’s proprietary knowledge once and distribute it to whatever LLM interface your team prefers;* The Subagent vs. 
Tool Debate, a distinction that adds cognitive load without hiding complexity;* Self-Orchestration Gap, agents don’t yet realize they should trigger specialized extraction frameworks like DocETL instead of reading 750K PDFs one by one;* The Future of Evals, from vibe checks to objective functions and continuous user feedback that lets systems converge on reliability.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort has started. Registration is still open. All sessions are recorded so don’t worry about having missed any. Here is a 25% discount code for readers. 👈LINKS* Bryan Bischof on Twitter/X* Bryan Bischof on LinkedIn* Theory Ventures* The Hunt for a Trustworthy Data Agent (blog post)* America’s Next Top Modeler GitHub repo* Hamel’s evals FAQ: How do I evaluate agentic workflows?* DocETL* LLM Judges and AI Agents at Scale (Hugo’s podcast with Shreya Shankar)* When Your Metrics Are Lying (Cimo Labs)* Lessons from a Year of Building with LLMs (livestream on YouTube)* Bryan Bischof: The Map is Not the Territory (YouTube)* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort has started. Registration is still open. All sessions are recorded so don’t worry about having missed any. Here is a 25% discount code for readers. 👈 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin 18.02.2026 51min

Our thesis is that AI is still just engineering… those people who tell us for fun and profit, that somehow AI is so, so profound, so new, so different from anything that’s gone before that it somehow eclipses the need for good engineering practice are wrong. We need that good engineering practice still, and for the most part, most things are not new. But there are some things that have become more important with AI. One of those is durability.Samuel Colvin, Creator of Pydantic AI, joins Hugo to talk about applying battle-tested software engineering principles to build durable and reliable AI agents.They Discuss:* Production agents require engineering-grade reliability: Unlike messy coding agents, production agents need high constraint, reliability, and the ability to perform hundreds of tasks without drifting into unusual behavior;* Agents are the new “quantum” of AI software: Modern architecture uses discrete “agentlets”: small, specialized building blocks stitched together for sub-tasks within larger, durable systems;* Stop building “chocolate teapot” execution frameworks: Ditch rudimentary snapshotting; use battle-tested durable execution engines like Temporal for robust retry logic and state management;* AI observability will be a native feature: In five years, AI observability will be integrated, with token counts and prompt traces becoming standard features of all observability platforms;* Split agents into deterministic workflows and stochastic activities: Ensure true durability by isolating deterministic workflow logic from stochastic activities (IO, LLM calls) to cache results and prevent redundant model calls;* Type safety is essential for enterprise agents: Sacrificing type safety for flexible graphs leads to unmaintainable software; professional AI engineering demands strict type definitions for parallel node execution and state recovery;* Standardize on OpenTelemetry for portability: Use OpenTelemetry (OTel) to ensure agent traces and logs are 
portable, preventing vendor lock-in and integrating seamlessly into existing enterprise monitoring.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a 25% discount code for listeners. 👈LINKS* Samuel Colvin on LinkedIn* Pydantic* Pydantic Stack Demo repo* Deep research example code* Temporal* DBOS (Postgres alternative to Temporal)* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube👉Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort starts March 10, 2026. Here is a 25% discount code for listeners.👈https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
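The “deterministic workflows and stochastic activities” split above can be sketched in plain Python. This is a toy illustration, not the API of Temporal, DBOS, or Pydantic AI: `Journal`, `run_activity`, and `fake_llm_call` are hypothetical names, and a JSON file stands in for a real durable store.

```python
import json
import os
import tempfile
from typing import Any, Callable


class Journal:
    """Record of completed activity results, persisted as JSON."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.results: dict[str, Any] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.results = json.load(f)

    def run_activity(self, step_id: str, fn: Callable[[], Any]) -> Any:
        # Replay: a step that already completed returns its recorded result.
        if step_id in self.results:
            return self.results[step_id]
        result = fn()  # stochastic or side-effecting work (LLM call, IO)
        self.results[step_id] = result
        with open(self.path, "w") as f:
            json.dump(self.results, f)
        return result


calls = {"llm": 0}


def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real model call; counts how often it is invoked.
    calls["llm"] += 1
    return f"summary of: {prompt}"


def workflow(journal: Journal, crash_after_first_step: bool = False) -> str:
    # Deterministic orchestration: same step ids, same order, on every run.
    draft = journal.run_activity("draft", lambda: fake_llm_call("quarterly report"))
    if crash_after_first_step:
        raise RuntimeError("simulated crash")
    return journal.run_activity("review", lambda: fake_llm_call(draft))


def demo() -> tuple[str, int]:
    calls["llm"] = 0
    path = os.path.join(tempfile.mkdtemp(), "journal.json")
    try:
        workflow(Journal(path), crash_after_first_step=True)
    except RuntimeError:
        pass  # the process "died" after the first activity was journaled
    final = workflow(Journal(path))  # restart: "draft" is replayed, not re-run
    return final, calls["llm"]
```

Because the workflow function only decides which step runs next, re-running it after a crash is safe: the journal supplies the results of steps that already finished, so the model is not called again for them.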
Episode 70: 1,400 Production AI Deployments 12.02.2026 1h 9min

There’s a company who spent almost $50,000 because an agent went into an infinite loop and they forgot about it for a month. It had no failures and I guess no one was monitoring these costs. It’s nice that people do write about that in the database as well. After it happened, they said: watch out for infinite loops. Watch out for cascading tool failures. Watch out for silent failures where the agent reports it has succeeded when it didn’t!
We Discuss:
* Why the most successful teams are ripping out and rebuilding their agent systems every few weeks as models improve, and why over-engineering now creates technical debt you can’t afford later;
* The $50,000 infinite loop disaster and why “silent failures” are the biggest risk in production: agents confidently report success while spiraling into expensive mistakes;
* How ELIOS built emergency voice agents with sub-400ms response times by aggressively throwing away context every few seconds, and why these extreme patterns are becoming standard practice;
* Why DoorDash uses a three-tier agent architecture (manager, progress tracker, and specialists) with a persistent workspace that lets agents collaborate across hours or days;
* Why simple text files and markdown are emerging as the best “continual learning” layer: human-readable memory that persists across sessions without fine-tuning models;
* The 100-to-1 problem: for every useful output, tool-calling agents generate 100 tokens of noise, and the three tactics (reduce, offload, isolate) teams use to manage it;
* Why companies are choosing Gemini Flash for document processing and Opus for long reasoning chains, and how to match models to your actual usage patterns;
* The debate over vector databases versus simple grep and cat, and why giving agents standard command-line tools often beats complex APIs;
* What “re-architect” as a job title reveals about the shift from 70% scaffolding / 30% model to 90% model / 10% scaffolding, and why knowing when to rip things out may be the most important skill today.
You can also find the full episode on Spotify, Apple Podcasts, and YouTube. You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!
👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Our final cohort starts March 10, 2026. Here is a 25% discount code for readers. 👈
Show Notes Links
* Alex Strick van Linschoten on LinkedIn
* Alex Strick van Linschoten on Twitter/X
* LLMOps Database
* LLMOps Database Dataset on Hugging Face
* Hugo’s MCP Server for LLMOps Database
* Alex’s Blog: What 1,200+ Production Deployments Reveal About LLMOps in 2025
* Previous Episode: Practical Lessons from 750 Real-World LLM Deployments
* Previous Episode: Tales from 400 LLM Deployments
* Context Rot Research by Chroma
* Hugo’s Post: AI Agent Harness - 3 Principles for Context Engineering
* Hugo’s Post: The Rise of Agentic Search
* Episode with Nick Moy: The Post-Coding Era
* Hugo’s Personal Podcast Prep Skill Gist
* Claude Tool Search Documentation
* Gastown on GitHub (Steve Yegge)
* Welcome to Gastown by Steve Yegge
* ZenML - Open Source MLOps & LLMOps Framework
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast livestream on YouTube
* Join the final cohort of our Building AI Applications course in March, 2026 (25% off for listeners)
Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
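One inexpensive guard against the runaway-loop story above is a hard cap on steps and spend inside the agent loop itself. A minimal sketch, assuming a hypothetical `step` callable that returns a `(done, cost_usd)` pair; the names and limits here are illustrative and do not come from the LLMOps Database or any framework.

```python
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the loop hits its step or spend limit before finishing."""


def run_with_budget(
    step: Callable[[int], Tuple[bool, float]],
    max_steps: int = 50,
    max_cost_usd: float = 5.0,
) -> Tuple[int, float]:
    """Call step(i) until it reports done, or fail loudly at a limit.

    Returns (steps_taken, total_cost_usd) on success.
    """
    total_cost = 0.0
    for i in range(max_steps):
        done, cost = step(i)
        total_cost += cost
        if total_cost > max_cost_usd:
            raise BudgetExceeded(f"spent ${total_cost:.2f} after {i + 1} steps")
        if done:
            return i + 1, total_cost
    raise BudgetExceeded(f"no result after {max_steps} steps (${total_cost:.2f})")


def stuck_agent(i: int) -> Tuple[bool, float]:
    # Simulates an agent that never finishes and costs $0.40 per step.
    return False, 0.40


try:
    run_with_budget(stuck_agent)
except BudgetExceeded as err:
    print(f"stopped: {err}")
```

Raising an exception, instead of returning quietly, also addresses the “silent failure” case: the caller cannot mistake a budget stop for a successful run.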
Episode 69: Python is Dead. Long Live Python! With the Creators of pandas & Parquet 03.02.2026 55min

“It’s the agent writing the code. And it’s the development loop of writing the code, building testing, write the code, build test and iterating. And so I do think we’ll see for many types of software, a shift away from Python towards other programming languages. I think Go is probably the best language for those like other types of software projects. And like I said, I haven’t written a line of Go code in my life.” – Wes McKinney (creator of pandas, Principal Architect at Posit)
Wes McKinney, Marcel Kornacker, and Alison Hill join Hugo to talk about the architectural shift for multimodal AI, the rise of “agent ergonomics,” and the evolving role of developers in an AI-generated future.
We Discuss:
* Agent Ergonomics: Optimize for agent iteration speed, shifting from human coding to fast test environments, potentially favoring languages like Go;
* Adversarial Code Review: Deploy diverse AI models to peer-review agent-generated code, catching subtle bugs humans miss;
* Multimodal Data Verbs: Make operations like resizing and rotating native to your database to eliminate data-plumbing bottlenecks;
* Taste as Differentiator: Value “taste”, the ability to curate and refine the best output from countless AI-generated options, over sheer execution speed;
* 100x Software Volume: Embrace ephemeral, just-in-time software; prioritize aggressive generation and adversarial testing over careful planning for quality.
You can also find the full episode on Spotify, Apple Podcasts, and YouTube. You can also interact directly with the transcript of the workshop & fireside chat here in NotebookLM. If you do so, let us know anything you find in the comments!
👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈
This was a fireside chat at the end of a livestreamed workshop we did on building multimodal AI systems with Pixeltable. 
Check out the full workshop below (all code here on Github):Links and Resources* Wes McKinney on LinkedIn* Marcel Kornacker on LinkedIn* Alison Hill on LinkedIn* Spicy Takes* Palmer Penguins* Pixeltable* Posit* Positron* Building Multimodal AI Systems Workshop Repository* Pixeltable Docs: LLM Tool Calling with MCP Servers* Pixeltable Docs: Working with Pydantic* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube* Join the final cohort of our Building AI Applications course in March, 2026 (25% off for listeners)https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfsWhat people said during the workshop“I think the interface looks amazing/simple. Strong work! 🦾” — @goldentribe“This is quite amazing. Watching this I felt the same way when I first leant pandas, NumPy and scikit and how well i was able to manipulate and wrangle data. PixelTable feels seamless and looks as good as those legendary frameworks but for Multimodal Data.” — @vinod7“This is all extremely cool to see, I love the API and the approach.” — @steveb4191“Thanks so much, Hugo! That was very insightful! Great work Alison and Marcel!” — @vinod7“Just wrapped up watching a replay of the Pixeltable workshop. So cool!! Love the notebooks and working examples. The important parts were covered and worked beautifully 🕺” — @therobbrennan👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Episode 68: A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull & John Berryman 23.01.2026 1h 28min

The best way to build a horrible search product? Don’t ever measure anything against what a user wants.
Search veterans Doug Turnbull (led search at Reddit and Shopify; wrote Relevant Search and AI-Powered Search) and John Berryman (early engineer on GitHub Copilot; author of Relevant Search and Prompt Engineering for LLMs) join Hugo to talk about how to build agentic search applications.
We Discuss:
* The evolution of information retrieval as it moves from traditional keyword search toward “agentic search” and what this means for builders;
* John’s five-level maturity model for AI adoption (you can prototype today!), moving from traditional search to conversational AI to asynchronous research assistants that reason about result quality;
* The Agentic Search Builders Playbook, including why and how you should “hand-roll” your own agentic loops to maintain control;
* The importance of “revealed preferences” that LLM judges often miss: evaluations must use real clickstream data to capture what semantic relevance alone cannot infer;
* Patterns and Anti-Patterns for Agentic Search Applications;
* Learning and teaching Search in the Age of Agents.
You can find the full episode on Spotify, Apple Podcasts, and YouTube. You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!
👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈
Doug and Hugo are also doing a free lightning lesson on Feb 20 about How To Build Your First Agentic Search Application! You’ll walk away with a framework & code to build your first agentic search app. 
Register here to join live or get the recording after.
Links and Resources
Guests
* Arcturus Labs (John’s website)
* Software Doug (Doug’s website)
* John Berryman on LinkedIn
* Doug Turnbull on LinkedIn
Books
* Relevant Search by Doug Turnbull & John Berryman (Manning)
* AI-Powered Search by Doug Turnbull (Manning)
* Prompt Engineering for LLMs by John Berryman (O’Reilly)
Blog Posts
* Incremental AI Adoption for E-commerce by John Berryman
* Roaming RAG – RAG without the Vector Database by John Berryman
* Agents Turn Simple Keyword Search into Compelling Search Experiences by Doug Turnbull
* A Simple Agentic Loop with Just Python Functions by Doug Turnbull
* Agentic Code Generation to Optimize a Search Reranker by Doug Turnbull
* LLM Judges Aren’t the Shortcut You Think by Doug Turnbull (Hugo’s 5 minute video below)
* Malleable Software by Ink & Switch (inc. Geoffrey Litt)
* Patterns and Anti-Patterns for Building with AI by Hugo Bowne-Anderson
Other Resources
* The Rise of Agentic Search, a recent VG Podcast with Jeff Huber
* Karpathy on Cognitive Core LLMs
* Cheat at Search with Agents course by Doug Turnbull (use code: vanishinggradients for $200 off)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in Q1, 2026 (25% off for listeners)
Timestamps (for YouTube livestream)
00:00 How to Build Agentic Search & Retrieval Systems
02:48 Defining Search and AI
03:26 Evolution of Search Technologies
08:46 Search in E-commerce and Other Domains
12:15 Combining Search and AI: RAG and LLMs
23:50 User Intent and Search Optimization
29:47 Levels of AI Integration in Search
32:25 Exploring the Complexity of Search in Various Domains
33:49 The Evolution and Impact of Agentic Search
34:07 Defining Terms: RAG and Agentic Search
34:52 The Research Loop and Tool Interaction
35:55 Formal Protocols and Structured Outputs
38:39 Building Agentic Search Experiences: Tips and Advice
41:50 The Importance of Empathy in AI and Search Development
54:30 The Role of UX in Search Applications
01:01:15 Future of Search: Malleable User Interfaces
01:02:38 Exploring Malleable Software
01:04:20 The Coordination Challenge in Software Development
01:05:23 The Impact of Claude Code & Claude Cowork
01:06:22 The Future of Knowledge Work with AI
01:12:39 Evaluating Search Algorithms with AI
01:15:15 The Role of Agents in Search Optimization
01:29:55 Teaching AI and Search Techniques
01:34:25 Final Thoughts and Farewell
👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands on exercises and office hours. Here is a discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgpod
Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Episode 67: Saving Hundreds of Hours of Dev Time with AI Agents That Learn 14.01.2026 1h 18min

This is continual learning, right? Everyone has been talking about continual learning as the next challenge in AI. Actually, it’s solved. Just tell it to keep some notes somewhere. Sure, it’s not, it’s not machine learning, but in some ways it is because when it will load this text file again, it will influence what it does … And it works so well: it’s easy to understand. It’s easy to inspect, it’s easy to evolve and modify!

Eleanor Berger and Isaac Flaath, the minds behind Elite AI Assisted Coding, join Hugo to talk about how to redefine software development through effective AI-assisted coding, leveraging “specification-first” approaches and advanced agentic workflows.

We Discuss:
* Markdown learning loops: Use simple agents.md files for agents to self-update rules and persist context, creating inspectable, low-cost learning;
* Intent-first development: As AI commoditizes syntax, defining clear specs and what makes a result “good” becomes the core, durable developer skill;
* Effortless documentation: Leverage LLMs to distill messy “brain dumps” or walks-and-talks into structured project specifications, offloading context faster;
* Modular agent skills: Transition from MCP servers to simple markdown-based “skills” with YAML and scripts, allowing progressive disclosure of tool details;
* Scheduled async agents: Break the chat-based productivity ceiling by using GitHub Actions or cron jobs for agents to work on issues, shifting humans to reviewers;
* Automated tech debt audits: Deploy background agents to identify duplicate code, architectural drift, or missing test coverage, leveraging AI to police AI-induced messiness;
* Explicit knowledge culture: AI agents eliminate “cafeteria chat” by forcing explicit, machine-readable documentation, solving the perennial problem of lost institutional knowledge;
* Tiered model strategy: Optimize token spend by using high-tier “reasoning” models (e.g., Opus) for planning and low-cost, high-speed models (e.g., Flash) for execution;
* Ephemeral software specs: With near-zero generation costs, software shifts from static products to dynamic, regenerated code based on a permanent, underlying specification.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Eleanor & Isaac are teaching the next cohort of their Elite AI Assisted Coding course starting this week. They’re kindly giving readers of Vanishing Gradients 25% off. Use this link. 👈

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Here is a discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vg-ei

Show Notes
* Elite AI Assisted Coding Substack
* Eleanor Berger on LinkedIn
* Isaac Flaath on LinkedIn
* Elite AI Assisted Coding Course (Use the code HUGO for 25% off)
* How to Build an AI Agent with AI-Assisted Coding
* Eleanor/Isaac’s blog post “The SpecFlow Process for AI Coding”
* Eleanor’s growing list of (free) tutorials on Agent Skills
* Eleanor’s YouTube playlist on agent skills
* Eleanor’s blog post “Are (Agent) Skills the New Apps”
* Simon Willison’s blog post on skills/general computer automation/data journalism agents
* Eleanor/Isaac’s blog post about asynchronous client agents in GitHub Actions
* Eleanor/Isaac’s blog post on agentic coding workflows with Hang Yu, Product Lead for Qoder @ Alibaba
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in Q1, 2026 (25% off for listeners)

Timestamps (for YouTube livestream)
00:00 Introduction to Elite AI Assisted Coding
02:24 Starting a New AI Project: Best Practices
03:19 The Importance of Context in AI Projects
07:19 Specification-First Planning
12:01 Sharing Intent and Documentation
18:27 Living Documentation and Continual Learning
24:36 Choosing the Right Tools and Models
29:18 Managing Costs and Token Usage
40:16 Using Different Models for Different Tasks
43:41 Mastering One Model for Better Results
44:54 The Rise of Agent Skills in 2026
45:34 Understanding the Importance of Skills
47:18 Practical Applications of Agent Skills
01:11:43 Security Concerns with AI Agents
01:15:02 Collaborative AI-Assisted Coding
01:18:59 Future of AI-Assisted Coding
01:22:27 Key Takeaways for Effective AI-Assisted Coding

Live workshop with Eleanor, Isaac, & Hugo
We also recently did a 90-minute workshop on How to Build an AI Agent with AI-Assisted Coding.
We wrote a blog post on it for those who don’t have 90 minutes right now. Check it out here.
I then made a 4 min video about it all for those who don’t have time to read the blog post.

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
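The "keep some notes somewhere" loop from this episode's opening quote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the guests' implementation: the file name agents.md comes from the episode summary, while the helpers load_notes, add_note, and build_prompt are hypothetical names invented here, and how an agent decides what to record is left out.

```python
from pathlib import Path
import tempfile


def load_notes(path: Path) -> str:
    """Return the notes file contents, or an empty string if it does not exist yet."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def add_note(path: Path, note: str) -> bool:
    """Append a one-line rule unless it is blank or already recorded.

    Returns True if the note was written. (Hypothetical helper.)
    """
    note = note.strip()
    existing = {line.removeprefix("- ").strip() for line in load_notes(path).splitlines()}
    if not note or note in existing:
        return False
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return True


def build_prompt(path: Path, task: str) -> str:
    """Prepend the persisted notes so they influence the next run."""
    notes = load_notes(path)
    header = f"Project notes (from {path.name}):\n{notes}\n" if notes else ""
    return f"{header}Task: {task}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes_file = Path(tmp) / "agents.md"  # file name assumed from the episode summary
        add_note(notes_file, "Run the test suite before committing.")
        add_note(notes_file, "Run the test suite before committing.")  # duplicate, skipped
        print(build_prompt(notes_file, "Fix the failing login test"))
```

Because the notes live in a plain text file, they can be read, diffed, and edited by hand, which is the inspectability the quote points to.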
Episode 66: The Agent Paradox - Why Moderna's Most Productive AI Systems Aren't Agents 08.01.2026 42min

Surprise. We don’t have agents. I actually went in and did an audit of all the LLM applications that we’ve developed internally. And if you were to take Anthropic’s definition of workflow versus agent, we don’t have agents. I would not classify any of our applications as agents.

Eric Ma, who leads Research Data Science in the Data Science and AI group at Moderna, joins Hugo to talk about moving past the hype of autonomous agents to build reliable, high-value workflows.

We discuss:
* Reliable Workflows: Prioritize rigid workflows over dynamic AI agents to ensure reliability and minimize stochasticity in production environments;
* Permission Mapping: The true challenge in regulated environments is security, specifically mapping permissions across source documents, vector stores, and model weights;
* Trace Log Risk: LLM execution traces pose a regulatory risk, inadvertently leaking restricted data like trade secrets or personal information;
* High-Value Data Work: LLMs excel at transforming archived documents and freeform forms into required formats, offloading significant “janitorial” work from scientists;
* “Non-LLM” First: Solve problems with simpler tools like Python or ML models before LLMs to ensure robustness and eliminate generative AI stochasticity;
* Contextual Evaluation: Tailor evaluation rigor to consequences; low-stakes tools can be “vibe-checked,” while patient safety outputs demand exhaustive error characterization;
* Serverless Biotech Backbone: Serverless infrastructure like Modal and reactive notebooks such as Marimo empower biotech data scientists to deploy rapidly without heavy infrastructure overhead.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch

👉 Eric & Hugo have a free upcoming livestream workshop: Building Tools for Thinking with AI (register to join live or get the recording afterwards) 👈

Show notes
* Eric’s website
* Eric Ma on LinkedIn
* Eric’s blog
* Eric’s data science newsletter
* Building Effective AI Agents by the Anthropic team
* Wow, Marimo from Eric’s blog
* Wow, Modal from Eric’s blog
* Upcoming Events on Luma
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in Q1, 2026 (35% off for listeners)

Timestamps
00:00 Defining Agents and Workflows
02:04 Challenges in Regulated Environments
04:24 Eric Ma's Role at Moderna, Leading Research Data Science in the Data Science and AI Group
12:37 Document Reformatting and Automation
15:42 Data Security and Permission Mapping
20:05 Choosing the Right Model for Production
20:41 Evaluating Model Changes with Benchmarks
23:10 Vibe-Based Evaluation vs. Formal Testing
27:22 Security and Fine-Tuning in LLMs
28:45 Challenges and Future of Fine-Tuning
34:00 Security Layers and Information Leakage
37:48 Wrap-Up and Final Remarks
Episode 65: The Rise of Agentic Search 19.12.2025 51min

We’re really moving from a world where humans are authoring search queries and humans are executing those queries and humans are digesting the results to a world where AI is doing that for us.

Jeff Huber, CEO and co-founder of Chroma, joins Hugo to talk about how agentic search and retrieval are changing the very nature of search and software for builders and users alike.

We Discuss:
* “Context engineering”, the strategic design and engineering of what context gets fed to the LLM (data, tools, memory, and more), which is now essential for building reliable, agentic AI systems;
* Why simply stuffing large context windows is no longer feasible due to “context rot” as AI applications become more goal-oriented and capable of multi-step tasks;
* A framework for precisely curating and providing only the most relevant, high-precision information to ensure accurate and dependable AI systems;
* The “agent harness”, the collection of tools and capabilities an agent can access, and how to construct these advanced systems;
* Emerging best practices for builders, including hybrid search as a robust default, creating “golden datasets” for evaluation, and leveraging sub-agents to break down complex tasks;
* The major unsolved challenge of agent evaluation, emphasizing a shift towards iterative, data-centric approaches.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch

Oh! One more thing: we’ve just announced a Vanishing Gradients livestream for January 21 that you may dig:
* A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull and John Berryman (register to join live or get the recording afterwards).

Show notes
* Jeff Huber on Twitter
* Jeff Huber on LinkedIn
* Try Chroma!
* Context Rot: How Increasing Input Tokens Impacts LLM Performance by The Chroma Team
* AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited
* From Context Engineering to AI Agent Harnesses: The New Software Discipline
* Generative Benchmarking by The Chroma Team
* Effective context engineering for AI agents by The Anthropic Team
* Making Sense of Millions of Conversations for AI Agents by Ivan Leo (Manus) and Hugo
* How we built our multi-agent research system by The Anthropic Team
* Upcoming Events on Luma
* Watch the podcast video on YouTube
Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python) 03.12.2025 1h 2min

We have been sold a story of complexity. Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI.

In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows.

We talk through:
- Escaping complexity hell to reduce costs and gain autonomy
- The specific software practices, like the "Docker Barrier", that matter most for data scientists
- How to replace complex cloud services with a simple, robust $30/month stack
- The shift from writing code to "systems thinking" in the age of Agentic AI
- How to manage the people-pleasing psychology of AI agents to prevent broken code
- Why struggle is still essential for learning, even when AI can do the work for you

LINKS
- Talk Python In Production, the Book! (https://talkpython.fm/books/python-in-production)
- Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists)
- Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python)
- Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models)
- Python Bytes podcast (https://pythonbytes.fm/)
- Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
- Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share)
- Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav)
Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind) 22.11.2025 1h

Gemini 3 is a few days old and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out the functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely.

Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, unpack the nuance of preventing context rot in massive windows, and explain why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.

They talk through:
- The implications of models that can "self-heal" and fix their own code
- The two cultures of agents: LLM workflows with a few tools versus high-agency, autonomous systems, and when to unleash the latter
- Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews
- Why Needle in a Haystack benchmarks often fail to predict real-world performance
- How to build agent harnesses that turn model capabilities into product velocity
- The shift from measuring latency to managing time-to-compute for reasoning tasks

LINKS
- From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline)
- Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot)
- Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
- Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)
- Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav)
Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs 31.10.2025 59min

Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.

In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.

They talk through:
- How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.
- The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.
- The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.
- Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.
- The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.

LINKS
- Randy on LinkedIn (https://www.zenml.io/llmops-database)
- Wyrd Studios (https://thewyrdstudios.com/)
- Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)
- Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
- Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)

🎓 Learn more in Hugo's course: Building AI Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20). Next cohort starts November 3: come build with us!
