The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Sam Charrington
Land USA
Sjanger News, Technology, Tech News
Språk EN
Episoder 786
Siste 21.05.2026

The TWIML AI Podcast, hosted by Sam Charrington, features interviews with top researchers and practitioners in machine learning and artificial intelligence. It covers a wide range of topics including deep learning, natural language processing, neural networks, and data science. The podcast aims to make complex AI concepts accessible to a broad audience of engineers, data scientists, and business leaders.

Episoder

  • Relational Foundation Models for Enterprise Data with Jure Leskovec - #768 21.05.2026 1t 6min
    In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep learning. We begin with AI Virtual Cell, a multiscale effort to learn data-driven representations from proteins to cells to patients using single-cell RNA-seq data, protein language models like ESM, and structure models like AlphaFold—without hand-encoding biology. Jure then dives into relational deep learning, reframing enterprise databases as graphs and training neural networks directly on raw multi-table data. He explains Kumo’s Relational Foundation Model (RFM2), which performs in-context learning over subgraphs to make accurate predictions on new databases and tasks with no training, and how this approach benchmarks against RelBench and other multi-table datasets. We also discuss real-world deployments at companies like Reddit, DoorDash, and Coinbase, explainability via attention over tables and columns, integration with agentic systems, deployment options, and practical limitations. The complete show notes for this episode can be found at https://twimlai.com/go/768.
  • How to Find the Agent Failures Your Evals Miss with Scott Clark - #767 07.05.2026 53min
    In this episode, Scott Clark, co-founder and CEO of Distributional, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard evals miss, and how mapping traces into vector fingerprints enables clustering and topic discovery to uncover emergent behaviors. Scott explains how analytics can feed the data flywheel by generating evals, guardrails, and training data, and why online, adaptive approaches are essential for non-stationary models. We also touch on practical how-to’s such as instrumentation with OpenTelemetry, the GenAI semantic conventions, and the role of dedicated analytics tools. The complete show notes for this episode can be found at https://twimlai.com/go/767.
  • How to Engineer AI Inference Systems with Philip Kiely - #766 30.04.2026 54min
    In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference—batching, quantization, speculation, and KV cache reuse—lets teams design better products and SLAs. We trace the inference maturity journey from closed APIs to dedicated deployments and in-house platforms, discuss GPU lifecycles, and survey today’s runtime landscape, including vLLM, SGLang, and TensorRT LLM. Finally, we look ahead to agents and multimodality, making the case for specialized, workload-specific runtimes when performance and efficiency matter most. The complete show notes for this episode can be found at https://twimlai.com/go/766.
  • How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765 16.04.2026 54min
    In this episode, Rashmi Shetty, senior director of enterprise generative AI platform at Capital One, joins us to explore how the company is designing, deploying, and scaling multi-agent systems in a highly regulated environment. Rashmi walks us through Chat Concierge, a multi-agent chat experience for auto dealerships that handles intent disambiguation, tool invocation, and human handoffs to deliver safer, more personalized customer journeys. We discuss Capital One’s platform-centric approach to AI agents and how it separates design from runtime governance, embedding policies, guardrails, and cyber controls across agent threat boundaries. Rashmi shares how the team approaches the developer experience for agent builders, observability, and evals for stochastic, multi-agent workflows; and strategies for model specialization, including fine-tuning and distillation. We also cover standards and abstraction, closed-loop learning from production telemetry, and key lessons for enterprises building agentic systems. The complete show notes for this episode can be found at https://twimlai.com/go/765.
  • The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764 26.03.2026 1t 3min
    Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs to discuss diffusion language models. We dig into how diffusion approaches—traditionally used for images—are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, the advantages for highly controllable generation, and what the future of multimodal diffusion models might look like. The complete show notes for this episode can be found at https://twimlai.com/go/764.
  • Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763 10.03.2026 1t 16min
    In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI-assisted coding with end-to-end autonomy, arguing that “code is a commodity” and acceptance is the real metric—security, standards, tests, and maintainability included. We explore Blitzy’s hybrid graph-plus-vector approach, which grounds agents and combines semantic signals with keyword search to navigate large repositories efficiently. Sid breaks down context and agent engineering, how effective context windows have plateaued, and why dynamic agent personas, tool selection, and model-specific prompting matter at scale. He details their orchestration of large swarms of AI agents to collaboratively analyze codebases, plan tasks, and execute complex tasks in parallel. We also dig into why Agents.md and flat memories break down, storing feedback in the knowledge graph, and building real-world evals beyond leaderboards to choose the right model for each task. The complete show notes for this episode can be found at https://twimlai.com/go/763.
  • AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762 26.02.2026 1t 18min
    In this episode, Sebastian Raschka, independent LLM researcher and author, joins us to break down how the LLM landscape has changed over the past year and what is likely to matter most in 2026. We discuss the shift from raw model scaling to reasoning-focused post-training, inference-time techniques, and better tool integration. Sebastian explains why methods like self-consistency, self-refinement, and verifiable-reward reinforcement learning have become central to progress in domains like math and coding, and where those approaches still fall short. We also explore agentic workflows in practice, including where multi-agent systems add real value and where reliability constraints still dominate system design. The conversation covers architecture trends such as mixture-of-experts, attention efficiency strategies, and the practical impact of long-context models, alongside persistent challenges like continual learning. We close with Sebastian’s perspective on maintaining strong coding fundamentals in the age of AI assistants and a preview of his new book, Build A Reasoning Model (From Scratch). The complete show notes for this episode can be found at https://twimlai.com/go/762.
  • The Evolution of Reasoning in Small Language Models with Yejin Choi - #761 29.01.2026 1t 6min
    Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore Yejin’s recent work on making small language models reason more effectively. We discuss how high-quality, diverse data plays a central role in closing the intelligence gap between small and large models, and how combining synthetic data generation, imitation learning, and reinforcement learning can unlock stronger reasoning capabilities in smaller models. Yejin explains the risks of homogeneity in model outputs and mode collapse highlighted in her “Artificial Hivemind” paper, and its impacts on human creativity and knowledge. We also discuss her team's novel approaches, including reinforcement learning as a pre-training objective, where models are incentivized to “think” before predicting the next token, and "Prismatic Synthesis," a gradient-based method for generating diverse synthetic math data while filtering overrepresented examples. Additionally, we cover the societal implications of AI and the concept of pluralistic alignment—ensuring AI reflects the diverse norms and values of humanity. Finally, Yejin shares her mission to democratize AI beyond large organizations and offers her predictions for the coming year. The complete show notes for this episode can be found at https://twimlai.com/go/761.
  • Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760 08.01.2026 1t 6min
    Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion Robotics to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion—and why locomotion is still far from “solved.” We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher fidelity training, discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots, and introduces Flexion's hierarchical approach that utilizes pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares the behind-the-scenes in humanoid robot demos, his take on reinforcement learning in simulation versus the real world, the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today. The complete show notes for this episode can be found at https://twimlai.com/go/760.
  • Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759 17.12.2025 52min
    Today, we're joined by Aakanksha Chowdhery, member of technical staff at Reflection, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post-training techniques to improve reasoning, Aakanksha draws on her experience leading pre-training efforts for Google’s PaLM and early Gemini models to argue that pre-training itself must be rethought to move beyond static benchmarks. We explore the limitations of next-token prediction for multi-step workflows and examine how attention mechanisms, loss objectives, and training data must evolve to support long-form reasoning and planning. Aakanksha shares insights on the difference between context retrieval and actual reasoning, the importance of "trajectory" training data, and why scaling remains essential for discovering emergent agentic capabilities like error recovery and dynamic tool learning. The complete show notes for this episode can be found at https://twimlai.com/go/759.
  • Why Vision Language Models Ignore What They See with Munawar Hayat - #758 09.12.2025 57min
    In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the persistent challenge of object hallucination in Vision-Language Models (VLMs), why models often discard visual information in favor of pre-trained language priors, and how his team used attention-guided alignment to enforce better visual grounding. We also explore a novel approach to generalized contrastive learning designed to solve complex, composed retrieval tasks—such as searching via combined text and image queries—without increasing inference costs. Finally, we cover the difficulties generative models face when rendering multiple human subjects, and the new "MultiHuman Testbench" his team created to measure and mitigate issues like identity leakage and attribute blending. Throughout the discussion, we examine how these innovations align with the need for efficient, on-device AI deployment. The complete show notes for this episode can be found at https://twimlai.com/go/758.
  • Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 02.12.2025 48min
    In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss the heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which involves disaggregating workloads across a mix of hardware—from H100s to older GPUs and CPUs—to optimize unit economics without sacrificing performance. We dive into their "three-layer cake" architecture: workload disaggregation, a compilation layer that maps models to specific hardware targets, and a novel system that uses LLMs to autonomously rewrite and optimize compute kernels. Finally, we discuss the complexities of networking in heterogeneous environments, the trade-offs between numerical precision and application accuracy, and the future of hardware-aware scheduling. The complete show notes for this episode can be found at https://twimlai.com/go/757.
  • Proactive Agents for the Web with Devi Parikh - #756 19.11.2025 56min
    Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of creating reliable web agents, the advantages of visually-grounded models that operate on screenshots rather than the browser’s more brittle document object model, or DOM, and why this counterintuitive choice has proven far more robust and generalizable for handling complex web interfaces. Devi also shares insights into Yutori’s training pipeline, which has evolved from supervised fine-tuning to include rejection sampling and reinforcement learning. Finally, we discuss how Yutori’s “Scouts” agents orchestrate multiple tools and sub-agents to handle complex queries, the importance of background, "ambient" operation for these systems, and what the path looks like from simple monitoring to full task automation on the web. The complete show notes for this episode can be found at https://twimlai.com/go/756.
  • AI Orchestration for Smart Cities and the Enterprise with Robin Braun and Luke Norris - #755 12.11.2025 54min
    Today, we're joined by Robin Braun, VP of AI business development for hybrid cloud at HPE, and Luke Norris, co-founder and CEO of Kamiwaza, to discuss how AI systems can be used to automate complex workflows and unlock value from legacy enterprise data. Robin and Luke detail high-impact use cases from HPE and Kamiwaza’s collaboration on an “Agentic Smart City” project for Vail, Colorado, including remediation and automation of website accessibility for 508 compliance, digitization and understanding of deed restrictions, and combining contextual information with camera feeds for fire detection and risk assessment. Additionally, we discuss the role of private cloud infrastructure in overcoming challenges like cost, data privacy, and compliance. Robin and Luke also share their lessons learned, including the importance of fresh data, and the value of a "mud puddle by mud puddle" approach in achieving practical AI wins. The complete show notes for this episode can be found at https://twimlai.com/go/755.
  • Building an AI Mathematician with Carina Hong - #754 04.11.2025 55min
    In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a convergence of three key areas: the advanced reasoning capabilities of modern LLMs, the rise of formal proof languages like Lean, and breakthroughs in code generation. We explore the core technical challenges, including the massive data gap between general-purpose code and formal math code, and the difficult problem of "autoformalization," or translating natural language proofs into a machine-verifiable format. Carina also shares Axiom's vision for a self-improving system that uses a self-play loop of conjecturing and proving to discover new mathematical knowledge. Finally, we discuss the broader applications of this technology in areas like formal verification for high-stakes software and hardware. The complete show notes for this episode can be found at https://twimlai.com/go/754.
  • High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753 28.10.2025 52min
    In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive deep into the technical challenges of deploying these models, which are powerful but computationally expensive due to their iterative sampling process. Hung details his team's work on SwiftBrush and SwiftEdit, which enable high-quality text-to-image generation and editing in a single inference step. He explains their novel distillation framework, where a multi-step teacher model guides the training of an efficient, single-step student model. We explore the architecture and training, including the use of a secondary 'coach' network that aligns the student's denoising function with the teacher's, allowing the model to bypass the iterative process entirely. Finally, we discuss how these efficiency breakthroughs pave the way for personalized on-device agents and the challenges of running reasoning models with techniques like inference-time scaling under a fixed compute budget. The complete show notes for this episode can be found at  https://twimlai.com/go/753.
  • Vibe Coding's Uncanny Valley with Alexandre Pesant - #752 22.10.2025 1t 12min
    Today, we're joined by Alexandre Pesant, AI lead at Lovable, who joins us to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software development from typing characters to expressing intent, creating a new layer of abstraction similar to how high-level code compiles to machine code. We explore the current capabilities and limitations of coding agents, the importance of context engineering, and the practices that separate successful vibe coders from frustrated ones. Alex also shares Lovable’s technical journey, from an early, complex agent architecture that failed, to a simpler workflow-based system, and back again to an agentic approach as foundation models improved. He also details the company's massive scaling challenges—like accidentally taking down GitHub—and makes the case for why robust evaluations and more expressive user interfaces are the most critical components for AI-native development tools to succeed in the near future. The complete show notes for this episode can be found at https://twimlai.com/go/752.
  • Dataflow Computing for AI Inference with Kunle Olukotun - #751 14.10.2025 57min
    In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at Sambanova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the traditional instruction-fetch paradigm of CPUs and GPUs. We explore how this architecture is well-suited for LLM inference, reducing memory bandwidth bottlenecks and improving performance. Kunle reviews how this system also enables efficient multi-model serving and agentic workflows through its large, tiered memory and fast model-switching capabilities. Finally, we discuss his research into future dynamic reconfigurable architectures, and the use of AI agents to build compilers for new hardware. The complete show notes for this episode can be found at https://twimlai.com/go/751.
  • Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750 07.10.2025 57min
    Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to overcome them, including windowed attention, grouped query attention, and latent space attention. We explore the idea of weight-state balance and the weight-state FLOP ratio as a way of reasoning about the optimality of compute architectures, and we dig into the Power Retention architecture, which blends the parallelization of attention with the linear scaling of recurrence and promises speedups of >10x during training and >100x during inference. We review Manifest AI’s recent open source projects as well: Vidrial—a custom CUDA framework for building highly optimized GPU kernels in Python, and PowerCoder—a 3B-parameter coding model fine-tuned from StarCoder to use power retention. Our chat also covers the use of metrics like in-context learning curves and negative log likelihood to measure context utility, the implications of scaling laws, and the future of long context lengths in AI applications. The complete show notes for this episode can be found at https://twimlai.com/go/750.
  • The Decentralized Future of Private AI with Illia Polosukhin - #749 30.09.2025 1t 5min
    In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-owned AI. Illia shares his unique journey from developing the Transformer architecture at Google to building the NEAR Protocol blockchain to solve global payment challenges, and now applying those decentralized principles back to AI. We explore how Near AI is creating a decentralized cloud that leverages confidential computing, secure enclaves, and the blockchain to protect both user data and proprietary model weights. Illia also shares his three-part approach to fostering trust: open model training to eliminate hidden biases and "sleeper agents," verifiability of inference to ensure the model runs as intended, and formal verification at the invocation layer to enforce composable guarantees on AI agent actions. Finally, Illia shares his perspective on the future of open research, the role of tokenized incentive models, and the need for formal verification in building compliance and user trust. The complete show notes for this episode can be found at https://twimlai.com/go/749.

Populær i

Denne podkasten finnes også i podkast-listene til disse landene.