Max Agency

LangChain

Country: USA

Genres: Technology

Language: EN

Episodes: 5

Latest: 16.07.2026

Max Agency is a podcast about how the best AI agents are actually being built. Hosted by Harrison Chase, CEO of LangChain, each episode goes deep with the builders designing, deploying, and learning from real agent systems in the wild. From architecture decisions to evals, tooling, and failure modes, it is for people who want to understand what it really takes to build useful agents.

Episodes

The best AI agents cost less than you think | Eno Reyes, Factory 16.07.2026 1h 17min

Eno Reyes is the co-founder and CTO at Factory, a $1.5 billion company turning signal into deployed code inside some of the world's biggest engineering orgs. Before founding Factory in 2023, he was an engineer at Microsoft and then Hugging Face. In this conversation, Eno unpacks why the harness matters more than the model running underneath it, how to build your own 24/7 software factory, and why he's "bullish on humans in the loop for a very long time."
We also discuss:
Why product management isn't going away
The engineer who lint-checks Factory's own agents
Why coding agents might become the best general agents
The platonic representation hypothesis
Why Factory might hide when "memory" is happening
Building Factory's universal meta harness
Tokenomics, and how routing cuts the bill
Timestamps:
(00:00) Introduction
(01:19) What a 24/7 autonomous software factory actually means
(03:16) Why product management isn't going away
(06:56) Alvin, the engineer who lint-checks Factory's own agents
(10:16) "It makes me very bullish on humans in the loop for a very long time"
(11:37) The Disney Epcot analogy for rolling out AI
(16:47) Why coding agents might become the best general agents
(20:54) The case against a model-independent harness, and Eno's counter
(25:44) The platonic representation hypothesis explained
(32:42) Why model quirks are like being left or right-handed
(39:28) Why you could technically decompile Factory's entire harness
(44:20) Why memory might be AI's most overused word
(46:47) Inside AutoWiki and its Lore feature
(55:32) Agent readiness: the deterministic feedback agents need
(57:12) Missions: Factory's universal meta harness
(1:01:12) "It's kind of turtles all the way down": validating the validators
(1:04:37) Tokenomics: what missions cost, and how routing cuts the bill
(1:11:08) Why Eno is bullish on open models
(1:14:30) BenchBench, and why code review benchmarks might be broken
References: Aider, Alvin Sng, Amp, Andrej Karpathy, Anthropic, Anthropic: Emotion concepts and their function in a large language model, Cursor, Deep Agents, DeepWiki, Epcot, Factory, GLM, Harbor, Hugging Face, HuggingGPT, Kimi, LangGraph, LangSmith, MiniMax, Open Knowledge Format (OKF), OpenAI, OpenRouter Model Fusion, Ramp, Sakana Fugu, SWE-bench, Terminal-Bench
Where to find Eno: LinkedIn, Twitter/X
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
The best AI agents are simpler than you think | Ben Tannyhill, LangChain 02.07.2026 50min

Ben Tannyhill is a product manager at LangChain, where he's building LangSmith Engine, an agent that finds and fixes your agent's failures. Engine continuously analyzes your production traces, clusters them into actionable issues, and opens pull requests to fix them. Engine's architecture is a lot like an org chart: a main model delegating to a team of cheaper, faster sub-agents. It launched in public beta at Interrupt 2026, and in this conversation, Ben unpacks why it uses a sandbox as a tool, how the team turned it into a self-improving agent that learns from its own traces, and the hard problem of testing a fix before it ships.
We also discuss:
Why Engine is "the agent for agent engineers"
Making LangSmith agent-native with condensed trace views
Why the team keeps handing more control to the agent
Inside Engine's four sub-agents: the screener, verifier, and more
Giving Engine memory with an agent overview document
How to keep an always-on agent from blowing the inference budget
Where Insights, Polly, and Engine are converging
Timestamps:
(00:00) Introduction
(01:25) LangSmith 101
(02:22) Why Engine is "the agent for agent engineers"
(03:49) Under the hood: Engine is a deep agent
(06:08) Clustering millions of traces with condensed views
(10:10) Why the team keeps handing more control to the agent
(13:21) Why Engine uses a sandbox as a tool
(14:11) Engine's four sub-agents and the org-chart analogy
(16:51) Evals for Engine: IssueBench, Harbor, and synthetic environments
(23:05) How Engine evolved: from noisy PRs to an issue inbox
(25:56) Inside Engine's memory: the agent overview document
(29:25) How to keep an always-on agent from blowing the inference budget
(30:52) What models Engine uses
(31:30) How Engine was rolled out: from Forge to public beta at Interrupt
(34:18) Inside the two teams building Engine
(35:53) Where Insights, Polly, and Engine are converging
(40:06) The missing piece: testing a fix before it ships
(42:22) Running a branched agent, and the write-access eval problem
(46:35) Using Engine as long-term memory
(47:39) Pointing Engine at coding-agent traces
(48:49) Running Engine on Engine: the meta self-improvement loop
References: Anthropic, Chat LangChain, Claude Code, Claude Haiku, Claude Opus, Codex, Context Hub, Credit Genie, Deep Agents, Gemini, GPT-5.5, Harbor, Hex, Insights, Interrupt, LangGraph, LangSmith, LangSmith Chat (formerly Polly), LangSmith Engine, LangSmith Observability, Mintlify, OpenAI, Palash Shah, Terminal-Bench, Unify
Where to find Ben: LinkedIn, Twitter/X
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
The best AI agents are simpler than you think | Zack Reneau-Wedeen, Sierra 18.06.2026 1h 27min

Zack Reneau-Wedeen is the Head of Product at Sierra, the conversational AI platform behind customer-facing agents for most of the Fortune 20. Before Sierra, he spent seven years at Google as the founding PM for Google Lens and Google Podcasts, then led product at Robinhood and CoinTracker. Sierra is mostly known for customer support, but Zack reveals how and why the company is building agents that span the entire customer lifecycle, from browsing and booking to sales and loyalty. In this conversation, he argues agentic commerce will be bigger than e-commerce, explains why he's a "monolith loyalist", and unpacks why, when a model looks dumb, the problem is usually you.
We also discuss:
How Sierra's no-code layer compiles down to agent code, and back again
Why most multi-agent systems just ship your org chart
Inside Sierra's modular voice architecture: thinking, listening, and talking in parallel
Why Sierra built a PCI-certified stack for voice payments
How outcome-based pricing aligns incentives
Why there's no breakout memory company
Timestamps:
(00:00) Introduction
(03:39) Analyze, build, release: how you build on Sierra
(07:54) Inside Ghostwriter
(11:04) Meeting models on their turf "80% of the time"
(17:47) The one constraint Claude Code doesn't have
(19:35) Agent-to-agent: when an API call still beats MCP
(21:02) Why agentic commerce will be bigger than e-commerce
(27:31) Running models in parallel and ensembling transcription
(32:22) Inside the Agent Data Platform
(40:00) Context engineering: everything it needs, nothing more
(41:38) "Whenever you think the model's too dumb, the model's actually too smart"
(46:13) Why multi-agent systems are a trap
(48:44) Voice 101: latency, naturalism, and 60 languages
(56:11) When voice-to-voice passes 50%: the over/under
(57:03) Making memory a first-class primitive
(1:02:47) Why there's no breakout memory company
(1:08:02) Why the solution to all AI problems "is more AI"
(1:09:20) Why Sierra open-sources the tau-bench universe
(1:14:42) How outcome-based pricing aligns incentives
(1:20:26) Who thrives as a forward-deployed agent builder
(1:22:16) The Formula One analogy: why product is the bottleneck
(1:25:47) How Sierra interviews for agency
References: Agent2Agent (A2A) Protocol, Anthropic, ChatGPT, Claude, Claude Code, Claude Mythos, Claude Opus 4.5, Codex, Deep Agents, Gemini, Hawaiian Airlines, LangGraph, Model Context Protocol (MCP), Not Another Workflow Builder, Redfin, Sentry, Shopify, Silero, SiriusXM, Stripe, Tau-bench, Thinking Machines Lab
Where to find Zack: LinkedIn, Twitter/X, Sierra
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
The tool design tricks behind Benchling's AI agents | Nick Larus-Stone 04.06.2026 50min

Nick Larus-Stone is the Head of AI at Benchling, the R&D data platform that life science companies use to store and manage their experiments, samples, instruments, and analysis. Benchling has been around since 2012. In October 2025, it launched Benchling AI, an intelligence layer with a chat interface, backed by an agent, that helps scientists find data, design experiments, and write reports. Nick came to Benchling through its acquisition of Sphinx Bio, the analysis startup he founded. In this conversation, Nick walks through what it takes to build agents for scientific work, and where the playbook from coding agents holds up and where it breaks down.
We also discuss:
Why Benchling invests so heavily in getting clean data upfront
How they cross-check answers between models to get more out of each one
Why and how Benchling leans on production traces
Where AI actually helps science today, and where it still gets stuck
Why understanding LLMs is closer to biology than software engineering
Timestamps:
(00:00) Intro
(01:22) What Benchling AI is, and the 14-year data platform underneath it
(04:36) Why a decade of structured data is a core advantage
(05:57) The architecture under the hood
(08:28) Similarities and differences compared to a coding harness
(11:14) Benchling’s multi-agent architectures
(14:36) Dealing with verifiable vs non-verifiable tasks
(16:19) Doing evals when clean benchmarks aren’t possible
(18:13) Context engineering: SQL vs. file-based harnesses
(22:11) Memory: agents that create and update their own skills
(25:30) What user education for scientists looks like
(30:33) Why understanding LLMs is closer to biology than software
(33:28) When will agents discover a novel cure for disease?
(44:58) The future of harnesses in science
(48:13) Why fine-tuning on biology hasn't beaten frontier models
References: Agent Skills (Claude Docs), Benchling’s Deep Research Agent, Claude (Anthropic), Design of experiments (DOE), FDA Investigational New Drug (IND) application, Gemini (Google), Google AI co-scientist, LangSmith, Model Context Protocol (MCP), The Ralph (Wiggum) Loop (Geoffrey Huntley), Sphinx Bio
Where to find Nick: Benchling, LinkedIn, Twitter/X
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
How Cogent builds AI agents that have to be right every single time | Geng Sng (Co-founder & CTO - Cogent) 22.05.2026 1h 14min

Geng Sng is co-founder and CTO of Cogent, which builds autonomous agents that remediate vulnerabilities for enterprise security teams. Today, Cogent's agents process billions of security events per day, maintaining a live context graph of every asset and vulnerability across customer environments. In this conversation, Geng walks through Cogent's hot vs cold context split, the sub-agents that handle side quests, and the two graphs they run in parallel.
We also discuss:
Why defensive security is harder for AI than offensive
Under the hood of Cogent's three agents
Inside Cogent's “read only” by-default sandboxes
Why graph databases don't scale for security data
Cogent Research and the move into formal verification
Why interactive agents need a deeper planning phase to one-shot
Referenced: Abnormal AI, Amazon S3, Anthropic, Bash, ChatGPT, Claude Code, Claude Mythos, CodeMender, Codex, Cogent, Cursor, Google DeepMind, GPT-5.5-Cyber, Jupyter, Letta, Mozilla, OpenAI, Opus 4.6, Opus 4.7, Vercel
Where to find Geng: LinkedIn
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
Timestamps:
(00:00) Why mean time to exploit collapsed from years to minutes
(02:08) Inside Cogent's Agent Lake architecture
(05:11) Why Cogent rejected graph databases
(10:48) The trust ladder before agents touch production
(15:13) The three types of agents inside Cogent
(17:07) How Cogent sandboxes its agents
(19:16) Short-circuiting interactive agents with a deeper planning phase
(24:31) What to do when users believe agents too much
(31:21) Why sub-agents let agents go on side quests
(34:59) Two-tiered evals and the metric that catches bad prompts
(40:00) Cogent’s unique approach to context
(48:39) Cogent Research and the move into formal verification
(51:33) The single trait Cogent hires for
(54:00) Open-sourcing models within six months
(57:07) Why defensive security won’t be commoditized anytime soon
(1:00:51) The founding insight behind Cogent
How Ramp built an AI agent that can think outside of tokens | Alex Shevchenko 07.05.2026 44min

Alexander Shevchenko is the head of applied research at Ramp, where he leads Ramp Labs, the team behind Ramp Sheets and a steady stream of public AI engineering experiments. Ramp Sheets started as an internal process mining tool that turned Loom videos of accountants into Markov diagrams, before evolving into the agentic spreadsheet editor that shipped in November. In this conversation, Alex walks through the architecture under the hood, why Ramp biases the agent toward Excel formulas over Python code gen, and two recent Labs experiments: Latent Briefing and a user-steerable revival of Golden Gate Claude.
We also discuss:
Under the hood of Ramp Sheets
Inspect, Ramp's internal coding agent, and the self-improving monitor loop it powers
Why finance professionals rejected code gen as too "black box"
Why Anthropic models tend to excel at agentic spreadsheet manipulation
The case for putting the agent outside the sandbox, not inside it
The Loom-to-Markov-diagram process mining pipeline
RLMs and how subagents can share memory in latent space
Latent Briefing and KV-cache communication between subagents
Reviving Golden Gate Claude with steering vectors on Gemma
Referenced: Alex Levinson, Anthropic, Ben Geist, Claude, Efficient Memory Sharing for Multi-Agent Systems via KV Cache Compaction (Ben Geist), Gemma, Golden Gate Claude, Graphviz, Inspect, Latent Briefing, Loom, Modal, OpenAI, Opus, Qwen, Ramp, Ramp Labs, Ramp Sheets, Recursive Language Models (Alex Zhang), Retool, Self-maintaining Ramp Sheets, Steer AI
Where to find Alex: LinkedIn, Twitter/X, Website
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
Timestamps:
(00:00) Introduction
(01:13) The origin of Ramp Sheets
(02:27) The Loom-to-Markov-diagram process mining pipeline
(04:28) Why code gen approaches felt too "black box" to finance
(06:13) Meeting finance where they already are: inside the spreadsheet
(09:08) How far process mining got them
(10:31) Text descriptions and Graphviz DAGs as output
(12:41) Under the hood of Ramp Sheets
(14:52) Why the agent uses Python only as an escape hatch
(15:47) Why Anthropic models excel at agentic spreadsheet manipulation
(17:12) Frankensteining the OpenAI Agents SDK
(17:43) The Ramp Sheets UX and fast vs. expert mode
(19:58) Agent in a sandbox vs. agent with a sandbox
(21:55) Vibe evals with expert humans
(23:40) Inspect, the internal coding agent
(24:13) The self-monitoring loop and auto-PRs
(28:01) Other wacky experiments on Sheets
(28:43) Memory experiments that didn't pan out
(31:16) Latent Briefing and KV-cache subagent communication
(35:13) Reviving Golden Gate Claude
(37:47) Contrastive pairs and steering vectors
(39:47) Picking the right layers in Gemma
(41:37) What Ramp Labs looks for when hiring
How Listen is building a system of AI Agents & subagents for specialized tasks | Florian Juengermann, CTO 23.04.2026 47min

Florian Juengermann is the co-founder and CTO of Listen, an AI startup that turns qualitative research across hundreds of interviews, surveys, and focus groups into structured, traceable insights. Listen's agents analyze responses at scale, and Florian has rearchitected the system multiple times to get there. In this conversation, he walks through the virtual table architecture at the core of their Research Agent, how small models run map-reduce classification across thousands of open-ended responses, and the self-reviewing feedback subagent that catches errors during long async runs.
We also discuss:
The three agents inside Listen's platform
How Listen rearchitected from a simple RAG bot to a multi-agent system multiple times
Why the PowerPoint subagent was completely rebuilt using the Claude Code SDK
Contextual prompt engineering as an alternative to skills
How Listen keeps report numbers live as new interview responses come in
When to trigger the long-running agent vs. showing early results
What Florian looks for when hiring agent engineers
References: Anthropic, ChatGPT, Claude, Claude Code SDK, E2B, Emotional Intelligence, GPT Mini, Haiku, Listen, OpenAI, Pandas, Postgres, Python, Research Agent, Render, Zoom
Where to find Florian: LinkedIn, Twitter/X
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
Timestamps:
(00:00) Introduction
(01:25) The three agents inside Listen's platform
(03:15) Live chat vs. long async runs, and how Listen tunes for each
(05:33) Under the hood of the Research Agent
(06:37) Listen's virtual table architecture
(07:34) How small models classify thousands of open-ended responses
(10:05) Running code in a sandbox: how E2B fits in
(11:52) Why Listen rebuilt the PowerPoint subagent from scratch
(14:11) Contextual prompt engineering instead of skills
(16:32) The feedback subagent that reviews its own reports
(18:14) How Listen runs evals in production
(19:47) Unexpected ways users push the agent to its limits
(21:42) How many times Listen has rearchitected, and why
(24:59) Trace observability: depth over breadth
(26:10) Lessons from running Claude Code SDK inside E2B
(27:42) Memory: what's solved and what isn't
(29:10) The Composer agent UX: co-editing a document with AI
(35:50) How Listen keeps report numbers live as new responses come in
(43:47) What Listen looks for when hiring agent engineers
How Hex builds AI agents that reason like human data analysts | Izzy Miller, AI Engineer 09.04.2026 1h 8min

Izzy Miller is an AI engineer at Hex, an AI analytics platform that was one of the first companies to ship data agents to real paying users. Today, Hex runs a multi-agent system with nearly 100K tokens of tools, and Izzy is building a 90-day simulation to evaluate whether those agents actually get smarter over time. In this conversation, he walks through the harness decisions that shaped their architecture, the failure modes Hex is seeing at scale, and what it takes to build an eval that no current model can pass.
We also discuss:
Why data agents are harder to verify than coding agents
Under the hood of Hex’s agents
How Hex is unifying separate agents
Why most eval sets are bad
The 90-day simulation for long-horizon evals
How Izzy went from marketing to AI engineer
References: Andon Labs, Anthropic, Barry McCardel, ChatGPT, Claude Code, Claude Sonnet 4.6, DBT, GPT-3.5 Turbo, GPT-5.3 Codex Spark, GPT-5.4, Hex, LangChain, LangSmith, Looker, OpenAI, Opus 4.6, Satya Nadella, Snowflake, Vending Machine
Where to find Izzy: LinkedIn, Twitter/X
Where to find Harrison: LinkedIn, Twitter/X
Where to find LangChain: Website, Docs
Send feedback or questions to maxagency@langchain.dev
Timestamps:
(01:35) Where Hex's notebook agent started
(03:46) The moment Hex knew it was time for agents
(07:36) Why data agents are harder to verify than coding agents
(09:30) How Hex is unifying separate agents
(13:28) Under the hood of the notebook agent
(15:41) The harness features that are now holding the agent back
(17:41) Why Hex built their own orchestrator
(18:59) Managing nearly 100K tokens of tools
(20:49) Ephemeral queries and agent behavior trade-offs
(24:46) The UX problem with showing agents' thinking
(27:28) Why verification is harder than transparency for data agents
(31:00) Memory, context conflicts, and collapse modes
(34:38) How Hex built their internal eval system
(39:29) Why most eval sets are bad
(44:30) The 900% quota eval that every model fails
(46:55) Model upgrades and the "in distribution" debate
(51:34) How Izzy went from marketer to AI engineer
(59:59) The 90-day simulation for long-horizon evals
Welcome to Max Agency 08.04.2026

Welcome to Max Agency, the podcast that goes deep into how the best agents are being built by builders like you. I'm Harrison Chase, CEO of LangChain, the agent engineering company, and I'll be your host.