The Information Bottleneck

Ravid Shwartz-Ziv & Allen Roush

Country United States

Genres Technology, Science

Language EN

Episodes 42

Latest 27.07.2026

Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research in Generative AI, LLMs, GPUs, and Cloud Systems. The podcast covers cutting-edge developments in artificial intelligence and machine learning, offering insights from experts in the field.

Episodes

The Model Found a Way Out - with Florian Brand (Prime Intellect) 27.07.2026 57min

Florian Brand builds evals at Prime Intellect. The premise of the conversation is that writing a benchmark is the easy part now. Keeping the model from cheating it is the job, and it takes longer than the benchmark itself.

We get into why he thinks you can't evaluate a model apart from the CLI it runs in, what happens to statistics when a single run costs five figures, and whether the feeling that a model just works can ever become a number.

He also has a few stories about agents finding their way around the scoring that are worth hearing cold.

Timeline
00:13 Intro
01:00 What evals are for
04:05 Agentic benchmarks
07:10 Kimi K2 and model diversity
08:23 Long-horizon coding tasks
10:29 Building a benchmark
12:15 MirrorCode
14:27 Rubrics and LLM judges
16:30 The cost of expert labelers
17:49 Long runs and variance
19:44 Evaluating the harness
24:29 Chinese labs building CLIs
30:00 More reward hacking
37:45 Tau-bench and economic tasks
39:43 Benchmaxxing and GLM 5.2
45:15 Statistics and cost
47:56 Frontier convergence
52:04 Misuse in open and closed models
55:35 Self-improvement

Music
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

About
The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
Pierre-Carl Langlais on Building Models from Data You Can Account For 23.07.2026 1h 5min

Most labs build language models by scraping the web and filtering afterward. Pierre-Carl Langlais runs it the other way around. At Pleias, the French-German lab he co-founded, the models are built from data he can actually account for, which in practice means open and public-domain sources plus a lot of synthetic data the lab generates itself. It sounds like a self-imposed handicap. It mostly isn't. One of their models is a 600 million parameter system that runs live inside the Paris subway's monitoring pipeline.

We cover the SYNTH pretraining dataset and why he thinks "ethical data" has to mean more than copyright-free. He explains why barely 2% of their Common Corpus appears in typical web crawls, and why that gap is really a preservation problem. From there, he gets blunt about benchmark maxing and whether GLM really earns its Opus-class reputation. He also argues that the quiet move by closed labs to hide reasoning traces is mostly about claiming ownership of model outputs. He's skeptical of sovereign AI, and not shy about how Mistral drifted from frontier research toward French corporate consulting.

We finish on NVIDIA's persona datasets and the odd idea of training on the conditions that produced a text rather than the text itself.

Timeline
(00:02) Welcome and introductions
(00:49) Why synthetic data matters, and the SYNTH set
(04:15) Three reasons to control your training data
(07:18) What "ethical data" actually means
(11:08) How Common Corpus got built, from Wikipedia to PDFs
(16:35) Agentic harnesses and synthetic data
(20:03) Evaluating data when you train on reasoning traces
(25:27) General versus specialized pretraining
(27:08) Benchmark maxing and the GLM question
(31:51) Getting diversity in, and the NVIDIA personas
(35:02) Hidden reasoning traces and the fight over model IP
(38:17) Mid-training and the "It's All Training" thesis
(41:47) Can small models actually compete
(45:01) Cybersecurity and Europe's strategic gap
(47:08) Do you need a big model to orchestrate the small ones
(52:08) Sovereign AI and the limits of national champions
(56:42) Scaling laws when you control the data
(01:00:41) The NVIDIA persona datasets
(01:04:52) What you actually do with synthetic personas
(01:08:22) Closing thoughts
Dhruv Batra: The Browser Is a Robotics Problem - From Embodied AI at Meta to Web Agents at Yutori 20.07.2026 1h 7min

Dhruv Batra spent years leading Embodied AI at Meta, training virtual robots to navigate photorealistic 3D scans of real buildings with pure reinforcement learning. Then he left to co-found Yutori and build agents for a very different environment: the web browser.

In this episode, Dhruv explains why he sees these as the same problem. Web agents, in his framing, are robots that act in a browser (pixels in, actions out), and the web turns out to be just as messy an environment as the physical world.

Along the way, we cover his definition of intelligence as "navigation in idea space," why robotics is lagging LLMs, the sim-to-real gap and why you can't fake friction coefficients, the teleoperation counterexample to the "it's a sensor problem" argument, and his provocative claim that under the current paradigm, we solved machine learning and didn't even realize it. He also makes the case for why the scaling hypothesis isn't falsifiable, why JEPA-style arguments deserve to be grappled with, how Yutori trains its Navigator models with RL on live websites, and what happens to the ad-supported web when agents, not eyeballs, do the browsing.

Timeline
00:01 - Intro
00:54 - What embodied AI actually means
06:47 - Intelligence as navigation in idea space
13:26 - Habitat: training robots with pure RL, no maps
20:04 - Why robotics is behind LLMs
28:24 - Sim-to-real: what you can and can't fake
33:34 - "We solved ML and nobody noticed"
37:12 - Leaving Meta, founding Yutori
43:21 - Web agents: screenshots in, actions out
48:15 - Why the web won't rebuild itself for agents
53:32 - Training Navigator: RL on live websites
1:01:04 - Who pays for the web when agents browse?
1:09:17 - What Yutori means, closing thoughts
How to Turn Research Into Billion-Dollar Companies, with Ion Stoica 16.07.2026 49min

Ion Stoica has done what almost no academic ever does: repeatedly turned university research into billion-dollar companies. He co-founded Databricks (now valued at over $100 billion), Anyscale, Arena AI and Conviva, while his Berkeley lab produced the open source projects the entire AI industry runs on: Ray, vLLM, and SGLang.

In this episode, we ask him how it's actually done. His answer is surprisingly unromantic: solve a problem people already care about, build an artifact good enough that they adopt it, and pay attention to the moment users start asking "who maintains this after the students graduate?" - that's when a project becomes a company. He's also insistent that the credit belongs to his students.

From there, the conversation goes deep into what he's watching now: why the AI stack has become an order of magnitude more complex than the Hadoop/Spark era, why maximizing GPU utilization is "the name of the game" for any enterprise, and why coding agents will struggle with distributed systems long after they've mastered web apps. He shares a memorable reward-hacking story (a load balancer that maximized throughput by dropping requests), explains why the gap between open and closed models sits at about six months, and closes with his case for regulating AI by outcomes, not capabilities.

Timeline
00:00 - Introduction: welcoming Ion Stoica
01:21 - The playbook: how research projects become companies
05:22 - Will vLLM and SGLang stay open source?
07:47 - The real bottleneck in the AI stack: complexity, not just hardware
14:31 - Should algorithms follow infrastructure, or the other way around?
16:13 - Can AI coding tools write distributed systems and GPU kernels?
21:09 - Verifiers, harnesses, and the limits of outsourcing understanding
25:41 - Reward hacking: the load balancer that dropped requests
25:58 - How should enterprises consume GPUs? Utilization as the name of the game
30:23 - GPU scarcity: will the compute crunch ever end?
35:27 - Hyper-optimization and the risk of locking in today's architectures
37:17 - Open vs. closed models: why every company wants to own the stack
40:35 - The six-month gap, and the rising cost of training frontier models
43:58 - Kimi, Qwen, and who's incentivized to keep open models alive
45:39 - Regulation: outcomes, not capabilities
47:41 - Self-regulation, concentration of power, and auditing open models
48:32 - Wrap-up
Kaggle Grandmasters, Agent Skills, and Why Everyone Is Overfitting with Jean-Francois Puget (NVIDIA) 13.07.2026 59min

Jean-Francois Puget is a Director and Distinguished Engineer at NVIDIA, where he leads the Kaggle Grandmasters team, and he's ranked third on Kaggle's all-time list. We caught him on the day NVIDIA announced Nemotron Ultra and its new agent skills repo. We talk about what skills actually are, why they beat MCP tools on context cost, and how NVIDIA built an evaluation pipeline to separate skills that help from skills that don't.

From there we talk about the thing JFP cares about most: evaluation. He explains why most LLM benchmarks reward overfitting, how his team discovered O3 could pick the right files to fix SWE-bench issues without reading them, and why the only benchmarks he trusts are the ones where you commit before you see the score, which is exactly how Kaggle works. He predicts a "bloodbath" for the wave of competitors letting coding agents chase leaderboard scores with no notion of validation.

We also get into what coding agents are actually good for ("a mix of a genius and a dumb person"), the multi-agent system at NVIDIA that built a working PyTorch clone that runs 10x slower than the real thing, his unfiltered take on frontier lab PR and the Mythos release, whether AI is a bubble, and the story of how his team won ARC-AGI with a 4-billion-parameter model at 20 cents a task, including jumping from third to first in the final hours of a seven-month competition.

Timeline
00:00 - Intro
01:05 - NVIDIA's announcements: Nemotron Ultra and the agent skills repo
07:21 - Skills vs MCP tools, and progressive disclosure
10:24 - Agents that write their own skills: a new form of learning
13:33 - When overfitting is fine (and when it isn't)
15:47 - Why most LLM benchmarks reward overfitting
17:06 - The SWE-bench contamination story: O3 picks files without reading them
19:45 - How LLMs changed Kaggle, and the coming "bloodbath"
25:40 - What makes a good data scientist: evaluation and one-bit experiments
28:56 - Running Codex at scale: the top token consumers at NVIDIA
29:37 - Did coding agents kill AutoML?
30:16 - Genius and dumb at once: the limits of coding agents
35:21 - Humans in the loop, sandboxing, and the teenage hacker who never wrote code
37:42 - Mythos, frontier lab PR, and open source
40:08 - Why NVIDIA builds open models, and where it's already frontier
43:48 - World models, robots, and the coffee test
49:20 - Why agents still can't play Dota
50:24 - Is AI a bubble?
53:14 - Winning ARC-AGI with a 4B model at 20 cents a task
57:39 - Kaggle is a legal drug
AI Agents and The Golden Age of Asking Questions with Dimitris Papailiopoulos (MSR/UW-Madison) 09.07.2026 1h 13min

In this episode, we talked with Dimitris Papailiopoulos, researcher at Microsoft Research's AI Frontiers lab and professor at the University of Wisconsin, about doing research in the age of agents. Dimitris told us about the Sunday morning that changed how he works: he handed Claude Code and Codex a question he'd been sitting on for years, went about his day, and came back to an answer. After a few days of dread about what's left for humans, he landed somewhere more optimistic, calling this the golden age of asking questions.

We talked about his "smallest transformer that can add" leaderboard, a symbolic GSM8K solver built from if-else statements, and what happened when he put two Claude Code instances in the same file system and told them to do something cool (one pair invented a communication protocol, the other played Battleship). We also got into diversity and slop in agent-generated ideas, why agents get stubborn after a million tokens, harness overfitting on Terminal-Bench, continual learning and world models, whether agents need vision, and where information theory actually helps in AI and where it's a katana used to make coffee.

Timeline
00:00 Intro
01:45 How agents changed the way Dimitris does research
04:30 A Sunday morning with Claude Code, Codex, and GSM8K
07:15 The dread, then the golden age of asking questions
08:20 Taste and verification, and how we train students now
09:53 Will models make human verification obsolete?
11:30 The smallest transformer that can add 10-digit numbers
13:40 Humans as initializers for gradient descent in idea space
15:32 Allen on diversity, slop profiles, and high temperature research
21:44 When Claudes meet: Battleship, invented protocols, and a grokking paper
25:53 Single agent vs multi-agent under fixed compute
30:28 Auto-research benchmarks and what agents actually accelerate
35:14 Inside the symbolic GSM8K solver (with a live progress check)
40:04 Idea overfitting and why agents refuse to change course
44:00 Learning from failure traces and harness overfitting
48:04 Continual learning, memory files, and world models
51:30 Why don't labs personalize models on your own history?
57:52 Agent-to-agent communication: is Jira the right tool?
1:01:25 Multimodality: vision as a tool vs one unified model
1:05:40 Information theory and AI, or making coffee with a katana
1:11:23 Closing thoughts: ask bigger questions
Why All Models Learn the Same Thing with Phillip Isola (MIT) 02.07.2026 1h 11min

Phillip Isola, professor at MIT, joins us to talk about representation learning: what makes a representation good, why different models seem to converge on similar representations, and whether pre-training is really over.

We discuss the platonic representation hypothesis and its limits, why clustering structure matters more than global geometry, and Phillip's new neural thickets paper arguing that post-training is easier than people think because pre-trained weights already sit near solutions to downstream tasks. Phillip also explains why he thinks LLMs are already world models, why he's betting on RNNs making a comeback, and why his most exciting current direction is artificial life: putting LLM agents in open environments with no fixed task and studying them like new organisms.

Timeline
00:00 Intro song
00:13 Intro
01:05 What is representation learning and why it matters
04:09 What makes a representation good: minimality and sufficiency
10:03 How cross entropy and contrastive learning shape representations
14:35 Dimensionality reduction and why dimension isn't the right complexity measure
16:35 Compression and geometric clustering during training
19:27 The platonic representation hypothesis and what actually converges
22:53 Local neighborhoods vs global structure: the Aristotelian follow-up
24:33 When convergence is strong: truth vs the space of possibility
28:09 Is there true similarity in the world? The Bouba-Kiki effect
30:56 World models vs autoregressive LLMs
32:14 Diffusion LLMs as a special case of autoregressive models
33:42 What architectures win in five years: the case for RNNs
36:11 Grad student descent, or do we actually have principles?
40:51 Feathers and wings: what to take from biology
43:17 How close are we to brain-like models? Marr's three levels
47:01 Are better models becoming less human-like?
49:38 Is pre-training all you need? The neural thickets paper
54:18 LoRA, low rank fine-tuning, and why post-training is easier than we thought
56:01 RL environments and what our benchmarks actually test
1:01:11 Artificial life: LLM agents as new organisms
1:07:20 What's overlooked in AI research right now
1:08:36 Why stay in academia, and doing science in the age of Opus
AI for Science with Qichao Hu (Molecular Universe / SES AI) 29.06.2026 1h

Most AI-for-science companies are selling shovels. Qichao Hu wants the gold.

In this episode, we talk with Qichao, the founder and CEO of Molecular Universe, the AI-for-science platform that grew out of SES AI, a high-energy-density battery developer he's run for fourteen years. His core distinction is that companies from the AI world build tools, such as foundation models that predict properties, while companies from the science world care about the final product, such as the new battery or material that actually ships. Molecular Universe sits firmly on the science side, and the difference shows up everywhere from what they publish to what they refuse to.

We get into the actual workflow of materials discovery and where AI compresses it. A single trial in a traditional lab can take a year with maybe a 40% success rate; the goal is to run a thousand candidates in parallel and turn that year into a week. Qichao walks through improving low-temperature fast-charging for EV batteries: from hypothesis generation through molecule-, material-, and device-level property prediction, down to autonomous labs that synthesize and test the top candidates without a human touching a pipette.

The hardest problem, it turns out, isn't predicting molecular properties or measuring device performance; it's the black box connecting the two. In batteries, that's the solid-electrolyte interface, which the field has been hand-waving about since the seventies. And the thing standing in the way of cracking it isn't a clever training trick but data: companies sitting on twenty years of records are finding it too messy, incomplete, and poorly labeled to train on, and are having to start collecting from scratch with new protocols and robots.

Timeline
00:13 - Intro and welcome
01:19 - Shovel vs. gold
05:18 - Why the world's smartest scientist doesn't automatically give you a better battery
07:25 - The discovery workflow
09:37 - Exploration vs. exploitation
11:54 - Safety and filtering: screening novel molecules against banned and toxic-substance lists
17:55 - How hypotheses get generated, and where frontier LLMs help
20:29 - From hypothesis to ~400 formulations: property prediction, ranking, and handing off to autonomous labs
26:37 - "A foundation model for everything" and the black box between molecular properties and device performance
30:01 - World models and physics
33:09 - The great unknown in batteries
37:08 - Simulation vs. reality: calibrating massive simulated datasets with a sliver of experimental data
41:47 - Lab robotics: how fast the hardware has caught up, and what a floor of autonomous labs looks like
43:50 - The real bottlenecks
50:21 - Pre-training from scratch vs. post-training LLMs, and why training tricks haven't reduced the need for good data
52:42 - Evaluation
55:42 - Publish the B+ model, keep the A model
58:05 - Five years out
1:00:37 - Closing thoughts and wrap
Infrastructure for AI at Scale - With Benny Chen (Fireworks AI) 24.06.2026 1h 5min

We talk a lot on this show about RL, agents, and the move between pre-training and post-training, but not enough about the layer everything actually runs on. Benny Chen, co-founder of Fireworks AI, one of the largest inference platforms around, walks us through what it takes to serve models at scale: sourcing GPUs, writing the kernels, the runtime, and the routing layer that lets a customer hit one endpoint and forget the rest.

We talk about why the real bottleneck is power, not chips, and why that favors Nvidia and Google. We cover why MoE keeps winning even when dense models look better on paper, and why he'd rather run fungible capacity at 95% than specialized chips at 60%. We also talk about quantization limits, where RL efficiency has to go next, and his case that AI is still under-hyped. Finally, we get into cross-region training, sparse autoencoders and why interpretability hasn't taken off in open source, whether open models can close the gap, and a frank read on Anthropic's go-to-market.

Timeline
00:00 - Intro: the part of AI nobody talks about
01:20 - What "infrastructure for AI" actually means: the layers, from GPUs up to routing
02:59 - Why not just buy your own GPUs and do it yourself?
05:17 - The scale Fireworks runs at
06:35 - Hardware inflation, GPU costs, and the real risk hiding in commit duration
10:14 - Nvidia vs AMD vs TPUs, and why power is the bottleneck
11:57 - Mixing GPU types and generations; fungibility vs. specialization
14:22 - Once you have the GPUs, what's the next layer to build?
17:04 - Dense vs. MoE, and why the hardware picks the winner
21:07 - Quantization: is FP4 the floor? TurboQuant and INT vs. FP
24:28 - How tied are the algorithms to the hardware?
25:12 - DeepSeek, DeepGEMM, and next-token prediction as reconstruction loss
28:50 - Why RL is still wildly inefficient compared to pre-training
30:08 - Speculative decoding, AI-generated kernels, and auto-research
34:00 - The AGI question: why text gets automated but vision may stay expensive
37:07 - Hype check: why Benny thinks AI is still under-hyped
41:28 - Training vs. inference at the infrastructure level
44:12 - Scaling across data centers: cross-region training with Cursor
45:40 - Sparse autoencoders, interpretability, and why open source is human-constrained
49:04 - Will open models catch up, on quality and on compute?
51:41 - Are we plateauing? Opus 4.7 vs. 4.6 and the coming data wars
54:41 - Physical limits, HBM, and whether chips keep getting faster
58:17 - The belief about inference everyone gets wrong
59:31 - Anthropic, Mythos, and a frank take on go-to-market
1:04:41 - Wrap-up
Broken Peer Review, AI, and Worms - with Oded Rechavi 21.06.2026 1h 18min

Oded Rechavi is a biologist at Tel Aviv University and the co-founder of QED, a company building AI to review scientific work. He's also spent years studying worms.

We start with what's wrong with peer review and grant funding: why it takes years to publish, why reviewers are often your own competitors, and why the whole thing is locked to an economic model that rewards publishing more papers, not better ones. Oded explains why he doesn't call QED "peer review" at all, and what it would take to actually validate science instead of just stamping it.

Then we get into the biology. C. elegans has exactly 959 cells, every one of them named, and a fully mapped brain. Oded's lab studies how a worm's experiences get passed to its offspring through RNA rather than DNA, meaning what happens to a worm in its lifetime can change its descendants. We also talk about using ancient DNA to reassemble the Dead Sea Scrolls, what AI can and can't do for biology, and why he wants to build an "Ironman suit" for researchers rather than replace them.

Timeline
00:00 Intro
01:35 Why scientific publishing is broken
04:02 Years to publish, and what it costs science
07:20 Bad reviewers, conflicts of interest, and the money
10:47 Why preprints don't fix it
15:37 How AI conferences handle review
22:07 Conferences vs. journals: does slow review help?
25:22 Building QED: review, not peer review
30:02 Tracking a paper from idea to submission
33:11 What writing a grant actually involves
35:00 The ERC reviewer crisis
37:06 Tailoring feedback to your field
41:48 Switching to biology
44:30 Every cell has a name: inside C. elegans
46:28 Inheritance without DNA
48:16 What the worm "thinks" changes its offspring
51:58 Reassembling the Dead Sea Scrolls with ancient DNA
56:07 Psychedelics and worms
58:36 Can AI run the research itself?
1:04:49 Automation vs. validation
1:07:12 The origin of life
1:08:49 Why people reject AI-written work
1:16:18 Will humans still have a role?
1:17:39 Wrap-up
Will AI Take Our Jobs? With Alex Imas (Google/University of Chicago) 16.06.2026 1h 29min

Will AI take our jobs? We put the question to Alex Imas, the new Director of AGI Economics at Google DeepMind and a professor at Chicago Booth, whose entire job now is studying how frontier AI reshapes the economy. His short answer: probably some of them, but the popular story is mostly wrong about which jobs and how fast.

Alex makes the case that a job is a bundle of tasks, not a single thing AI either does or doesn't do, and that the number people should actually care about is how much consumer demand responds to falling prices. Get that wrong and you predict mass layoffs. Get it right and you sometimes predict more hiring. We get into why the automation panic is two centuries old, why he thinks blue-collar work is in more danger than white-collar, and why the people already winning are the ones adopting AI fastest.

We also cover the AGI versus ASI distinction and why it changes everything for the economy, what happens when there's no moat and open models stay six to eight months behind, the three-tier pricing future he sees coming after the 2026 compute crunch, and what any of this means if you're deciding whether to send your kids to college.

The episode was recorded before Alex joined Google.

Timestamps
00:00 Meeting Alex Imas
00:44 Will AI take our jobs?
03:35 Is this an AI question or an economics question?
06:18 The economy is already behind the AI we have
07:43 Why AI adoption is K-shaped
12:51 Was Andrew Yang right?
13:45 The automation panic is 200 years old
16:46 Dario's six-month claim, and why we don't see it yet
17:22 A job is not a task
22:38 The three numbers that actually predict the labor market
22:42 The chess engine analogy and the centaur phase
25:45 Recursive self-improvement and the hamburger problem
30:06 Should AI labs be the ones answering alignment questions?
31:17 The "invisible hand wave" and why nobody wants fully autonomous AI
33:27 AGI vs ASI, and why the difference is everything
35:28 Commodities vs relational goods
41:14 Star Trek, replicators, and predicting with sci-fi
45:20 Inequality and the Upper West Side VCs
46:21 Your money manager was automated in the 1960s
50:47 Are OpenAI and Anthropic overvalued? The moat problem
54:29 What has to be true for the losses to make sense
55:43 Cognitive atrophy and monopoly fears
57:00 The 2026 compute crunch and the three-tier pricing future
1:01:52 The Apple vs Android analogy
1:03:54 A rich-country perspective
1:04:16 Protecting the skills that actually matter
1:07:02 Will not using AI become a status symbol?
1:08:53 Does capitalism even survive?
1:13:44 Redistribution becomes the political battleground
1:18:16 Blue collar vs white collar: who's really at risk
1:21:18 Advice for parents in an AI world
1:22:43 Saving for retirement when the Valley says don't
1:25:06 Will non-elite colleges survive?
Why AI Benchmarks Are Lying to You - with Wenhu Chen (Meta/University of Waterloo) 13.06.2026 1h 19min

In this episode, we sit down with Wenhu Chen, research scientist at Meta MSL, assistant professor at the University of Waterloo, and the person behind MMLU-Pro and MMMU. If you've read a frontier model release in the last two years, you've seen his benchmarks. That makes him one of the best people to answer the question everyone dances around: when a model jumps from 40% to 90% on your benchmark, how much of that is real?

We dig into why benchmarks have become the loss function of the entire field - design a bad one, and thousands of brilliant researchers will spend months hill-climbing in the wrong direction. Wenhu is surprisingly candid about the limits of his own creations: contamination is everywhere, saturation turns frontier benchmarks into unit tests, and popular alternatives, such as LM Arena, mostly measure tone and length rather than capability. His answer is to evaluate models where they've never been: private codebases, hospital data, and the messy, live internet.

We also talk about ClawBench, his new benchmark that deploys agents to over 140 real production websites to do things people actually want done, such as ordering food, booking tickets, and applying for jobs. The best model in the world completes about a third of these tasks. We unpack why: bot detection, models that refuse to click "pay," agents that give up the moment an environment doesn't match their training, and harnesses that can swing results by 20% without changing the model at all.

Along the way, we cover the overlooked science of evaluating pre-training, data flywheels, and synthetic environments for agent training, and whether RL teaches models to reason or just surfaces what's already there. We close with Wenhu's predictions: exploration and adaptability will improve rapidly, but security will become the field's hardest problem as agents gain real permissions in the real world.

Timestamps
00:00 – Intro
00:55 – What good evaluation means, and how it's changed since the early GPT days
03:35 – Benchmarks as the field's loss function
05:50 – Contamination: the problem nobody fully solves
08:08 – MMLU-Pro scores: real progress or training on the test set?
11:05 – Can you measure creativity?
12:34 – Why human judges and arenas are unreliable, and what to use instead
19:22 – What a good benchmark actually looks like
22:34 – Chain of thought: signal or scratchpad?
26:01 – Auto-research and hill-climbing agents
28:52 – Harnesses: 20% swings without touching the model
32:28 – Safety, model release, and an "FDA for models"
36:53 – The overlooked science of pre-training evaluation
43:49 – Designing pre-training benchmarks when one run costs a billion dollars
49:45 – ClawBench: agents on 140+ live websites, and why the best model gets 33%
54:42 – How MMLU-Pro and MMMU-Pro were born from public complaints
59:16 – Pixel agents vs. APIs: will MCP kill computer use?
1:02:11 – Training agents: data flywheels and synthetic environments
1:05:43 – SFT vs. RL, and does RL teach reasoning or reveal it?
1:09:21 – What gets solved next year, and what doesn't
1:14:32 – Undervalued ideas, and what's next for ClawBench
Jürgen Schmidhuber - Part 2: JEPA, the Road to AGI, and Who Really Invented Modern AI 07.06.2026 1h 29min

In the second half of our conversation with Jürgen Schmidhuber, we focus on the key ideas he's pursued since the early 1990s and discuss why he believes these concepts are only now being rediscovered.

We start with JEPA. Jürgen argues that the method LeCun named in 2022 is the same family he published in 1992 as Predictability Maximization. From there he traces the adversarial lineage back further still, to his 1990 world-model paper and 1991 Predictability Minimization - the curiosity-driven minimax games he sees as the real origins of GANs.

We also talk about why these ideas took thirty years to land, why today's trillion-dollar data-center buildout is driven by AGI fear, and why he thinks Apple may come out ahead.

The back half turns to what he sees as the real frontier: physical AI. Today's systems are superhuman behind the screen but helpless at a leaky pipe, and until a robot can use human tools, there's no AGI. He discusses self-replicating, self-improving machines as "a new kind of life," reframes continual learning and test-time training as ideas from his 1991 fast-weight work, and detours through Solomonoff's universal prior, Hutter's AIXI, and the Gödel machine.

We close on the subject Jürgen is famous for: scientific credit. He makes his case for rigorous attribution, casts himself as a "speaker for the dead" championing forgotten pioneers like Ivakhnenko, and reflects candidly on whether the fights are personal.

Timeline
00:30 – What JEPA is, and the 1992 Predictability Maximization story
04:54 – Implementing PMAX: autoencoders, Siamese networks, Infomax
09:10 – Predictability Minimization, factorial codes, and the roots of GANs
16:00 – Why it took 30 years: the economics of compute
20:52 – Data, the web, and 1990 as the origin point
23:09 – Hardware inflation, the trillion-dollar buildout, and the coming crash
34:05 – Physical AI: the plumber problem and self-replicating machines
41:14 – Which 90s ideas are being scaled right now
45:26 – Continual learning and test-time training as "old hats"
55:19 – Measuring intelligence: Solomonoff, AIXI, and the Gödel machine
1:05:26 – Self-replication and von Neumann
1:09:51 – Will he see AGI in his lifetime?
1:10:42 – Credit, integrity, and being a "speaker for the dead"

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
Jürgen Schmidhuber - World Models, RL, and the Year that changed AI (Part 1) 04.06.2026 1h 37min

In this episode, we host Jürgen Schmidhuber - the man, the legend, one of the godfathers of modern AI. His lab worked out many ideas behind today’s systems (LSTM, world models, artificial curiosity, Transformer variants, and even GAN-style setups) decades before they became fashionable, and he’s just as well known for making sure people remember who did what first. This is the first of two conversations with him.

We go back to his lab in the early 90s and ask how one small group came up with so many of the ideas that are now being scaled to a thousand billion dollars, back when compute was ten million times more expensive. A lot of the episode comes down to one distinction he keeps making: prediction vs. decision-making. His take is that LLMs are very good prediction machines that imitate the web, but that’s only half the problem. To actually act in the world, you need a controller that uses a world model to plan. He talks about his 1990 work on world models and artificial curiosity, where the controller gets rewarded for running experiments that improve its own model (an adversarial setup years before GANs), why planning millisecond by millisecond doesn’t scale, and why you need sub-goals instead.

We also talk about compression as the core of understanding, from falling apples to Kepler to Einstein, and why we still don’t have a robot that can do what a plumber does, even though the AI behind the screen keeps getting better. Then the conversation moves to credit assignment: how “to Schmidhuber” became a verb, what he thinks is broken about the award system, and a long exchange on PMAX vs. JEPA. He ends on the real origins of deep learning and a prediction about self-replicating machines in space.

Timeline
00:00 Intro
00:55 1991 in Munich, and why that lab mattered
02:38 "I'm not very smart" and why compute getting 10× cheaper every 5 years changed everything
04:25 Chess as an AI proxy
08:27 Artificial curiosity in the 90s vs. today's RL exploration
09:10 Why RL is harder than supervised learning
20:48 Coding agents vs. robots, and how a baby learns its own hands
26:20 Compression as understanding
33:40 What's actually missing on the road to AGI
37:30 Why millisecond-by-millisecond planning is stupid
47:44 Convergence to LLMs, GPUs, and how far we still are from the Bremermann limit
51:49 Unsupervised learning, factorial codes, and predictability minimization
58:12 Credit assignment: the fights with LeCun and the Nobel critique
1:02:13 On his last name becoming a verb
1:05:17 The award system's missing peer review
1:07:03 Closed labs and the decline of open research
1:13:23 Audience questions
1:34:02 Closing: who really invented deep learning?

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
AI for Science and the Thermodynamics of Generative AI - with Max Welling (UvA, CuspAI) 29.05.2026 1h 13min

In this episode, we sit with Max Welling, Professor of Machine Learning at the University of Amsterdam, co-founder and CTO of CuspAI, and a foundational figure behind variational autoencoders (VAEs), equivariant networks, and Bayesian deep learning. We talk about AI for science, the physics underneath generative models, and what's still missing on the road to real intelligence.

Max starts with what impresses him and what worries him about the LLM era, then makes the case that the next leaps will come from physical AI and from science itself. We dig into how machine learning actually works in the lab, world models and whether priors like geometry and symmetry should be built in or simply learned, and whether transformers will still rule a decade from now. At the end, we talk about CuspAI's climate mission, AI risk and regulation, Max’s new book, and where neuroscience might inspire the next wave of ML.

Timeline
00:00 – Intro
00:47 – Are we happy with the LLM era?
03:14 – Embodiment and physical AI
08:05 – Does "AGI" even matter as a term?
11:34 – Verifiers, RL, and why math/coding are tractable
13:17 – What actually shifted to make materials discovery work
14:42 – From molecules to biology and wet labs
16:26 – Working with real labs: timescales, friction, and the "Mira" agent
20:29 – Balancing simulators vs. experiments: the exploration–exploitation trade-off
23:44 – Active learning for experimental design
24:23 – Why active learning hasn't been central to LLMs
25:24 – A general loop for ML-for-science across domains
27:10 – Foundation models for chemistry: a "mother ship" plus a zoo of fine-tuned models
30:04 – Quantum mechanics, interpretation, and AI as a creative theorist
31:54 – World models and Yann LeCun's view; priors vs. learning
34:57 – Should world knowledge be explicit? (responding to Stefano Ermon)
36:41 – Vision: equivariance vs. transformers, and the role of optimization
40:32 – Best model for molecular properties in 10 years? Will transformers survive?
43:16 – CuspAI's climate focus and what motivated it
47:10 – One platform for every material class: what transfers and what doesn't
48:42 – Where does the risk of human extinction really come from?
51:06 – The "pause AI" debate and the arms-race reality
52:40 – Regulating powerful models: government vs. self-regulation
55:16 – Who should design AI regulation?
56:29 – The new book
1:00:31 – Compression, the information bottleneck, and renormalization
1:03:30 – The role of foundational principles in modern AI
1:04:06 – Waves in computing, the brain, and the next wave of innovation
1:07:11 – Neuroscience and ML: are we in a better position now?
1:09:17 – Conferences, the ICLR keynote, and finding the right people

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
After Math Falls, What's Next? with Julia Kempe (NYU/Meta) 25.05.2026 1h 14min

Julia Kempe on Why Math Will Fall Next, Superhuman Provers, and the Return of the Renaissance Researcher

In this episode, we sit down with Julia Kempe, a Professor at NYU's Center for Data Science and researcher at Meta FAIR's Foundations of Reasoning team, for a wide-ranging conversation on the future of AI research.

We dig into why verifiable domains like mathematics may be on track to "fall" the way Go did. With formal verification through Lean and the Mathlib infrastructure, LLM agents can now generate and check proofs at scale, and Julia makes the case that a new industry of automated mathematical discovery is closer than most mathematicians believe. We explore why Erdős problems are already falling, what's still missing for harder fields like analysis and physics, and how synthetic data, curation, and verification fit together.

From there we get into the energy and scaling limits of frontier models, the case for academic research that big labs can't pursue, how to advise PhD students when Claude can already do their first-year work, the rise of AI safety and security as research priorities, and Julia's optimistic argument that AI tools are bringing back the Renaissance generalist - the researcher who can finally work fluently across math, biology, and beyond.

Timeline
00:00 – Introductions
01:00 – Defining reasoning and verifiable domains
04:00 – Lean, Mathlib, and the formalization of mathematics
10:00 – Constructive proofs, Erdős problems, and the new wave of "AI mathematicians"
14:00 – Will math be "solved"? Art, photography, and the changing nature of creative work
18:00 – Why physics is harder than math
22:00 – Moravec's paradox, evolution, and why robotics lags behind language
27:00 – The Renaissance is back: generalist researchers in the age of AI
29:00 – Advising students: math, programming, and what core education still matters
32:00 – Teaching and assessment when GPT can do the homework
35:00 – Anti-AI backlash, energy costs, and the security threat
40:00 – Scaling vs. efficiency
42:00 – Model collapse, synthetic data, and what's left to squeeze from the internet
44:00 – What's exciting next: AI for science, safety, robotics, memory, and planning
47:00 – Annotation costs as a proxy
50:00 – Superhuman models and what security even means against them
52:00 – AlphaGo as precedent for verifiable superhuman performance
54:00 – Hallucination, the Mirage paper, and whether these are solvable problems
56:00 – Why coding isn't fully solved yet
58:00 – Agent security, prompt injection, and the Wild West of deployed agents
1:01:00 – Regulation: what's needed and what's possible
1:04:00 – Advice for PhD students and what research academia should pursue
1:09:00 – Startup opportunities: robotics, security, and AI for finance
1:12:00 – Closing thoughts: use the tools, and build grassroots AI for good

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
Language, Cognition, and the Limits of LLMs - with Tal Linzen (NYU/Google) 17.05.2026 1h 23min

We host Tal Linzen, Associate Professor at NYU and Research Scientist at Google, for a conversation on the intersection of cognitive science and large language models.

We discussed why children can learn language from around 100 million words while LLMs need trillions, and the surprising finding that as models get better at predicting the next word, they become worse models of how humans actually process language. Tal walked us through how his lab uses eye-tracking and reading-time data to compare model behavior to human behavior, and what that reveals about prediction, working memory, and the limits of current architectures.

We also got into nature versus nurture and how inductive biases can be instilled by pre-training on synthetic languages, world models and whether transformers actually use the geometric structure they encode, the BabyLM challenge and data-efficient language learning, and what mechanistic interpretability can offer cognitive science beyond just fixing model bugs. The conversation closed on academia versus industry, the role of PhDs in the current AI moment, and how AI coding tools are changing the way Tal teaches and evaluates students at NYU.

Timeline
00:13 – Intro and what cognitive science means
02:16 – Using computational simulations to understand how humans learn language
05:26 – How children learn language vs. how LLMs are pre-trained
07:53 – Why mainstream LLMs are not good models of humans
10:07 – Comparing humans and models with eye-tracking and reading behavior
13:52 – Sensory modalities, smell, and how much you can learn from language alone
16:03 – Animal cognition and decoding animal communication
17:00 – Nature vs. nurture, inductive biases, and what transformers can and can't learn
21:21 – Instilling inductive biases through synthetic languages
27:34 – The bouba/kiki effect and cross-linguistic sound symbolism
28:33 – Latent causal structure in language and whether models discover it
31:13 – Does knowing linguistics help build better models?
35:07 – World models: what they mean, and why transformers encode geometry but don't use it
39:13 – Tokenization, and why Tal doesn't like it
41:35 – Scaling laws and the inverse-U curve of model quality vs. human fit
44:34 – Where the human–model mismatch comes from: architecture, memory, and data
47:08 – Diffusion language models and sentence planning
48:21 – Data quality, synthetic data, and curriculum effects
50:54 – Comparing models at different training stages to human development; BabyLM
54:40 – What level of the model should we actually probe? Representations vs. behavior
1:01:04 – Mechanistic interpretability, Deep Dream, and human dreaming
1:02:11 – Cognitive neuroscience, intracranial recordings, and working memory
1:10:31 – Should you still do a PhD in 2026?
1:12:31 – Will software engineers lose their jobs to AI?
1:17:43 – Teaching in the age of coding agents: what changes in the classroom
1:20:54 – What's next: human-like LLMs as user simulators, and recruiting

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
Intelligence in an Open World - with Mengye Ren (NYU) 20.05.2026 59min

We talk with Mengye Ren, Assistant Professor at NYU's Center for Data Science, about what intelligence actually means once you step outside a benchmark, and why scaling a single centralized model isn't the whole story.

We get into why intelligence has to be defined in open environments, not closed ones, and what that means for how we measure progress. We push on the creativity question: today's models sample bottom-up from a softmax or a Gaussian, with no internal loop of consideration, and as Mengye puts it, we haven't understood creativity yet and we're already prepared to hand it over.

We also talk about what's missing for the next paradigm: continual learning, memory, embodied grounding, and smaller models that actually accumulate experience instead of re-deriving everything from scratch on each call. Along the way, we get into JEPA and latent variables, biology as inspiration vs. blueprint, why frontier labs don't lean on explicit latents, the limits of synthetic data and world models, agent-to-agent communication, model uncertainty and forecasting, and whether ML education still matters when AI writes the experiments.

A grounded, contrarian conversation about where AI research should be looking next, beyond benchmarks, beyond scale.

Timeline
00:00 – Intro and welcome
01:24 – What is intelligence? Defining it relative to objectives and open environments
04:19 – Is intelligence really the path to human flourishing, or is it productivity?
04:57 – Safety, scalable oversight, and whether stronger models help or hurt
06:09 – What does "alignment" actually mean?
07:18 – Centralized vs. decentralized models: objectivity vs. personal meaning
08:50 – Hinton vs. LeCun: where Mengye stands on AI risk
10:29 – Bottom-up vs. top-down architectures and feedback loops
21:28 – Biology and AI: inspiration, not blueprint
24:14 – Biological plausibility, spiking nets, and where the analogy breaks
25:39 – JEPA, Mamba, and architectures beyond the transformer
27:31 – Language as a special modality: abstraction built for communication
29:04 – Are we too locked into the current paradigm? Risk of creativity collapse
30:09 – Synthetic data, simulation, and the brain's own generative models
31:43 – World models and physical AI: how babies actually learn
33:03 – The case for smaller, continually learning models
37:02 – The role of academic research in a frontier-lab world
39:47 – Why LLMs aren't funny: the creativity gap
40:35 – What research areas matter most: embodiment, continual learning, creativity
42:05 – Creativity is bounded by experience, and why bottom-up sampling isn't enough
45:35 – Agent-to-agent communication and the limits of sub-agents
46:39 – Model confidence, epistemic uncertainty, and forecasting
49:44 – Tokenization, static vs. dynamic worlds, and always-learning systems
52:20 – Latent variables, JEPA, and why frontier models skip them
53:40 – The future of ML education when AI writes the experiments

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
The Principles of Diffusion Models - with Jesse Lai (Sony AI) 10.05.2026 55min

We host Chieh-Hsin (Jesse) Lai, Staff Research Scientist at Sony AI and visiting professor at National Yang Ming Chiao Tung University, Taiwan, for a conversation about diffusion models, the technology behind tools like Stable Diffusion and most of the AI image and video generators you've seen in the last few years. Jesse recently co-authored The Principles of Diffusion Models with Stefano Ermon, and the book is quickly becoming a go-to reference in the field.

We start with what a generative model actually is, and what it means to "generate" an image or a sound. Jesse explains the core idea behind diffusion in plain terms. You start with pure noise, and a neural network gradually cleans it up, step by step, until a realistic image emerges.

From there, we talk about why diffusion has come to dominate so much of generative AI. Because the model builds an image gradually, you can guide it along the way, nudging the output toward what you actually want, refining details, or combining it with other controls. We also discuss the common critique that diffusion is slow and how the field has largely addressed it through new techniques.

We zoom out to the bigger picture, too. Jesse shares his view on world models and whether diffusion is the right foundation for them. We talk about what makes a generative model genuinely good versus just good at gaming benchmarks, and why evaluating creativity and realism is so much harder than scoring a multiple-choice test.

Timeline
00:12 – Intro and welcoming Jesse
00:47 – Why Jesse wrote the book, and who it's for
03:29 – The three families of diffusion models, and why they're really one idea
05:14 – What makes a good generative model
07:39 – How do you even measure if a generated image is good
08:59 – Why diffusion beats autoregressive models for images
10:33 – Is diffusion still slow? How fast generation got fast
11:12 – A simple intuition for what a "score" is
14:12 – How the different flavors of diffusion connect under the hood
14:42 – Diffusion for text and proteins
17:12 – Consistency models and the push for one-step generation
22:12 – Diffusion for world models: simulating reality in real time
26:12 – Do world models need to understand language
35:12 – Is diffusion the right tool, or just a convenient one
38:12 – What benchmarks actually tell us, and what they miss
46:12 – Closing thoughts and where to find the book

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
Inside xAI, and the Bet on AI Math - with Christian Szegedy (Math Inc) 04.05.2026 1h 12min

We talked with Christian Szegedy, co-inventor of Inception and Batch Normalization, founding scientist at xAI, now at Math Inc, about what it takes to build a frontier lab, and why he left xAI to work on formal mathematics. Christian thinks Lean and auto-formalization are the missing piece for trustworthy AI: a machine-checkable layer underneath all reasoning, where proofs are guaranteed correct without anyone having to read them.

We got into his bet with François Chollet that AI will hit superhuman mathematician level by 2026, and what that actually unlocks beyond math itself: verified software instead of vibe-coded apps that break when you refactor, AI systems you can actually trust because their reasoning is checkable, and a path to handling protein folding, chemistry, and parts of biology with real guarantees instead of hand-waving. Christian also walked us through how Math Inc's Gauss system pulled off a proof in two weeks that human experts had estimated would take another year.

We also covered xAI's first 12-person year, why Christian no longer buys the original batch normalization story, why he's sure transformers won't be the dominant architecture in five years, what mathematicians do in a world of cheap proofs, and his take on whether humanity will handle AI well. He distrusts humanity more than he distrusts AI.

Timeline
00:12 – Intros: Christian's background (Inception, Batch Norm, xAI, Math Inc)
01:29 – Building a frontier lab from scratch: the first 12 people at xAI
04:15 – Hiring for proven track records when 200K GPUs are at stake
06:07 – Elon's "dependency graph" and balancing long-term vision with investor demos
07:28 – Gauss formalizes the strong prime number theorem in 2 weeks
12:25 – What "formalization" actually means (and why it's not what most people think)
14:39 – Why Lean gives 100% certainty and why that matters for RL
15:26 – ProofBridge and joint embeddings across mathematical subfields
18:07 – Does math formalization transfer to coding and other fields?
21:44 – Can every domain be mathematized?
23:14 – Verified software, chip design, and why vibe-coded apps are dangerous
26:35 – Scaling Mathlib by 100–1000x
28:27 – Artisan formalizers vs. invisible machine-language formalists
33:26 – Can verification generalize?
45:19 – Revisiting Batch Norm: covariate shift, loss landscape, and what really happens
48:22 – Is normalization even necessary?
50:10 – What's actually fundamental in modern AI architectures
51:41 – Why Christian thinks transformers won't last 5 years
52:38 – The 2026 superhuman AI mathematician bet
55:15 – What's missing: better verification + a much larger formalized math repository
56:13 – Lean vs. Coq vs. HOL Light - does the proof assistant actually matter?
59:26 – The role of mathematicians in 5–10 years
1:02:00 – A human element to mathematics: Newton, Leibniz, and competitive proving
1:03:25 – The telescope analogy: AI as the instrument that lets us see the math universe
1:05:19 – Job apocalypse or Jevons paradox?
1:08:41 – Advice for students
1:09:50 – Can we formally verify AI alignment?
1:11:52 – Closing thanks

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

Popular in

This podcast also appears on the podcast charts in these countries.

Hungary
Kenya
