Most of the Web Will Never Get APIs for AI Agents | Dhruv Batra
Dhruv Batra, co-founder and chief scientist of Yutori and former head of embodied AI at Meta's FAIR lab, argues that most of the web will never expose APIs for AI agents. He explains why Yutori trains specialized browser agents to perceive pixels and click buttons the way people do, and why those agents run faster and cheaper than frontier models. Chain of Thought is hosted by Conor Bronsdon.
Most of the web will never get APIs for AI agents. School district sites, small business pages, government offices, and the long tail of e-commerce were built for humans, and they will keep working that way for years. So how do agents actually get things done across the web?
Dhruv Batra is co-founder and chief scientist of Yutori, the company building specialized browser and computer-use agents. He previously led embodied AI at Meta's FAIR lab, training robots in simulation and shipping the image question-answering model on Ray-Ban Meta glasses. His bet: the web is a shared roadway, much like roads split between human drivers and self-driving cars, and agents will be built to use it the way people already do.
Pixels in, clicks out. That is the API.
In this conversation:
- Why the long tail of the web won't re-architect itself for agents
- How Yutori's Navigator perceives pixels and writes JavaScript on the fly to shorten task trajectories
- Why Navigator runs 2-3x faster and 4-5x cheaper than Opus 4.7 and GPT-5.5 on browser tasks
- Learning from live websites, and using URL query parameters as privileged verifiers instead of cloning sites
- What the shift from American to Chinese open-weight models means for startups
- How smart glasses and robots share the same perception-action loop
- Why demand for inference compute is pushing models smaller and onto devices
Chapters:
(00:00) Pixels in, clicks out
(01:37) Why most of the web will never get APIs
(08:47) Aggregation, specialization, and human friction
(11:39) Digital niches and specialized models
(16:41) The web's heavy tail and where browser agents win
(20:40) Inside Yutori's Navigator and Scouts
(24:08) N1.5: writing JavaScript to cut trajectory length
(27:45) Training on live websites
(33:29) Open source: FAIR's legacy and the Chinese frontier
(37:22) Agent frameworks: OpenClaw, Hermes, heartbeats
(40:57) How non-technical users adopt agents
(44:25) Smart glasses, robotics, and embodied AI
(50:57) Compute demand and smaller on-device models
(53:12) Why the company is called Yutori
Connect with Dhruv Batra:
- LinkedIn: https://www.linkedin.com/in/dhruv-batra-dbatra/
- X/Twitter: https://x.com/DhruvBatra_
- Yutori: https://yutori.com
Connect with Chain of Thought host Conor Bronsdon:
- Newsletter: https://newsletter.chainofthought.show/
- X/Twitter: https://x.com/ConorBronsdon
- LinkedIn: https://www.linkedin.com/in/conorbronsdon/
- YouTube: https://www.youtube.com/@ConorBronsdon
More episodes: https://chainofthought.show
