If a team of human engineers built a web browser that only half-worked, it wouldn’t get people talking. But when Michael Truell, CEO of coding startup Cursor, posted on X last week that a swarm of AI agents, running uninterrupted for a week without any human intervention, had built a browser that, he wrote, “kind of works,” the post went viral across the tech world, drawing more than six million views.
Why the buzz? There are two big reasons. First, AI’s attention span has historically been short. In the early days of ChatGPT, models could stay on task for only a few seconds. That horizon stretched to minutes for better models, then to hours. Cursor says the project is one of the first times an AI system has sustained a complex, open-ended software project for an entire week without human guidance.
Second, individual AI agents have generally been limited to small, focused tasks, and getting hundreds of agents to coordinate on a big project has still seemed futuristic. That’s why Cursor wanted to see how far it could push autonomous coding, on a project that could take a human team months, by having an “orchestra” of AI agents work as a team. Could an AI system be persistent enough, and collaborate well enough, to explore code, break work into parts, debug itself, and keep moving forward for days without drifting away from the task at hand?
An AI agent ‘orchestra’
The researchers found that the answer was mostly yes. Cursor’s experiment organized hundreds of agents into something like a software team, with “planners,” “workers,” and “judges” coordinating across millions of lines of code. This hints at what both Cursor and OpenAI describe as a near future in which AI doesn’t just assist employees but takes on entire projects. That would fundamentally reshape how complex work gets done, first in software development and then in other professions.
AI swarm experiments have been around for a couple of years. But Cursor says today’s models are smarter and can stay coherent for much longer. They can also be run at far larger scale, with a custom orchestration layer that coordinates hundreds of agents and keeps them from descending into chaos.
Jonas Nelle, an engineer at Cursor who works on long-running AI agents, told Fortune that because models keep getting better, engineers and researchers need to revisit their assumptions about what the models can do every few months. While he admitted he “wouldn’t download it and delete Chrome today,” he said the browser project was “certainly better than anything models previously would have been able to do.”
These long-running agents are an important frontier, added Bill Chen, an OpenAI engineer who stress-tests and evaluates the real-world behavior of the company’s models. The length of a task, and the fact that an AI system can accomplish it autonomously and coherently, are a “very good indicator of how intelligent and how general a system is,” he said. The Cursor project, which was powered by OpenAI’s GPT-5.2, is “a direct result of us really continuously pushing forward the boundaries of model capabilities.” In the future, he said, there will be even longer-horizon tests.
AI agent swarms are not ready for business use
Still, these are not production-ready systems. Beyond the browser being buggy and incomplete, running swarms of agents for days or weeks is expensive. Although model prices have fallen steeply over the past year, long-running jobs with hundreds of AI agents can still rack up significant costs.
There are also security issues. An autonomous system raises worries about vulnerabilities, data leaks, and much more, and requires many new layers of control and auditability.
But Chen said he foresees a near future in which something like this could be ready “for broad consumption and at a not prohibitive cost.” Progress has been continuous so far, he explained, with important unlocks every step of the way. For now, he said, the excitement is driven by the fact that this is a real, practical example of model capability, “versus how this model performs on academic and public evaluations and benchmarks.”
The shift has surprised even longtime AI observers. In a recent post, independent researcher Simon Willison predicted that by 2029, someone would build a full web browser largely using AI—and that it wouldn’t even be surprising. “Rolling a new web browser is one of the most complicated software projects I can imagine,” he wrote. Cursor may have accelerated that timeline. “I may have been off by three years,” Willison said. “I have to admit I’m very surprised to see something this capable emerge so quickly.”
This speaks to what OpenAI and others have called a “capabilities overhang”: the idea that the most sophisticated AI models can do much more than what has been publicly deployed, and that the right combination of tools, product design, and falling costs can suddenly make those capabilities usable at scale. So while tools like the Cursor browser aren’t quite ready for prime time, the trajectory is clear.