This cyberpunk-style illustration aptly hints at Devin’s current situation: behind the glamorous red-eye effects is a group of humans frantically pounding keyboards to patch bugs.
These past few days, the anxiety levels across the entire tech community have essentially been maxed out by Devin.
This nuclear bomb dropped by Cognition AI—the “World’s First Fully Autonomous AI Software Engineer”—hasn’t even opened for public beta yet, but it has already mentally “slaughtered” human programmers several times over on Twitter and WeChat Moments. In the demo videos, Devin takes orders, fixes bugs, and deploys code all on its own, making The Matrix look like a documentary overnight.
But let’s put down the anxiety meds for a moment. After carefully digging into Devin’s published numbers and that heavily hyped demo, I smell a familiar scent: that unique Silicon Valley blend of caramel macchiato and PowerPoint bubbles.
To put it bluntly, Devin’s current state is still a million miles away from “replacing you” (and GPT-5), but it’s only one polished video away from “harvesting investors.”
1. Deep Insight: The “God-Making” Movement and the Awkward 13.86%
Let’s look at a number that many marketing accounts have deliberately ignored: 13.86%.
This is Devin’s success rate on SWE-bench (a benchmark for evaluating AI’s ability to solve real GitHub issues). Yes, less than 14%. That is roughly 7 times the previous SOTA result (about 1.96%) and represents a huge technological leap, but what does this number mean in the context of practical engineering?
It means that if you hand Devin a real-world issue of the kind the benchmark measures, there is roughly an 86% probability it will screw it up.
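For anyone who wants to check the arithmetic rather than take my word for it, here is a minimal sketch in Python. The two percentages are the ones quoted above; the function name `summarize` and its dictionary keys are my own invention for illustration, not anything from Cognition AI or SWE-bench.

```python
def summarize(resolved_pct: float, baseline_pct: float) -> dict:
    """Compare a benchmark resolve rate against a baseline (both in percent)."""
    return {
        # How many times higher the new rate is than the baseline.
        "improvement_factor": resolved_pct / baseline_pct,
        # Share of benchmark issues that were NOT resolved.
        "unresolved_pct": 100.0 - resolved_pct,
    }


# Figures quoted in the text: Devin at 13.86%, previous SOTA at about 1.96%.
stats = summarize(13.86, 1.96)
print(f"{stats['improvement_factor']:.2f}x the previous best")
print(f"{stats['unresolved_pct']:.2f}% of issues left unresolved")
```

Both headlines come out of the same two numbers: the “7x” is the ratio, the “86%” is the remainder. Which one you lead with depends on whether you are pitching investors or carrying the pager.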
If your pacemaker had an 86% chance of failing, or your self-driving car had an 86% chance of seeing a red light as green, would you dare to use it? Yet, in the AI circle, this “failing grade” is packaged as a “rising star.”
This is actually quite absurd. Capital and media are celebrating because they see the slope of “7x growth”; meanwhile, those of us writing code on the front lines see an “artificial idiot” that requires us to wake up at 3 AM to clean up its mess. Devin has indeed broken through the ceiling of LLMs, but its current position is more like a tireless junior intern with extremely fast hands who frequently mistakes rm -rf / for an optimization command.
It isn’t writing code; it is “guessing” code based on probability.
2. Independent Perspective: The “Red Pill” in the Demo
Aside from the success rate, the most criticized aspect of Devin is the “Upwork job” demo video.
This move was actually very clever, directly piercing the psychological defenses of the working class: “Look, it can not only work, but it can also earn money!” But if you carefully review that video, you’ll find it’s full of carefully designed “magic tricks.”
The hardcore YouTuber “Internet of Bugs” has already exposed the truth: the so-called “bugs” Devin fixed in the demo were often hallucinations it created itself, or extremely simplified tasks. It’s like a chef claiming he can cook a Manchu-Han Imperial Feast, then demonstrating how to throw a bag of frozen dumplings into boiling water, and taking a whole 6 hours to do it.
Even worse, the “full autonomy” shown in the video covers up a massive amount of human prompt guidance and correction behind the scenes. This is a bit underhanded. The current Devin is like a child riding a bicycle supported by a parent. The parent lets go for one second, the child rides two meters, and the parent immediately posts on social media: “Look! My son is ready for the Tour de France!”
This kind of “over-promising” not only overdraws public trust but also conceals the fatal shortcomings of current AI agents in long-chain logical reasoning.
3. Industry Comparison: Bicycles vs. Autonomous Driving
If GitHub Copilot is an “electric bicycle,” then Devin is trying to define itself as “L5 Autonomous Driving.”
Copilot’s logic is “assistance”; it acknowledges that the human is the driver and is responsible for making your pedaling easier. This is a very pragmatic mode with an excellent commercial loop.
Devin’s ambition, however, is “replacement.” It wants to grab the steering wheel from your hands. But the problem is, software engineering is not just about writing code; it is about system design, requirements analysis, and arguing with that product manager who doesn’t understand tech but changes requirements daily.
At present, on the point of “understanding vague requirements,” Devin is arguably worse than a fresh college graduate. It can perfectly execute “write a quicksort,” but if you tell it to “optimize the user experience of this page, make it ‘colorful black’,” it will likely burn out its CPU.
In contrast, while Claude 3 Opus is stunning in code understanding, Anthropic appears much more restrained, still positioning it at the “assistant” level. Cognition AI’s radicalism may be driven more by the pressure to raise funds than by technological maturity.
4. Unfinished Thoughts: Who is Responsible for the “Shit Mountain”?
Putting technology aside, I am more worried about the legal black hole of accountability.
If a line of code written by Devin causes a global blue screen of death similar to the CrowdStrike incident, who takes the blame?
Is it Cognition AI? (Their user agreement surely disclaims liability.)
Is it the company that deployed Devin?
Or is it the unlucky team lead who should have reviewed Devin’s code but went to slack off because they trusted AI too much?
When the marginal cost of code production drops to zero, the output of garbage code will approach infinity. The future Internet might be filled with “Shit Mountains” generated by AI, which only AI can understand (or maybe not even AI). By then, the core value of human programmers may no longer be “building,” but “archaeology”: searching for that one red wire to cut amidst millions of lines of tangled, AI-generated logic.
The imagery is terrifyingly cyberpunk.
5. Final Words
Devin is certainly not the end point; it is just the pig that happened to be standing where the wind blows hardest, or perhaps, a premature unicorn.
I do not deny that AI will eventually reshape software engineering, but please don’t rush to put a suit on it and send it to Wall Street while it’s still wearing diapers. For us carbon-based organisms still pounding keyboards, rather than worrying about being replaced, it is better to learn how to master this “super intern.”
After all, until it learns how to handle that “colorful black” request, the steering wheel is still in our hands.
