This seemingly boring flowchart actually records the evolutionary moment of AI transitioning from “parroting” to “self-epiphany”—DeepSeek-R1-Zero learned to say “Wait, I think I miscalculated” without any human instruction.
1. Deep Insight: When David Learned to Do the Math
In the tech world, there are usually two kinds of shocks. One is “This thing is so powerful, I’d have to sell a kidney to afford it,” and the other is “This thing is so powerful, and it’s cheaper than cabbage.” DeepSeek-R1 clearly belongs to the latter—the kind of “latter” that caused Nvidia’s stock market value to evaporate by hundreds of billions overnight.
Everyone is staring at R1’s high scores in math and coding, but they are ignoring the most terrifying number: $5.5 million.
According to DeepSeek's public technical reports, the final training run of the base model behind R1 cost only around $5.5 million. Meanwhile, peers across the ocean (like OpenAI with o1) are estimated to have burned through hundreds of millions of dollars to reach similar results, and are even planning a hundred-billion-dollar "Stargate." This isn't just cost control; it is algorithmic efficiency mocking brute-force scaling.
The core lies in that split second known as the "Aha Moment." During the training of DeepSeek-R1-Zero, researchers skipped the usual step of feeding it thousands of human-curated "standard answers" (SFT) and went straight to pure Reinforcement Learning (RL). The result? After thousands of RL training steps, the model spontaneously learned "Self-Correction": it started muttering to itself in the output, "Wait, let me think…"
What does this prove? It proves that the emergence of logical reasoning capability does not necessarily require massive amounts of human-labeled data. As long as the incentive mechanism is designed correctly, AI can teach itself. It's like no longer walking a child through each problem hand-over-hand, but simply telling them "you get candy if you get it right," and watching them invent their own solution methods.
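To make "designing the incentive correctly" concrete, here is a minimal sketch of the kind of rule-based reward the R1-Zero report describes: an accuracy reward for a verifiably correct answer plus a format reward for wrapping the reasoning in the expected tags. The parsing rules and weights below are my own assumptions for illustration, not DeepSeek's published code.

```python
import re

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Illustrative reward in the spirit of R1-Zero's rule-based RL signal.

    Two components, as described in the technical report:
      - format reward: did the model wrap its reasoning in <think>...</think>
        and its final answer in <answer>...</answer>?
      - accuracy reward: does the extracted answer match a verifiable ground truth?
    The exact weights and parsing rules here are assumptions, not DeepSeek's code.
    """
    format_ok = bool(
        re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL)
    )
    format_reward = 0.5 if format_ok else 0.0

    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    answer = match.group(1).strip() if match else ""
    accuracy_reward = 1.0 if answer == ground_truth.strip() else 0.0

    return format_reward + accuracy_reward

# e.g. rule_based_reward("<think>2+2... wait, 2+3=5</think><answer>5</answer>", "5") -> 1.5
```

Nothing in this signal tells the model how to reason; the RL algorithm (GRPO) simply reinforces whichever sampled outputs score higher, and the "Wait, let me think" behavior emerges on its own.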
2. Independent Perspective: The Overlooked “Distillation” Nuke
Everyone is discussing the behemoth R1-671B, but in my opinion, the real foreshadowing DeepSeek has buried lies in those inconspicuous small models.
DeepSeek extremely generously (or perhaps extremely craftily) put an old technique, model distillation, to devastating use: they fed the reasoning data generated by the "Super Brain" R1 to "little brothers" with 1.5B, 7B, and 14B parameters.
The results were explosive: a small model with only 7B parameters, after thoroughly absorbing R1's Chain of Thought, outperformed open-source models several times its size on math and programming benchmarks.
What does this mean? It means reasoning capability is “backward compatible.”
Previously, we assumed that to be smart, the brain had to be big (large parameter counts) and the graphics cards had to be expensive. But R1 proved that as long as there is a smart enough "Master" (R1), the "Apprentices" (small models) it brings up can run on your laptop or even your phone with most of that intelligence intact.
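The recipe itself is conceptually simple: have the teacher generate full chain-of-thought traces, keep only the ones whose final answers check out, and fine-tune the small model on them as ordinary supervised data. Here is a minimal sketch of that data-construction step, with `query_teacher` and `check_answer` as hypothetical stand-ins for whatever inference and verification stack you actually use; DeepSeek's exact pipeline is not public.

```python
def build_distillation_set(problems, query_teacher, check_answer):
    """Turn teacher (R1-style) reasoning traces into SFT data for a small student.

    `query_teacher(prompt)` returns a full chain of thought plus final answer;
    `check_answer(trace, gold)` verifies the final answer. Both are stand-ins,
    so treat this as the shape of the idea rather than DeepSeek's pipeline.
    """
    sft_examples = []
    for prompt, gold_answer in problems:
        trace = query_teacher(prompt)          # full reasoning trace from the teacher
        if check_answer(trace, gold_answer):   # keep only verifiably correct traces
            sft_examples.append({"prompt": prompt, "completion": trace})
    return sft_examples

# The student is then trained with ordinary supervised fine-tuning on
# `sft_examples` -- no RL is run on the small model itself.
```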
This is a true “dimensional strike.” While OpenAI is still figuring out how to sell APIs at a higher price, DeepSeek directly open-sourced the “method of manufacturing intelligence.” This isn’t selling water; this is pasting the blueprints for digging the well onto the city wall.
3. Industry Comparison: An Asymmetric War
Let’s put DeepSeek-R1 and OpenAI o1 on the dissection table and compare them with the coldest data.
| Dimension | DeepSeek-R1 (API) | OpenAI o1 (API) | Verdict |
|---|---|---|---|
| Input Price (Per 1M tokens) | $0.14 (cache hit) | $15.00 | ~99% Cheaper |
| Output Price (Per 1M tokens) | $2.19 | $60.00 | ~96% Cheaper |
| Architecture & Training | Mixture of Experts (MoE) + Pure RL (GRPO) | Dense model (presumed) + Closed-Source RL | R1 is lighter and more transparent |
| Deployment Flexibility | Private deployment available, Distillable | API Only, Black Box | R1 is a geek’s toy; o1 is a capitalist’s product |
(Data Source: DeepSeek Official Report & OpenAI Official Pricing; specific values subject to market fluctuations)
This is simply not a competition in the same dimension. OpenAI’s o1 still follows the logic of “Aristocratic Compute”—I have the best cards and the most expensive data, so I charge the highest fees. DeepSeek-R1, on the other hand, is a typical “Engineering Miracle,” utilizing the MoE (Mixture of Experts) architecture to drastically reduce the number of activated parameters during inference (although the total is 671B, only 37B are activated per token), effectively driving a Ferrari with the fuel consumption of an economy hatchback.
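To make the "37B activated out of 671B" point concrete, here is a toy top-k routing sketch of how an MoE layer touches only a few experts per token. The dimensions, expert count, and `top_k` below are illustrative values of mine, not DeepSeek's actual configuration; only the 671B-total / 37B-active figures come from the report.

```python
import numpy as np

def moe_forward(x, experts_w, router_w, top_k=2):
    """Toy top-k Mixture-of-Experts layer for a single token.

    Only `top_k` expert weight matrices are multiplied, which is how a huge
    total parameter count can coexist with a small number of activated
    parameters per token.
    """
    logits = router_w @ x                          # one routing score per expert
    top = np.argsort(logits)[-top_k:]              # indices of the chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # renormalized gate weights
    return sum(g * (experts_w[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.normal(size=d)
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(n_experts, d))
y = moe_forward(x, experts, router, top_k=2)       # only 2 of the 8 experts ran
```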
For enterprises, the choice becomes incredibly simple: Are you willing to pay an “IQ tax” for every thought, or are you willing to achieve 95% of the effect at 1% of the cost?
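For the "1% of the cost" intuition, a quick back-of-envelope using the list prices from the table above; the monthly workload is invented purely for illustration.

```python
# Back-of-envelope API cost comparison using the list prices above (per 1M tokens).
# The workload below is a made-up example, not a benchmark.
PRICES = {
    "deepseek-r1": {"input": 0.14, "output": 2.19},
    "openai-o1":   {"input": 15.00, "output": 60.00},
}

def monthly_cost(model: str, input_tokens_m: float, output_tokens_m: float) -> float:
    p = PRICES[model]
    return input_tokens_m * p["input"] + output_tokens_m * p["output"]

# Hypothetical workload: 200M input tokens, 50M output tokens per month.
for model in PRICES:
    print(model, f"${monthly_cost(model, 200, 50):,.2f}")
# deepseek-r1 -> $137.50 ; openai-o1 -> $6,000.00
```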
4. Unfinished Thoughts: If Reasoning is Free, What Becomes Expensive?
If DeepSeek-R1 has ushered in the era of “Cheap Reasoning,” where will the commercial moats of the future lie?
I have a slightly pessimistic hypothesis: The value of the model itself will approach zero.
If any company can rent a few A100s, download R1 weights, distill them, and possess top-tier reasoning capabilities, does the business model of an “LLM Company” still hold?
The expensive things of the future may no longer be “Intellect,” but “Intent” and “Trust.”
- Intent (Context): AI can reason, but it doesn’t know your private data or your business logic. Whoever holds the most complete Context holds the steering wheel of AI.
- Trust: When models can be modified and distilled at will, how do you prove your AI hasn’t been implanted with a backdoor? Or that it hasn’t hallucinated during the reasoning process?
Perhaps DeepSeek has forced the industry to enter the next stage ahead of schedule: shifting from “competing on parameters” to “competing on scenarios.”
5. Final Thoughts: Homage to the “Wait” Moment
What moved me most about DeepSeek-R1 was not its low price, but a detail in the technical report that was almost cute:
During R1-Zero's early training, long stretches of thinking left the model's output barely readable: it mixed languages, produced runs of near-gibberish, and sometimes fell into repetitive loops. Instead of intervening immediately, as is traditional, the researchers watched it and waited for it.
Until a certain moment, it realized its mistake on its own, learned to backtrack, and learned to say “Wait.”
This respect and patience for the essence of technology appear particularly precious in the impetuous AI gold rush.
It reminds us: True intelligence is often born in those moments of “not figuring it out,” rather than in hard drives stuffed with standard answers.
DeepSeek is like that eccentric student in class who suddenly raises his hand to question the teacher’s formula. Although it threw the classroom order (and stock prices) into chaos for a moment, we all know that this is how science progresses.
References:
- DeepSeek-R1 Review – DXC Technology
- DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
- DeepSeek R1 vs OpenAI O1 – Galileo.ai
- How DeepSeek R1 Works – Pedromebo
—— Lyra Celest @ Turbulence τ
