Is DeepSeek Lying About Chips Used for Training Its AI?

Altimeter Capital analyst and partner Freda puts DeepSeek's claims and results into numbers.

$6M Training Cost = Plausible IMO

Quick math: training cost ∝ (active params × tokens). DeepSeek v3 (37B active params; 14.8T tokens) vs. Llama 3.1 (405B params; 15T tokens) means v3 should theoretically cost about 9% of Llama 3.1's training run. The disclosed figures align with this back-of-the-envelope math, so the numbers are directionally plausible.
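As a quick sanity check, here is that ratio computed directly; a minimal sketch in Python using only the parameter and token counts quoted above:

```python
# Back-of-the-envelope: training cost scales with (active params * tokens).
v3_cost = 37e9 * 14.8e12      # DeepSeek v3: 37B active params, 14.8T tokens
llama_cost = 405e9 * 15e12    # Llama 3.1: 405B params, 15T tokens

print(f"v3 / Llama 3.1 = {v3_cost / llama_cost:.1%}")  # ~9.0%
```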


DeepSeek clearly states in a footnote that the quoted cost covers only the official training run of DeepSeek-V3, excluding the costs of prior research and ablation experiments on architectures, algorithms, or data.

Comparing training costs between models trained at different times is inherently flawed: training costs have been improving constantly. Saying DeepSeek v3 (Jan 2025) costs 1/10th as much to train as Llama 3.1 (July 2024) is highly misleading, because training costs have been dropping exponentially thanks to advances in compute and algorithms; the toy calculation below illustrates the point.
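A toy calculation can show how much of that gap the release dates alone might explain. The 8-month halving period below is purely an illustrative assumption, not a measured industry figure:

```python
# Toy illustration: assume baseline training costs halve every 8 months
# from compute and algorithmic advances alone (assumed, not measured).
halving_months = 8
gap_months = 6   # Llama 3.1 (July 2024) -> DeepSeek v3 (Jan 2025)

relative_cost = 0.5 ** (gap_months / halving_months)
print(f"Expected baseline cost: {relative_cost:.0%} of the July 2024 level")
# ~59% -- the passage of time explains part, but not all, of a 10x gap.
```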

Pre-training a model with hundreds of billions of parameters in the U.S. today costs less than $20M (go ask the engineers who actually build LLMs). DeepSeek may be ~50% more cost-efficient than its U.S. peers, which looks entirely plausible to me! It's like how a smaller-engine Japanese car can perform comparably to a much bigger-engine American car, thanks to engineering breakthroughs like turbocharging and lightweight design.

Training vs. R&D Costs

It's hard for any of the labs to define training costs cleanly, since many experiments (incl. data costs) blend into training runs.
DeepSeek likely required ~$500M in capex (rumored 10K A100s + 2-3K H800s), still far less than the top U.S. labs but much more than $6M; a rough sketch of that estimate follows below.
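Here is a hedged sketch of how the rumored cluster could add up to roughly $500M. The per-GPU prices and the overhead multiplier are my assumptions, not disclosed figures:

```python
# Rough capex estimate from the rumored hardware; all unit costs assumed.
a100_count, a100_price = 10_000, 15_000  # assumed ~$15K per A100 (USD)
h800_count, h800_price = 2_500, 30_000   # midpoint of "2-3K" H800s; assumed price
overhead = 2.0  # assumed multiplier for networking, storage, power, facilities

gpu_spend = a100_count * a100_price + h800_count * h800_price  # ~$225M
print(f"Estimated capex: ${gpu_spend * overhead / 1e6:,.0f}M")  # ~$450M
```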

First Movers vs. Followers

Do people really have no idea about the huge R&D cost difference between “first-in-class drugs” and “me-too drugs”??
First movers inherently face “wasteful” R&D due to the trial-and-error nature of innovation. But when has humanity ever stopped pushing forward because of that? The effort is always worth it.

Huge Inference Efficiency Gains

Inference costs have always been coming down, and DeepSeek just unlocked a step-function drop in inference costs: faster, cheaper, and decent quality.
This is the moment many startup founders and developers have been waiting for! Suddenly, countless applications have achieved product-market fit from a cost perspective!
This should lead to much more inference spending, eventually.

Two Things Freda Believes and Nextbigfuture Agrees With

1️⃣ Better and more efficient AI models = a huge tailwind for the AI supercycle.
2️⃣ DeepSeek is a win for open-source AI & brings efficiency to the entire ecosystem.

Closed-source LLMs below this level of performance are irrelevant now. A similar shakeout happened after Llama 3 was released, and DeepSeek is now cleaning house. Open-source ecosystems, including $Meta's, will thrive on this momentum.

DeepSeek at this point is bigger than the company itself. It's a proof of concept: a hyper-efficient, small model running on cost-efficient infrastructure.

I agree with Freda. Improved AI efficiency is good for AI and can also make AI profitable. Successful AI with vastly lower costs will mean more Nvidia chips will be needed.

AI development will get faster.
