Artificial Intelligence - DeepSeek radicalism

by Pininvest Analysis
Micke Lindström - New ideas sprouting in unexpected places / Unsplash

Once upon a time, in September '24, we had the latest on OpenAI and its newest release, GPT-4o

And we had another batch of billions of dollars being shoveled into a monstrous furnace, keeping the Artificial Intelligence (AI) competitors committed – although the actual need for such outrageous amounts of cash is not entirely clear to simple minds…

But now, there is DeepSeek to think about

Met with a rather deafening silence from the big players of Artificial Intelligence, who hope to sweep the newcomer under the carpet, DeepSeek may in fact be the wake-up call from many dreams of riches, of power and of influence

Simply stated, DeepSeek may be a watershed moment in the upward march of AI

DeepSeek has been cheap to develop ($6 million compared to the hundreds of millions spent by OpenAI or Google), is truly and fully open source, is extraordinarily cheap to use (at 1/30th of the price of GPT-4) and, three months after launch, ranks above Meta’s Llama and Google’s Gemini

And DeepSeek is the research product of a Chinese quantitative investment fund, High-Flyer, with $8 billion in assets under management


Mind-boggling by themselves, these facts may frame a reality very different from the expectations the AI giants conveyed to the public, to their own investors and (presumably) to the U.S. government

 

Power 

Paramount has been the power to be derived from control over the AI revolution, putting the U.S. in confrontation with China for technological dominance – and power was expected to benefit solely the U.S. technological behemoths, with the financial muscle to secure their dominance across the Western world

Taking direct aim at Chinese efforts to keep up, the American government undercut the Chinese in two ways

  • By restricting access to the most advanced GPUs – viewed as indispensable in the massive data centers running AI research and applications – and by blocking the delivery of the sophisticated semiconductor machinery of the Dutch firm ASML and its Japanese competitors
  • By imposing strict licensing requirements on the data that underpin frontier LLMs in the “Framework for Artificial Intelligence Diffusion”, released for a four-month review period by the Biden Administration

The American power play aims to cement the dominance of American firms – Oracle, Microsoft, OpenAI and Musk’s xAI – competing with, partnering with and taking minority interests in promising Western start-ups, such as the French Mistral

And the ‘who’s who’ of the tech investment community has been lining up, from Andreessen Horowitz, Samsung, IBM, NVIDIA, Microsoft and Salesforce to Yuri Milner’s DST Global, one of the leading Internet investment firms

 

Control

Otherworldly capital expenditures, running into hundreds of billions of dollars, aim to consolidate market dominance: Microsoft alone plans to invest $80-110 bn in data center infrastructure this year, and its direct competitors, Google and Amazon, have made equally staggering commitments

Control – on a global scale – should be within reach as those 3 firms claim two-thirds of data center capacity worldwide (outside of China)

And this goal appears to be foundational to the AI business proposition, considering that in 2024 OpenAI lost $5 bn on revenue of $3.7 bn, that computing costs alone reached $6 bn – and that losses are expected to balloon to $44 bn over the next 3 years

One could argue that control is the short-to-medium term business plan, all by itself

And there is no obvious end to the losses: each query to an AI model costs money in compute resources, each new model is again costly to train and the levels of data center infrastructure investment are otherworldly

The expectation must be that, beyond the medium term, such as OpenAI’s 2028 horizon, AI will make itself indispensable, at any price

If…

 

Valuation

OpenAI (poorly named since it is not) is currently valued at $157 billion, or 13.x times forward revenue, in line with Facebook’s IPO

But considering the losses, both current and in the near future, all is not well

One may wonder how projected profits will actually be ring-fenced in the fast-moving AI tech world; AI models themselves, the backbone of this new tech industry, prove to be both very costly (in training on massive data) and very competitive with new releases announced every day – driving revenue down from day one

The riches dreamed of by AI start-ups have, as is often the case, gone to the semiconductor ‘picks and shovels’ of this new gold rush, foremost the $3 trillion GPU titan Nvidia; the AI gold diggers themselves have benefited not so much, or not yet…

Data center infrastructures, controlled by a very small number of global American hyperscalers, will undoubtedly continue to benefit from rising AI demand, and the firms are sizably but only indirectly committed to AI investments

In a virtuous circle, Amazon, Microsoft and Google will cement their leading positions as cloud providers – with the broadest data center footprint and optimized cross-connectivity to deliver access, speed and efficiency

Accounting for 60% of all hyperscale data center capacity, these three firms have the financial capacity and the drive to keep up with demand - in the name of securing their position at the head of the table

 

DeepSeek – the next frontier

Meta’s Zuckerberg argued for open – or at least more open – source access to AI models back in July 2024, offering users complete control over their data and freedom from vendor lock-in

The open-weight nature of LLaMA 3 implies the ability to download and apply the model in various scenarios, without providing full insight into or control over the processes that produced the model

Conceptual shifts are always unique and unanticipated - they will be radical and for all to see

The release of DeepSeek's V3 AI model in late December 2024 is an event of such magnitude

Deemed by third-party benchmarks to be competitive with the top league of OpenAI, Google, Meta and Anthropic, the model carries a much lighter load of active parameters (37bn activated for any given token generated, compared with 405bn for Llama 3.1), requiring less advanced chips and less energy
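To make the notion of "active parameters" concrete, here is a minimal sketch of top-k routing in a mixture-of-experts (MoE) layer, the kind of architecture in which only a few expert sub-networks are activated per token. It is a toy illustration and not DeepSeek's implementation: the function names, the 8 experts and the choice of k=2 are assumptions made for the example. Only the parameter counts (37bn active, 671bn in the base model, 405bn for Llama 3.1) come from the article.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs; the weights sum to 1.
    Hypothetical helper, for illustration only.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def active_fraction(active_params, total_params):
    """Share of the model's parameters that is used for a single token."""
    return active_params / total_params


# Toy router: 8 hypothetical experts, 2 of them chosen for this token.
random.seed(0)
scores = [random.gauss(0, 1) for _ in range(8)]
chosen = route_top_k(scores, k=2)
print("experts used for this token:", chosen)

# Figures quoted in the article: 37bn active out of a 671bn base model.
print(f"active share: {active_fraction(37e9, 671e9):.1%}")  # about 5.5%
```

The point of the design is that compute per token scales with the experts actually selected, not with the full parameter count, which is why a model of this size can be served with far fewer active parameters than a dense 405bn model.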

Subject to U.S. semiconductor export regulations, DeepSeek apparently was able to take a few thousand constrained Nvidia “Hopper” H800 GPU accelerators and create an MoE foundation model that can challenge what OpenAI, Google, and Anthropic can do with their largest models trained on tens of thousands of top-of-the-range GPU accelerators

If this feat is confirmed, and models can be trained on a 'skinny' one-tenth of the accelerators thought necessary, the AI hardware suppliers, foremost Nvidia and Broadcom, will need to reassess growth projections

Presenting itself as a non-commercial undertaking, financed by a Chinese quant hedge fund, DeepSeek is much cheaper and more transparent than any existing model, releasing a full open-source model (with publication of the architectural details in late December '24) to great acclaim

The strength of open-source, as demonstrated yet again, resides in the creation of an eco-system, where developers will contribute to further improvements, potentially broadening the target markets of users exponentially

 

Distilling a novel approach

It is DeepSeek's R1 release that upends the AI approach investors had been supporting with exuberance, coincidentally announced last week, just as the American $500 bn Stargate, partnering OpenAI, Oracle and Japanese Softbank, was formally endorsed by the U.S. government

In essence, Stargate, with mountains of cash, assumes business as usual: same leading firm (OpenAI), same priority to vastly expensive computing scale, same data centric hardware and ultimately same confidence in keeping the upper hand

None of this turns out to be cast in stone...

Knowledge distillation will have none of it: it manages to transfer the knowledge of the larger 671bn-parameter DeepSeek base model to a smaller one, down to 1.5 billion parameters, without significant loss of accuracy
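For readers unfamiliar with the idea, the following is a minimal sketch of the classic soft-label form of knowledge distillation, in which a small "student" model is trained to match the output distribution of a large "teacher". It illustrates the general technique only and is not a description of DeepSeek's actual training recipe; the function names, the temperature value and the toy logits are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    The loss is zero when the student reproduces the teacher's distribution
    and grows as the two diverge. Hypothetical helper, for illustration only.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy next-token scores over a 3-word vocabulary (made-up numbers).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

Training the student amounts to adjusting its parameters to drive this loss down across many examples, so that the small model inherits the large model's behavior at a fraction of the size.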

With the ability to run AI models on the cheap, off-line or in the cloud, on any supporting hardware, consequences of the technological breakthrough will be profound

Following the lead of R1 distillation, which puts the model within reach of a smartphone running off-line without significantly impacting performance, new AI models will be – and in fact already are – charging ahead

Not to be left out, Meta is said to be working on DeepSeek’s code, and new AI competition seems ready: Doubao 1.5, issued by TikTok parent ByteDance, matches OpenAI’s GPT-4o at 1/50th of the price (DeepSeek being just 1/30th…), and still another research lab, based in Hong Kong, is jumping into the fray

 

A shake-up of the AI supply chain - from the costly GPU semiconductors of Nvidia to electricity utilities such as GE Vernova or Constellation Energy - could, and probably will, boost the outlook of artificial intelligence in the medium term, in ways as yet inconceivable

Nvidia's chips may just be a little less indispensable than foretold and electricity demand forecasts by data centers just a little more down to earth...

The AI supply chain will not vanish, but adjust in short order, bringing forward new actors in specialized software applications, embedded in specific industries and making the most of the commoditized AI models

Reading the tea leaves, such eco-systems, built on strong and competitive data center foundations and supported by the dominant tech giants, are the premise of an AI users' revolution, which general-purpose AI producers - such as OpenAI - would initiate but could not bring to fruition