AI Briefing

AI stack splits in three: $965B at the top, new competition in inference

Anthropic raises $65B at $965B, Groq seeks $650M, Apple pays Google $1B/year for Gemini — the AI supply chain splits into layers with opposite dynamics.


Summary

  • The frontier will narrow to 3-4 vendors: Anthropic raised $65B at a $965B post-money valuation, and Apple signed a $1B/year deal with Google to power Siri with Gemini.
  • Inference is fragmenting and getting cheaper: Groq is raising $650M after Nvidia’s ~$20B licensing-and-hiring deal, and competition for compute is intensifying.
  • Even Apple gave up on building it in-house: the world’s richest tech company is licensing a 1.2T-parameter Gemini because its own models (150B parameters at most) cannot keep up.

Bottom line: the AI supply chain is splitting into layers with opposite dynamics — concentration at the top, cheaper competition in the middle. Your job as a leader is to know which layer you’re buying from and which you’re not.


1. Anthropic at $965B — frontier capital concentrates at the top

What changed. Anthropic raised $65B in Series H at a $965B post-money valuation. Lead investors: Altimeter, Dragoneer, Greenoaks, Sequoia. Annualized run rate now $47B — up from $9B in December. Alongside the round, the company released Claude Opus 4.8 with a 1M-token context window, 69.2% on SWE-Bench Pro (10 points above GPT-5.5), and a new “Dynamic Workflows” (ultracode) capability — a single task spawns hundreds of parallel subagents (demonstrated migrating a 750,000-line codebase in 6 days).

Why it matters. Two years ago there were a dozen serious frontier-model contenders. Today there are 3-4 — Anthropic, OpenAI, Google, maybe Mistral. The capital wall (the cash you need to stay at the frontier) is now beyond anyone who doesn’t have $60B+ in the bank. That means your choice of vendor over the next two years narrows sharply, and switching costs rise.

What to do this month.

  • List which AI models your workflows currently depend on (chat, embeddings, code, voice). How many vendors? How much of your data lives behind one?
  • Add integration for at least one alternative frontier model — even if you don’t use it today. Multi-vendor clauses in contracts will be standard next year.
  • Treat Anthropic’s Dynamic Workflows as a stress test: if an agent with proper orchestration can migrate a 750,000-line codebase in 6 days, which of your own workflows are worth automating first?
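
One way to keep an alternative vendor wired in is a thin wrapper that tries providers in order. The sketch below is a minimal illustration, not any vendor's SDK: `complete_with_fallback`, `primary_vendor` and `secondary_vendor` are hypothetical names, and each vendor is assumed to be wrapped as a plain function from prompt to text.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical shape: each vendor is wrapped as a plain function prompt -> text.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try each vendor in order; return (vendor_name, text) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real wrapper would catch vendor-specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in vendors for illustration only; no real API is called.
def primary_vendor(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def secondary_vendor(prompt: str) -> str:
    return f"[secondary] {prompt}"


vendor, text = complete_with_fallback(
    "Summarize last week's support tickets",
    [("primary", primary_vendor), ("secondary", secondary_vendor)],
)
print(vendor, text)
```

Keeping the vendor call behind one function like this means a second provider can be tested in production at low volume before you ever need it.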

What to expect.

  • Next 30-90 days: OpenAI and Google respond with their own price or context-window improvements.
  • 6-month window: frontier API prices on input tokens drop 30-50% at the top tier.
  • By year-end: the first European enterprises start requiring “multi-vendor clauses” from SaaS vendors that white-label only one model.

2. Groq’s $650M and Nvidia’s $20B — the inference layer fragments

What changed. Groq, the AI inference-chip startup, is raising $650M from existing investors — Disruptive and Infinitium have agreed to cover any shortfall from other investors. This follows a December deal in which Nvidia paid ~$20B to bring senior Groq employees aboard and license Groq’s hardware technology, without a full acquisition. Groq is currently led by interim CEO Adam Winter and CFO Matt Eng. The key pivot: from chip manufacturing to inference-cloud services.

Why it matters. Inference — the compute that runs when your user hits “submit” and the AI responds — is now a larger market than model training. The fact that Nvidia paid $20B to absorb Groq’s talent, and that Groq’s investors are putting in another $650M after that, signals that inference prices are being pushed down hard. For a mid-size business, this is a clear message: do not sign 3-year fixed-price AI compute contracts right now; in 6-12 months, the price you pay per million output tokens may drop 40-60%.

What to do this month.

  • Review every AI cloud contract coming up for renewal this quarter and next. Defer long-term commitments by 3-6 months where you can.
  • Calculate your monthly AI inference bill (including every SaaS tool that uses OpenAI/Anthropic “under the hood”). Do you know the number? If not — that’s your first problem.
  • Identify one critical workflow where you are overpaying for AI right now, and find an alternative (different model pricing, on-prem model, or simply shorter context).
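
The arithmetic behind the "do not lock in" advice can be sketched in a few lines. All figures below are hypothetical placeholders (the workflow names, token volumes and per-million prices are not from any vendor's price list); only the 40% price drop and the 36-month term come from the scenario described above, and the drop is assumed to arrive after 12 months.

```python
# Hypothetical monthly usage per workflow:
# (output tokens in millions, USD price per million output tokens)
usage = {
    "support_chat": (120.0, 15.0),
    "code_review": (40.0, 15.0),
    "doc_search": (300.0, 2.0),
}


def monthly_bill(usage):
    """Sum tokens * price across all workflows."""
    return sum(tokens * price for tokens, price in usage.values())


def total_cost(monthly, months, price_drop=0.0, drop_after=0):
    """Cost over `months`, with prices falling by `price_drop` after `drop_after` months."""
    full_price_months = min(drop_after, months)
    reduced_months = months - full_price_months
    return monthly * full_price_months + monthly * (1.0 - price_drop) * reduced_months


current = monthly_bill(usage)
locked = total_cost(current, months=36)  # 3-year fixed-price contract
flexible = total_cost(current, months=36, price_drop=0.40, drop_after=12)

print(f"monthly bill today:        ${current:,.0f}")
print(f"36 months, fixed price:    ${locked:,.0f}")
print(f"36 months, 40% drop at 12: ${flexible:,.0f}")
```

Swapping in your own usage table is the point of the exercise: if you cannot fill it in, you do not yet know your inference bill.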

What to expect.

  • 90 days: another one or two inference startups (Cerebras, SambaNova) announce major rounds.
  • 6 months: the first public price war between Anthropic, OpenAI and Google on input tokens below $1/million.
  • Year-end: “consolidated AI bill” emerges as a line item in larger company finance reports.

3. Apple pays Google $1B/year for Gemini — even the giants stop building in-house

What changed. Apple and Google jointly announced on 12 January that Apple will gain access to a custom 1.2-trillion-parameter Gemini model, built specifically for Siri and Apple Intelligence. The deal is valued at roughly $1B/year (Bloomberg / Mark Gurman). Apple distills (compresses) the large Gemini into smaller models that run on-device on iPhone with no network connection. iOS 26.4 delivers the first Gemini-powered Siri features to 1.5 billion daily users. A full Siri redesign is expected with iOS 27 (WWDC 8 June, public release September).

Why it matters. Apple is the most resource-rich technology company in the world, with $200B+ in cash. For three years Apple built its own AI models — peaking at 150B parameters. The message from this deal: even with all that cash and talent, Apple admitted it could not catch up to the frontier in time. For mid-size businesses still spending budget on “our own AI team training our own model” — that’s wasted cost and time. Your competitive opening is not in the model — it’s in how you integrate existing models with your data and workflows.

What to do this month.

  • Look at how your AI budget splits: what percentage goes to “the model” (training, fine-tuning, our own LLM) vs. “integration” (your data, your workflows, your interface)? A healthy split today is 10/90 or 5/95 in favour of integration.
  • Identify 1-2 workflows where calling a frontier model directly (via API) is cheaper and faster than buying a SaaS tool that does the same thing.
  • If your team is working on “our own LLM fine-tune” — re-evaluate whether it’s still worth it compared to a frontier model used with your data in the context window.
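
The budget split in the first bullet is a simple ratio. As a rough sketch, with hypothetical line items and amounts (the category names and the `MODEL_LINES` set are assumptions for illustration, not a standard chart of accounts):

```python
# Hypothetical annual AI budget lines in USD.
budget = {
    "training": 50_000,
    "fine_tuning": 30_000,
    "data_pipelines": 400_000,
    "workflow_integration": 420_000,
    "interface": 100_000,
}

# Lines counted as "the model"; everything else counts as "integration".
MODEL_LINES = {"training", "fine_tuning", "own_llm"}


def model_share(budget):
    """Fraction of the total budget spent on model-side lines."""
    total = sum(budget.values())
    model = sum(amount for line, amount in budget.items() if line in MODEL_LINES)
    return model / total


share = model_share(budget)
print(f"model: {share:.0%}, integration: {1 - share:.0%}")
```

With these placeholder numbers the model side lands between the 5% and 10% marks mentioned above; a much higher share is the signal to re-examine the "our own model" lines.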

What to expect.

  • 30 days: more detail from WWDC 8 June on how Apple integrates Gemini in iOS 27.
  • 90 days: the first regulated enterprises (banking, healthcare) start publicly debating why they cannot use Apple-Google Siri for corporate use.
  • Year-end: at least one major non-US tech company (Samsung, or an EU player) announces a similar licensing deal with Anthropic or Mistral.

Today’s picture

Three pieces of news (Anthropic at $965B, Groq’s $650M, Apple-Google at $1B/year) tell the same story: the AI supply chain is splitting into layers with opposite dynamics. At the top, in frontier models, capital and talent concentrate into 3-4 vendors. In the middle, in inference, competition rises and prices fall. At the bottom, in applications and integration, the ground is open, and that is where most of your economic value will be created. Even Apple has retreated from building its own frontier model to focus on integration and application. That’s a hint for where you, with a smaller budget, should be focusing too.

Event | Consequence
Anthropic at $965B valuation, $47B run rate | Frontier vendor count narrows to 3-4; multi-vendor clauses become critical
Groq $650M after Nvidia $20B deal | Inference prices fall over next 6-12 months; don’t lock long-term now
Apple licenses Gemini at $1B/year | Even giants stop building in-house; redirect your budget from “our own model” to “integration”

Three questions for the leader:

  • Do you know your company’s total AI spend (every SaaS tool that uses OpenAI/Anthropic/Google under the hood)?
  • If Anthropic or OpenAI doubled their price tomorrow, which workflow would stop, and in how many days could you switch to an alternative?
  • Does your 2026 budget include a “build our own model” or “fine-tune our own LLM” line? If yes — would you like to discuss why Apple just gave up on doing the same?

Sources