OpenAI Custom AI Chip: Jalapeño Marks a New Era in LLM Inference Silicon

OpenAI has entered the custom silicon race. On 24 June 2026, the company and Broadcom jointly announced Jalapeño — OpenAI's first purpose-built inference chip, designed from scratch as a custom ASIC optimised for large language model workloads. Engineering samples are already running production models, including GPT-5.3 Codex Spark, at target frequency and power. Deployment at gigawatt scale with data centre partners is planned before the end of 2026.

The move places OpenAI alongside Google (TPUs) and Amazon (Trainium) as companies that control their own inference hardware. It reduces OpenAI's long-term dependence on NVIDIA GPUs and changes the cost structure of serving models at scale.

TL;DR

What: Jalapeño is OpenAI's first "Intelligence Processor" — a custom ASIC built with Broadcom exclusively for LLM inference.
Speed of development: Nine months from design to tape-out, reportedly the fastest high-performance ASIC development cycle on record. OpenAI's own AI models assisted the chip design process.
Performance: Early benchmarks show performance per watt substantially better than current state-of-the-art GPU-based inference.
Scale: Gigawatt-scale deployment targeted by end of 2026; Microsoft expected to purchase roughly 40 percent of initial output.
Strategic implication: OpenAI now operates across the full stack — products, models, and custom silicon — reducing NVIDIA dependency.

What Is Jalapeño and Why Does It Matter?

Jalapeño is not a general-purpose accelerator. It is a fixed-function inference ASIC. Richard Ho, OpenAI's Hardware Program Lead, said the team "optimised the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models."

In practical terms, the chip is designed to minimise data movement, often the dominant cost in transformer inference, and to balance compute, memory bandwidth, and networking for serving workloads rather than for general-purpose use. The networking layer uses Broadcom's Tomahawk silicon, and system integration (board, rack, cooling) is handled by Celestica.
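A rough roofline-style estimate shows why data movement tends to dominate token generation. The sketch below is illustrative only. It assumes a dense model, about 2 FLOPs per parameter per generated token, and that every weight is read once per decode step; KV-cache and activation traffic are ignored. The peak-compute and bandwidth figures are hypothetical placeholders, not Jalapeño or GPU specifications.

```python
def decode_arithmetic_intensity(batch_size: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weight traffic in one decode step.

    Assumption: ~2 FLOPs per parameter per token, and all weights are
    streamed from memory once per step regardless of batch size.
    """
    return 2.0 * batch_size / bytes_per_param


def is_memory_bound(intensity: float, peak_flops: float, mem_bandwidth: float) -> bool:
    """True when the workload sits below the roofline ridge point."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte needed to saturate compute
    return intensity < ridge


# Hypothetical accelerator: 1e15 FLOP/s peak compute, 4e12 B/s memory bandwidth.
PEAK_FLOPS = 1e15
MEM_BANDWIDTH = 4e12

for batch in (1, 32, 256):
    ai = decode_arithmetic_intensity(batch, bytes_per_param=2.0)  # 16-bit weights
    print(batch, ai, is_memory_bound(ai, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions small batches leave the compute units waiting on memory, which is the regime an inference-specific design would aim to improve.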

For anyone tracking how inference cost drives product decisions, a purpose-built chip delivering meaningfully better performance per watt changes the unit economics of every API call OpenAI serves.
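The link between performance per watt and unit economics can be made concrete with simple arithmetic. The sketch below covers energy cost only and uses made-up inputs: the throughput-per-joule figure, the electricity price, and the 2x improvement factor are all assumptions for illustration, since no benchmark numbers have been published.

```python
JOULES_PER_KWH = 3.6e6


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens.

    tokens_per_joule is the same quantity as (tokens per second) per watt.
    """
    joules = 1_000_000 / tokens_per_joule
    return joules / JOULES_PER_KWH * usd_per_kwh


# Hypothetical baseline: 0.5 tokens per joule at 0.08 USD per kWh.
baseline = energy_cost_per_million_tokens(tokens_per_joule=0.5, usd_per_kwh=0.08)

# Hypothetical chip with 2x the performance per watt of the baseline.
improved = energy_cost_per_million_tokens(tokens_per_joule=1.0, usd_per_kwh=0.08)

print(f"baseline: {baseline:.4f} USD, improved: {improved:.4f} USD")
```

Energy cost per token scales inversely with performance per watt, so any efficiency multiple carries straight through to this part of the serving bill. Capital cost, cooling, and utilisation are outside the scope of the sketch.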

How Did OpenAI Build a Custom AI Chip in Nine Months?

The timeline is striking. Most high-performance ASICs take two to three years from architecture to tape-out. OpenAI claims nine months — enabled in part by using its own AI models to accelerate the design workflow. The company has not published detailed methodology, but the implication is that ML-assisted layout, verification, and optimisation compressed what would ordinarily be sequential human-driven steps.

Broadcom's role as silicon implementation partner is critical here. Broadcom (NASDAQ: AVGO) brings decades of ASIC design-for-manufacturing expertise plus existing relationships with leading foundries. The partnership was formally announced in October 2025, though chip plans had been rumoured since 2023.

This is explicitly framed as the first chip in a multi-generation compute platform. Broadcom CEO Hock Tan confirmed: "This is just the beginning of a multi-generation roadmap."

Who Will Use These Chips?

The initial customer base appears concentrated:

OpenAI's own inference fleet — serving ChatGPT, the API, and products like Codex.
Microsoft — expected to purchase approximately 40 percent of chip output, consistent with its role as OpenAI's primary cloud partner.
Data centre partners deploying at gigawatt scale (names not yet disclosed beyond Celestica for integration).

Sam Altman positioned the chip within a broader infrastructure strategy: making compute "more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses." If inference cost drops materially, it opens pricing headroom for features that are currently margin-constrained — longer context windows, more agentic loops, real-time applications.

How Does This Compare to Google TPUs and Amazon Trainium?

The competitive landscape for custom AI silicon now has three major entrants among frontier-model companies:

Company	Chip	Primary Use	Generation
Google	TPU v6 (Trillium)	Training + inference	6th gen
Amazon	Trainium 2	Training + inference	2nd gen
OpenAI	Jalapeño	Inference only	1st gen

Jalapeño's scope is narrower — inference only, at least for now. That constraint likely allows tighter optimisation for serving workloads at the cost of flexibility. Google and Amazon have both converged on chips that handle training and inference, giving them more deployment versatility.

The honest limitation: this is a first-generation part. Performance claims are early-stage benchmarks, not production fleet data at scale. Whether the chip delivers its efficiency gains consistently under real-world traffic patterns (variable batch sizes, mixed model sizes, latency SLA pressure) remains to be proven in deployment.

What Does This Mean for NVIDIA?

NVIDIA remains the dominant supplier of AI accelerator hardware globally. OpenAI's custom chip does not eliminate that dependency overnight — Jalapeño is inference-only, and OpenAI still needs NVIDIA (or equivalent) hardware for training frontier models.

However, inference represents the majority of ongoing compute cost for a deployed model. Shifting a substantial fraction of that inference to custom silicon changes the commercial relationship with NVIDIA. It gives OpenAI negotiating leverage, supply-chain diversification, and, if performance claims hold, a structural cost advantage over competitors still running inference exclusively on merchant GPUs.
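The effect of a partial shift can be sketched as a weighted average. The cost figures and the shifted fraction below are hypothetical, chosen only to show the shape of the calculation; nothing here reflects actual OpenAI or NVIDIA pricing.

```python
def blended_cost(gpu_cost: float, custom_cost: float, fraction_shifted: float) -> float:
    """Average cost per unit of inference when a fraction runs on custom silicon."""
    if not 0.0 <= fraction_shifted <= 1.0:
        raise ValueError("fraction_shifted must be between 0 and 1")
    return (1.0 - fraction_shifted) * gpu_cost + fraction_shifted * custom_cost


def savings_fraction(gpu_cost: float, custom_cost: float, fraction_shifted: float) -> float:
    """Relative saving versus serving everything on GPUs."""
    return 1.0 - blended_cost(gpu_cost, custom_cost, fraction_shifted) / gpu_cost


# Hypothetical: custom silicon costs 60% of the GPU cost per unit of work,
# and half of the inference traffic is moved over.
print(savings_fraction(gpu_cost=1.0, custom_cost=0.6, fraction_shifted=0.5))
```

The saving grows linearly with the fraction shifted, so the commercial impact depends as much on how quickly the fleet migrates as on the per-chip advantage.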

For the broader AI efficiency trajectory, custom inference silicon is a logical next step. As model architectures stabilise, the returns from purpose-built hardware increase relative to general-purpose flexibility.

FAQ

Q: When will Jalapeño chips be available at scale?
A: OpenAI targets gigawatt-scale deployment with data centre partners by end of 2026. Engineering samples are already running production workloads.

Q: Does this mean OpenAI no longer needs NVIDIA GPUs?
A: No. Jalapeño handles inference only. OpenAI still relies on NVIDIA hardware for training frontier models. The chip reduces GPU dependence for serving, not for research.

Q: Can third parties buy Jalapeño chips independently?
A: No public availability has been announced. Current plans indicate deployment within OpenAI's own infrastructure and through partners like Microsoft.

Q: How does OpenAI's AI-assisted chip design work?
A: OpenAI used its own models to accelerate the design cycle but has not published detailed methodology. The nine-month timeline suggests ML-assisted verification, layout optimisation, or both.

Q: What models run on Jalapeño?
A: Engineering samples are confirmed running GPT-5.3 Codex Spark. The architecture is designed for frontier LLM inference broadly, not a single model.

The Tech Archive uses AI tools in research and production. Read how we work.
