OpenAI’s first Jalapeno chip targets LLM inference after 9-month tape-out

Inference becomes a full-stack problem

The next cost battle in AI is not only about larger models. It is also about serving ChatGPT, Codex, API calls, and agent workflows with lower latency and better efficiency. In a June 24, 2026, post on X, OpenAI disclosed Jalapeno, its first AI chip, built with Broadcom specifically for LLM inference rather than adapted from a general-purpose AI accelerator. The source tweet reads:

“We’ve designed and built our first AI chip: Jalapeño. Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products.”

The linked OpenAI post adds the technical frame. Jalapeno is described as OpenAI’s first Intelligence Processor, designed around the company’s model roadmap, kernels, serving systems, and product needs. Broadcom handles silicon implementation and networking, while Celestica contributes board, rack, and system integration. OpenAI says engineering samples are already running ML workloads in the lab at production target frequency and power, including GPT-5.3-Codex-Spark.

The most concrete number is the development cycle: OpenAI says Jalapeno went from design to manufacturing tape-out in nine months, helped by OpenAI models used in parts of chip design and optimization. The company also says early tests point to substantially better performance per watt than current state-of-the-art systems, though it has not yet published final measurements. A more detailed performance report is expected in the coming months.

OpenAI’s account is the company’s primary channel for model, product, and infrastructure updates, so this post is a strategic signal as much as a hardware note. If Jalapeno works as described, the payoff would show up in cheaper API serving, faster interactive products, and longer Codex or agent runs with less waiting. The next thing to watch is whether the promised technical report discloses latency, utilization, and performance-per-watt comparisons in enough detail to judge the chip against NVIDIA, Google TPU, and other custom inference platforms.
