OpenAI’s first Jalapeno chip targets LLM inference after 9-month tape-out
Original: OpenAI builds Jalapeno, its first LLM inference chip with Broadcom View original →
Inference becomes a full-stack problem
The next cost battle in AI is not only about larger models. It is about serving ChatGPT, Codex, API calls, and agent workflows with lower latency and better efficiency. OpenAI used a June 24, 2026 post on X to disclose Jalapeno, its first AI chip, built with Broadcom for LLM inference rather than adapted from a general AI accelerator. The source tweet is available here.
We’ve designed and built our first AI chip: Jalapeño. Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products.
The linked OpenAI post adds the technical frame. Jalapeno is described as OpenAI’s first Intelligence Processor, designed around the company’s model roadmap, kernels, serving systems, and product needs. Broadcom handles silicon implementation and networking, while Celestica contributes board, rack, and system integration. OpenAI says engineering samples are already running ML workloads in the lab at production target frequency and power, including GPT-5.3-Codex-Spark.
The concrete number is the development cycle: OpenAI says Jalapeno moved from design to manufacturing tape-out in nine months, helped by OpenAI models used in parts of chip design and optimization. The company also says early tests point to substantially better performance per watt than current state-of-the-art systems, though it has not yet published final measurements. A more detailed performance report is expected in the coming months.
OpenAI’s account is the company’s primary channel for model, product, and infrastructure updates, so this post is a strategic signal as much as a hardware note. If Jalapeno works as described, the payoff would show up in cheaper API serving, faster interactive products, and longer Codex or agent runs with less waiting. The next thing to watch is whether the promised technical report discloses latency, utilization, and performance-per-watt comparisons in enough detail to judge the chip against NVIDIA, Google TPU, and other custom inference platforms.
Related Articles
Alex Ellis’s post resonated because it framed local LLMs through business use, control, cost, and agent reliability instead of a simple benchmark ladder.
OpenAI’s new alignment work targets durability, not just benchmark behavior. The study trains beneficial traits across 12 domains and tests whether they persist under adversarial prompts and harmful fine-tuning.
OpenAI is pushing stronger medical reasoning into the free ChatGPT tier: GPT-5.5 Instant now matches frontier Thinking models on its health evaluations, while 230 million users ask health and wellness questions each week.