Google adds Flex and Priority tiers to the Gemini API, separating control over cost and reliability

On April 2, 2026, Google introduced two new service tiers for the Gemini API, Flex and Priority. The problem the company is targeting comes up often in agent design: teams want to run background work more cheaply, while user-facing requests need high reliability that holds up even during peak demand.

The change is closer to an architectural shift than a simple pricing policy. Until now, many teams had to split background logic between standard synchronous serving and the asynchronous Batch API. With Flex and Priority, Google says both background and interactive traffic can stay on the standard synchronous endpoints, with behavior switched by setting the service_tier parameter on each request.
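To make the per-request idea concrete, here is a minimal sketch of how an application might choose a tier for each task before sending it. The article only names the service_tier parameter; the tier strings ("flex", "priority", "standard"), the Task type, and the request body layout below are assumptions for illustration, not the documented Gemini API schema.

```python
from dataclasses import dataclass

# Assumed tier labels; the article names the tiers but not the exact strings.
FLEX = "flex"
PRIORITY = "priority"
STANDARD = "standard"


@dataclass(frozen=True)
class Task:
    """Hypothetical description of one unit of work in an agent."""
    prompt: str
    user_facing: bool       # True when a person is waiting on the answer
    latency_tolerant: bool  # True when the work can wait


def pick_service_tier(task: Task) -> str:
    """Map a task to a tier: interactive work first, then deferrable work."""
    if task.user_facing:
        return PRIORITY
    if task.latency_tolerant:
        return FLEX
    return STANDARD


def build_request(task: Task) -> dict:
    """Build an illustrative request body; the field layout is an assumption."""
    return {
        "contents": task.prompt,
        "service_tier": pick_service_tier(task),
    }


if __name__ == "__main__":
    chat = Task("Summarize this ticket for the customer", True, False)
    crm = Task("Refresh CRM notes for account 1", False, True)
    print(build_request(chat)["service_tier"])
    print(build_request(crm)["service_tier"])
```

The point of the sketch is that the routing decision lives in application code next to the task definition, rather than in a separate batch pipeline.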

Flex Inference is the cost-optimized option. Google says it is designed to handle latency-tolerant workloads without batch-processing overhead and offers 50% price savings compared with the Standard API. The examples given are background CRM updates, large-scale research simulations, and agentic workflows in which the model browses or thinks in the background. Flex is available on all paid tiers and supports GenerateContent and Interactions API requests.
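The 50% figure can be turned into a rough budget estimate. The sketch below assumes a workload whose cost scales linearly with the share of traffic moved to Flex; the dollar amount and the 60% share are made-up inputs, and only the 50% discount comes from the article.

```python
def blended_cost(standard_cost: float, flex_share: float,
                 flex_discount: float = 0.5) -> float:
    """Estimate total spend when part of the traffic moves to Flex.

    standard_cost: what the whole workload would cost on the Standard tier.
    flex_share:    fraction of that workload routed to Flex (0.0 to 1.0).
    flex_discount: price reduction on Flex; 0.5 reflects the stated 50% savings.
    """
    if not 0.0 <= flex_share <= 1.0:
        raise ValueError("flex_share must be between 0 and 1")
    flex_part = standard_cost * flex_share * (1.0 - flex_discount)
    standard_part = standard_cost * (1.0 - flex_share)
    return flex_part + standard_part


if __name__ == "__main__":
    # Hypothetical inputs: a 1000-unit bill with 60% of traffic moved to Flex.
    print(blended_cost(1000.0, 0.6))
```

With those inputs, the Flex portion costs 300 and the remaining Standard portion costs 400, so the bill drops from 1000 to about 700. The estimate ignores any Priority pricing, which the article does not state.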

Priority Inference is the premium path for critical applications. Google says this tier assigns requests the highest criticality, helping keep important traffic from being preempted even under peak load. When Priority limits are exceeded, overflow requests are automatically moved to the Standard tier rather than failed. Priority is available to Tier 2 / 3 paid projects on the GenerateContent and Interactions API endpoints.
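The overflow behavior described above can be modeled with a small toy router. This is a local simulation of the described semantics, not Google's implementation: the per-window counter, the limit value, and the tier strings are all assumptions made for illustration.

```python
class PriorityRouter:
    """Toy model: Priority requests beyond a limit fall back to Standard."""

    def __init__(self, priority_limit: int) -> None:
        self.priority_limit = priority_limit  # hypothetical per-window limit
        self.used = 0                         # Priority requests served so far

    def route(self, requested_tier: str) -> str:
        """Return the tier that would actually serve the request."""
        if requested_tier != "priority":
            return requested_tier             # other tiers pass through unchanged
        if self.used < self.priority_limit:
            self.used += 1
            return "priority"
        return "standard"                     # downgrade instead of failing

    def reset_window(self) -> None:
        """Start a new accounting window."""
        self.used = 0


if __name__ == "__main__":
    router = PriorityRouter(priority_limit=2)
    served = [router.route("priority") for _ in range(3)]
    print(served)
```

In this model the third Priority request is still served, just on the Standard tier, which is the graceful-downgrade property the article attributes to Priority.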

Flex lowers inference cost while keeping the synchronous development experience.
Priority raises assurance for time-sensitive traffic and provides a graceful downgrade.
With both tiers available, request-level economics and reliability become elements of application design.

Strategically, this signals that model APIs are evolving beyond simple token sales into a traffic-management layer for agentic applications. Google is productizing not only the models themselves but also runtime behavior matched to the type of work.
