
Carbon-Aware Inference Routing for Large Language Models: A Real-Time Framework for Sustainable AI Serving


Raghuveeran, Preethi

Zenodo preprint · 2026-05-01 · #AI×ESG · Origin: Global
DOI: 10.5281/zenodo.19934621
原典: https://zenodo.org/records/19934621
📄 PDF

🤖 gxceed AI Summary

Japanese (translated)

This paper proposes CAIR, a real-time framework for reducing the carbon emissions of large language model (LLM) inference. Each inference request is routed to the smallest model that satisfies its accuracy and latency requirements, taking grid carbon intensity and a budget constraint into account. On a workload of one million prompts per day, a carbon reduction of roughly 62% is projected, and the framework also anticipates CSRD and EU AI Act compliance.

English

This paper introduces CAIR, a real-time carbon-aware inference routing framework for LLMs. It routes each request to the smallest model that meets its accuracy and latency constraints, using live grid carbon intensity and a time-bounded carbon budget. Preliminary analysis shows ~62% carbon reduction on 1M prompts/day. The framework integrates with existing serving infrastructure and supports CSRD and EU AI Act compliance.

Unofficial AI-generated summary based on the public title and abstract. Not an official translation.

📝 gxceed Editorial Commentary — Why this matters

In Japan's GX context

As AI adoption spreads in Japan, managing the carbon footprint of LLM operations is becoming increasingly important. This framework contributes to more efficient AI infrastructure on the path to GX, and could in future also inform disclosures under SSBJ standards and in annual securities reports.

In the global GX context

This paper addresses the growing need for carbon management in AI workloads globally. The CAIR framework offers a practical, implementation-ready solution for reducing emissions from LLM inference, with built-in compliance hooks for CSRD and EU AI Act. It is highly relevant for organizations deploying large-scale AI and aiming to meet climate disclosure requirements.

👥 Implications by reader type

🔬 Researchers: A novel approach combining carbon-aware routing with per-request complexity scoring and budget enforcement, opening avenues for further optimization in sustainable AI.

🏢 Practitioners: Directly applicable to existing LLM serving stacks (vLLM, Ollama) for immediate carbon reduction without sacrificing accuracy or latency.

🏛 Policymakers: Demonstrates how technical frameworks can operationalize carbon constraints, providing a model for AI-specific GHG regulations.

📄 Abstract (original)

This paper introduces CAIR, a real-time carbon-aware inference routing framework for large language models (LLMs). CAIR routes each inference request to the smallest model capable of satisfying its accuracy floor and latency SLA, using three concurrent signals: per-prompt complexity score, live grid carbon intensity, and a time-bounded carbon budget. The carbon budget layer tracks cumulative emissions against a configurable period cap (daily or monthly) and progressively constrains the available model tier as the budget depletes, enabling carbon governance at the inference layer rather than optimising only per request. Per-request signals optimise within whatever tier the budget permits. Preliminary analysis on a 1M prompt/day workload suggests ~62% reduction in inference carbon by routing approximately 65% of requests to a 7B-parameter model. The framework is serving-layer-agnostic and integrates with existing LLM deployment infrastructure (vLLM, Ollama). The audit logger is designed for direct use in CSRD ESRS E1 and EU AI Act Art. 53 compliance reporting. Phase 1 empirical evaluation on 50 human-labelled tasks confirms: routing precision of 100% on simple tasks (100% on complex tasks, 90% on medium tasks), 45.5% carbon reduction vs an always-large baseline, routing overhead P95 of 0.27ms, and 100% fallback reliability. The budget enforcement layer independently reduces carbon by 92.5% in CRITICAL state vs uncapped routing. Framework repository: https://github.com/pretzelslab/sa1-carbon-inference-router
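The routing described in the abstract — smallest adequate model per request, with available tiers progressively narrowed as the period carbon budget depletes — can be sketched roughly as follows. The tier names, per-request emission figures, complexity thresholds, and the 80%/95% budget cut-offs are all illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    params_b: float           # parameter count in billions
    g_co2_per_request: float  # assumed per-request emissions at current grid intensity

# Hypothetical tiers, smallest first (names and numbers are illustrative only).
TIERS = [
    ModelTier("small-7b", 7, 0.2),
    ModelTier("mid-13b", 13, 0.5),
    ModelTier("large-70b", 70, 2.0),
]

def allowed_tiers(spent_g: float, cap_g: float) -> list:
    """Progressively constrain the available tiers as the period budget depletes."""
    used = spent_g / cap_g
    if used >= 0.95:   # CRITICAL state: only the smallest model remains
        return TIERS[:1]
    if used >= 0.80:   # WARNING state: drop the largest tier
        return TIERS[:2]
    return TIERS

def route(complexity: float, spent_g: float, cap_g: float) -> ModelTier:
    """Pick the smallest permitted tier for a prompt's complexity score in [0, 1].

    Per-request signals only optimise within whatever tiers the budget permits.
    """
    tiers = allowed_tiers(spent_g, cap_g)
    if complexity < 0.4:
        return tiers[0]
    if complexity < 0.7:
        return tiers[min(1, len(tiers) - 1)]
    return tiers[-1]
```

Under this sketch, a complex prompt that would normally reach the 70B tier is forced down to the 7B tier once cumulative emissions cross 95% of the cap, which is the mechanism the abstract credits for the large reduction in CRITICAL state.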

🔗 Provenance — the source where this record was discovered

🔔 To catch new papers like this one, register for keyword alerts (free, up to 3 keywords).

gxceed is a research-support dataset based on public metadata. Summaries, translations, and commentary are generated with AI assistance. Final interpretation and verification are the user's responsibility, based on the original source materials.