Energy and Carbon Footprint of Vision-Language-Action Model Inference for Edge Robotic Systems


Arshia Eslami, Mahsa Ardakani, Amin Roostaee, Hasti Zanganeh, Ramtin Zand

📚 Peer-reviewed / Journal · 2026-06-19 · #AI×ESG · Origin: US · Business impact: cost reduction · Target sector: robotics

Source: https://doi.org/10.1145/3797248.3815411

🤖 gxceed AI summary

This paper analyzes the energy consumption and carbon emissions of Vision-Language-Action (VLA) model inference on edge robotic systems. Evaluating π0.5, X-VLA, and SmolVLA on an NVIDIA Jetson AGX Orin, it reports task performance, inference latency, power consumption, energy per inference, and estimated carbon emissions. X-VLA achieves the highest task performance but also the highest latency and energy use, while SmolVLA is the most efficient model but has a lower task success rate. The study highlights the trade-off between accuracy and energy efficiency and estimates annual operational carbon emissions under continuous deployment, finding that even modest differences in inference energy add up to a measurable environmental impact at scale. It concludes that energy-aware design is essential for battery-constrained robotic applications.

Unofficial AI-generated summary based on the public title and abstract. Not an official translation.
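The summary links energy per inference to annual carbon emissions. A minimal sketch of that arithmetic follows, assuming energy per inference is average inference power times latency, the robot runs year-round, one inference is issued per control step (VLA policies that emit action chunks would infer less often), and a single grid carbon-intensity factor applies. The function names and all numbers in the usage line are hypothetical illustrations, not the paper's method or measurements.

```python
# Hypothetical helpers; not taken from the paper.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: continuous, year-round operation
JOULES_PER_KWH = 3.6e6


def energy_per_inference_j(avg_power_w: float, latency_s: float) -> float:
    """Energy per inference (J) = average power during inference (W) x latency (s)."""
    return avg_power_w * latency_s


def annual_co2_kg(energy_j: float, inferences_per_s: float, grid_kg_per_kwh: float) -> float:
    """Annual CO2 (kg) = energy per inference x inferences per year x grid intensity."""
    kwh_per_year = energy_j * inferences_per_s * SECONDS_PER_YEAR / JOULES_PER_KWH
    return kwh_per_year * grid_kg_per_kwh


# Hypothetical inputs: 30 W average power, 50 ms latency, 10 inferences/s, 0.4 kg CO2/kWh.
e = energy_per_inference_j(30.0, 0.05)      # about 1.5 J per inference
print(round(annual_co2_kg(e, 10.0, 0.4), 2))  # about 52.56 kg CO2 per robot per year
```

Because the result scales linearly with energy per inference and with fleet size, a small per-inference saving is multiplied by every inference a deployed fleet runs in a year.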

📝 gxceed editorial commentary: Why this matters

In the Japanese GX context

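Japan is a world leader in robotics and AI, and the energy efficiency of AI models in edge robotic systems is an important factor in meeting carbon-neutrality targets. This paper quantifies the energy consumption and carbon emissions of VLA models and emphasizes the need for energy-aware design. It offers practical insights for Japanese robot manufacturers and AI researchers.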

In the global GX context

Globally, as AI models are increasingly deployed on edge devices, their energy consumption and carbon footprint become critical for sustainability. This paper provides empirical data on the trade-offs between model accuracy and energy efficiency, which is essential for designing environmentally responsible AI systems. The findings are relevant for developers and companies aiming to reduce the carbon footprint of their AI operations.

👥 Implications by reader

🔬 Researchers: This paper offers empirical energy and carbon-emission data for VLA model inference on edge devices, useful for researchers working on energy-efficient AI and sustainable robotics.

🏢 Practitioners: Robotics engineers can use these findings to select models that meet performance requirements while minimizing energy consumption and carbon emissions in battery-constrained applications.

📄 Abstract (original)

Vision-Language-Action (VLA) models are increasingly viewed as a promising foundation for embodied intelligence, yet energy consumption and latency remain among the main bottlenecks to their deployment on robotic platforms. In this work, we analyze the practical efficiency of representative VLA models under edge deployment conditions on an NVIDIA Jetson AGX Orin. We evaluate π0.5, X-VLA, and SmolVLA using a standardized robotics pipeline and report task performance, inference latency, power consumption, energy per inference, and estimated carbon emissions. Our results show a clear trade-off between accuracy and energy consumption: X-VLA achieves the highest task performance but also the highest latency and energy usage, while SmolVLA is the most efficient model but yields lower task success. All three models satisfy basic 10 Hz control requirements, but only π0.5 and SmolVLA meet 20 Hz constraints. We also observe bursty power behavior during inference, highlighting the dynamic load imposed by transformer-based robotic policies. Finally, we estimate annual operational carbon emissions under continuous deployment and find that even modest differences in inference energy translate into measurable environmental impact at scale. These findings underscore that energy-aware design is essential for deploying VLA systems in real-world, battery-constrained robotic applications.
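Two checks in the abstract can be made concrete: whether a policy meets a 10 Hz or 20 Hz control requirement, and how energy is obtained from a bursty power signal. A minimal sketch follows, assuming a rate is met when one inference finishes within the control period (1/f seconds), and that energy is computed by integrating a sampled power trace with the trapezoidal rule. The function names, the trace, and the latency value are hypothetical; the abstract does not state how the authors sampled power or whether idle baseline power was subtracted.

```python
from typing import Sequence


def meets_control_rate(latency_s: float, rate_hz: float) -> bool:
    """Assumed criterion: one inference must finish within the control period 1/rate_hz."""
    return latency_s <= 1.0 / rate_hz


def energy_from_trace_j(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate a sampled power trace (W over s) with the trapezoidal rule, giving joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need matching time and power samples (at least two)")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical 70 ms policy: fast enough for 10 Hz (100 ms budget), not for 20 Hz (50 ms).
print(meets_control_rate(0.07, 10.0), meets_control_rate(0.07, 20.0))

# Hypothetical bursty trace sampled every 10 ms during one inference.
t = [0.00, 0.01, 0.02, 0.03, 0.04]
p = [15.0, 40.0, 42.0, 20.0, 15.0]
print(round(energy_from_trace_j(t, p), 3))  # 1.17 J
```

Integrating the trace, rather than multiplying a single average by latency, keeps short power spikes in the energy estimate, provided the sampling interval is short relative to the bursts.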

🔗 Provenance: source where this record was discovered

openalex · https://doi.org/10.1145/3797248.3815411 · first seen 2026-06-23 05:42:50


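gxceed is a research-support dataset built on public metadata. Summaries, translations, and commentary are generated with AI assistance. Users are expected to do the final interpretation and verification against the original source material.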