Enhancing the Extraction of GHG Emission-Reduction Targets from Sustainability Reports Using Vision Language Models

ビジョン言語モデルを用いたサステナビリティ報告書からのGHG排出削減目標抽出の高度化 (AI 翻訳)

Lars Wilhelmi, Christian Bruns, Matthias Schumann

Machine Learning and Knowledge Extraction📚 査読済 / ジャーナル2026-02-05#AI×ESG

DOI: 10.3390/make8020037

原典: https://doi.org/10.3390/make8020037

🤖 gxceed AI 要約

日本語

本研究では、ビジョン言語モデル（VLM）を用いてサステナビリティ報告書からESG指標、特にGHG排出削減目標を抽出する手法を検討した。デザインサイエンスリサーチ手法に基づき、ページレベルのデータセットと評価パイプラインを構築し、テキスト、画像、およびテキスト+画像の組み合わせ入力モダリティを比較した。その結果、Mistral Small 3.2モデルを使用したテキスト+画像モダリティが最高性能（F1=0.82）を示した。視覚的に密なレイアウトや推論に基づく幻覚には課題が残るが、VLMによるESG指標抽出の有効性が実証された。

English

This study investigates the use of Vision Language Models (VLMs) to extract ESG metrics, particularly GHG emission-reduction targets, from corporate sustainability reports. Using Design Science Research Methodology, we developed an extraction artifact with a curated page-level dataset and evaluation pipeline. Comparing text, image, and combined text+image modalities, the combined approach using Mistral Small 3.2 achieved the best performance (F1=0.82). The findings highlight the potential of VLMs for automated ESG data extraction, though challenges remain with visually dense layouts and inference-based hallucinations.

Unofficial AI-generated summary based on the public title and abstract. Not an official translation.

📝 gxceed 編集解説 — Why this matters

日本のGX文脈において

日本ではSSBJ基準の公表に伴い、企業のサステナビリティ報告書におけるGHG排出削減目標の開示が重要性を増している。本論文が提案するVLMを用いた自動抽出手法は、日本企業の開示対応の効率化に貢献し得る。特に有報や統合報告書への適用が期待される。

In the global GX context

This paper advances the automation of ESG metric extraction, directly supporting global disclosure frameworks like ISSB, TCFD, and CSRD. By integrating visual and textual cues, the proposed VLM-based method improves accuracy on complex report layouts, reducing manual effort and enabling more consistent data collection for transition finance and carbon accounting.

👥 読者別の含意

🔬研究者:Demonstrates a novel application of VLMs for ESG metric extraction, with insights on modality fusion and evaluation pipelines.

🏢実務担当者:Provides a method to automate the extraction of GHG targets from sustainability reports, improving efficiency and data consistency.

🏛政策担当者:Highlights the potential for AI-based tools to enhance the standardization and verification of ESG disclosures, supporting regulatory oversight.

📄 Abstract（原文）

This study investigates how Vision Language Models (VLMs) can be used and methodically configured to extract Environmental, Social, and Governance (ESG) metrics from corporate sustainability reports, addressing the limitations of existing text-only and manual ESG data-extraction approaches. Using the Design Science Research Methodology, we developed an extraction artifact comprising a curated page-level dataset containing greenhouse gas (GHG) emission-reduction targets, an automated evaluation pipeline, model and text-preprocessing comparisons, and iterative prompt and few-shot refinement. Pages from oil and gas sustainability reports were processed directly by VLMs to preserve visual–textual structure, enabling a controlled comparison of text, image, and combined input modalities, with extraction quality assessed at page and attribute level using F1-scores. Among tested models, Mistral Small 3.2 demonstrated the most stable performance and was used to evaluate image, text, and combined modalities. Combined text + image modality performed best (F1 = 0.82), particularly on complex page layouts. The findings demonstrate how to effectively integrate visual and textual cues for ESG metric extraction with VLMs, though challenges remain for visually dense layouts and avoiding inference-based hallucinations.

🔗 Provenance — このレコードを発見したソース

semanticscholar https://doi.org/10.3390/make8020037first seen 2026-05-05 21:50:01 · last seen 2026-06-21 05:36:17

🔔 こうした論文の新着を逃したくない方はキーワードアラートに登録（無料・3キーワードまで）。

gxceed は公開メタデータに基づく研究支援データセットです。要約・翻訳・解説は AI 支援で生成されています。最終的な解釈・検証は利用者が原典資料に基づいて行うことを前提とします。