ESG-DocQA: A Three-Reviewer Validated Dataset for Evidence-Grounded Question Answering over Corporate ESG Reports

Huajian Jiang

Zenodo (CERN European Organization for Nuclear Research) · Dataset · 2026-05-30 · #ESG

DOI: 10.5281/zenodo.20457993

Source: https://doi.org/10.5281/zenodo.20457993

🤖 gxceed AI summary

This paper presents ESG-DocQA, a 300-sample benchmark for evidence-grounded question answering over corporate ESG reports. It includes multi-step verification, comparison, and inference questions, and was validated by three reviewers with substantial inter-reviewer reliability (Fleiss' kappa = 0.644). The dataset, metadata, and reproducibility scripts are publicly available.

Unofficial AI-generated summary based on the public title and abstract. Not an official translation.

📝 gxceed editorial commentary: Why this matters

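In Japan's GX context

ESG disclosure is becoming more important in Japan, but this dataset is specialized for evaluating question answering, so its direct application to disclosure practice is limited. It may, however, help improve the reliability of AI-based ESG analysis.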

In the global GX context

This dataset addresses the global need for structured ESG information extraction and QA systems. Although it does not target disclosure frameworks such as TCFD or ISSB directly, it provides a benchmark that can support the development of tools for analyzing corporate ESG reports, which is relevant to the growing demand for automated ESG assessment in global markets.

👥 Implications by reader

🔬 Researchers: A benchmark for evidence-grounded QA over ESG reports, useful for evaluating NLP models on ESG-specific tasks.

🏢 Practitioners: May inform the development of AI tools to extract and verify ESG information from reports, but direct corporate use is limited.

📄 Abstract (original)

ESG-DocQA is a 300-sample benchmark for evidence-grounded question answering over corporate environmental, social, and governance (ESG) reports. The dataset was constructed from page-level ESG report evidence and contains multi-step verification, comparison, and inference questions. The benchmark was produced through iterative human review and three-reviewer validation, achieving substantial inter-reviewer reliability (Fleiss' kappa = 0.644; Krippendorff's alpha = 0.647). The final dataset contains 300 validated samples with domain distribution E=145, S=91, G=64 and answer-type distribution verification=127, comparison=106, inference=67. The repository includes benchmark JSONL records, data dictionary and metadata, annotation guidelines, validation reports, adjudication and replacement logs, and reproducibility scripts. Original ESG report source PDFs and rendered page images are not redistributed due to copyright considerations. Users can locate source reports using the provided source-report manifest and page-level metadata.
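The abstract reports inter-reviewer reliability as Fleiss' kappa for three reviewers. As a rough illustration of how that statistic is computed, here is a minimal Python sketch of the standard Fleiss' kappa formula. It is not the repository's reproducibility script: the function name `fleiss_kappa`, the `accept`/`reject` labels, and the toy ratings are all assumptions made for this example, and the record does not state which labels the reviewers actually assigned.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings: one list of labels per item, e.g. [["accept", "accept", "reject"], ...]
    categories: every label a rater may assign
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement, from the overall share of each category.
    shares = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(s ** 2 for s in shares)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy labels from three reviewers; NOT taken from ESG-DocQA.
toy_ratings = [
    ["accept", "accept", "accept"],
    ["accept", "accept", "reject"],
    ["reject", "reject", "reject"],
    ["accept", "reject", "reject"],
]
toy_kappa = fleiss_kappa(toy_ratings, ["accept", "reject"])
print(f"{toy_kappa:.3f}")  # 0.333 for this toy data
```

On the commonly cited Landis and Koch scale, values from 0.61 to 0.80 are labeled "substantial", which is consistent with the abstract's wording for 0.644. Reproducing the reported figure would require the reviewers' actual labels from the repository's validation reports.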

🔗 Provenance: sources where this record was found

openalex · https://doi.org/10.5281/zenodo.20457993 · first seen 2026-06-02 04:53:44 · last seen 2026-06-16 04:49:20

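gxceed is a research-support dataset based on public metadata. Its summaries, translations, and commentary are generated with AI assistance. Users are expected to carry out final interpretation and verification against the original source materials.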