A vectorized GX corpus
open to researchers and RAG workflows
gxceed maintains a continuously growing corpus of 1,227 GX research papers from 13 global sources — with multilingual AI summaries, SNE-axis classification, and Japan↔Global axis scoring. All papers are embedded with @cf/baai/bge-m3 (1,024-dim cosine) for semantic search.
The gxceed GX paper corpus is collected from 13 open scholarly metadata sources: arXiv, Jxiv (JST), Zenodo, SSRN, EarthArXiv, J-STAGE, CiNii Research, Research Square, OpenAlex, IEA, Carbon Brief, Nature Energy, and ChinaRxiv.
Each paper has: bilingual AI summaries (Japanese + English), GX relevance score (0–100), SNE-axis binary flags, sne_profile_hint, Japan↔Global axis scores, topic classification, country of origin, and — as of May 2026 — a 1,024-dim vector embedding for semantic search.
Dedicated resolvers for Jxiv (JST), J-STAGE, and CiNii Research bring Japanese-origin GX preprints and peer-reviewed papers into an English-accessible corpus. Most global databases under-represent Japanese-language GX research. gxceed translates titles and abstracts via DeepSeek AI and assigns Japan↔Global relevance scores.
ChinaRxiv (Chinese Academy of Sciences, ~23,000 papers) is accessible from Japan and provides Chinese-language GX preprints. Titles and abstracts are translated to English and Japanese. Chinese institutional papers via OpenAlex supplement coverage of peer-reviewed Chinese GX output.
Every published paper is tagged with SNE binary flags (S₁ / N / E / S₂ / W) and assigned an sne_profile_hint. This lets you distinguish measurement-heavy papers from policy-narrative-dominant papers from implementation-ready work — across 1,227 papers and 25 topic categories. No equivalent classification exists in OpenAlex or Semantic Scholar.
All papers are embedded with @cf/baai/bge-m3 — a multilingual model supporting Japanese, English, and Chinese in the same embedding space. You can query "GX transition risk in supply chains" in English and retrieve relevant Japanese-origin papers without keyword translation.
Retrieve the 5–10 most semantically relevant GX papers as context for an LLM. Use mode=vector&limit=10&min_score=80 to get high-quality, curated papers ordered by cosine similarity to your claim or question. The bilingual summaries (ai_summary_en) are compact enough to fit in context alongside original paper abstracts.
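As a sketch of this retrieval step (the /api/papers/search path, the parameters, and the ai_summary_en / title_en / primary_topic fields come from the API reference on this page; the base URL and helper names are placeholders):

```python
from urllib.parse import urlencode

BASE = "https://example.invalid"  # placeholder; use your issued endpoint


def search_url(query: str) -> str:
    """Build a vector-mode search request suitable for RAG retrieval."""
    params = {"q": query, "mode": "vector", "limit": 10, "min_score": 80}
    return f"{BASE}/api/papers/search?{urlencode(params)}"


def build_context(papers: list[dict]) -> str:
    """Concatenate the compact English summaries into one LLM context block."""
    lines = [
        f"[{p['primary_topic']}] {p['title_en']}: {p['ai_summary_en']}"
        for p in papers
    ]
    return "\n".join(lines)
```

The resulting context block stays small because ai_summary_en is deliberately compact; you can append the original abstracts afterwards if your context window allows.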
Search abstract concepts — "forest carbon sinks and corporate disclosure", "hydrogen supply chain cost reduction" — across 1,227 JP+EN+CN papers in a single query. Keyword search would miss papers where the concept is expressed differently in Japanese or where the abstract uses synonyms. Vector mode handles this natively.
Filter retrieved papers by sne_profile_hint to understand the knowledge production structure on a specific topic. For example: how does the renewable energy sub-corpus distribute across S₁ (measurement) vs N (narrative)? What fraction of hydrogen papers have S₂ (implementation substance) signals?
Find supporting papers for a specific technical claim across multilingual corpora. The Japan↔Global axis scores help prioritize papers that bridge Japanese implementation context with global scholarly discourse.
Query an underexplored topic to surface the closest existing research and identify gaps. Compare sne_profile_hint=N_heavy_S2_weak vs S2_capable distributions on a topic to locate where implementation substance is thin relative to policy narrative.
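Such a comparison can be sketched in a few lines, assuming you have already fetched a topic-filtered result set (the helper below is illustrative, not part of the API; the sne_profile_hint values are those used elsewhere on this page):

```python
from collections import Counter


def profile_distribution(papers: list[dict]) -> dict[str, float]:
    """Return the fraction of papers carrying each sne_profile_hint value."""
    counts = Counter(p["sne_profile_hint"] for p in papers)
    total = sum(counts.values())
    return {hint: n / total for hint, n in counts.items()}
```

Comparing the N_heavy_S2_weak fraction against the S2_capable fraction on the same topic then gives a quick signal of where policy narrative outpaces implementation substance.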
GET /api/papers/search

Authentication: x-api-key: <key> header required. Contact [email protected] for researcher API access.
| Parameter | Default | Description |
|---|---|---|
| q | required | Query string (2+ chars). Natural language or keyword. Embedded at query time in vector mode. |
| mode | keyword | vector = bge-m3 semantic similarity. keyword = LIKE-based AND across 11 text fields. |
| topic | — | Filter by primary_topic (see topic list below). Applied as a metadata filter in Vectorize (vector mode). |
| min_score | 80 | Minimum GX relevance score (0–100). Applied post-retrieval on D1 results. |
| limit | 20 | Maximum number of results. Capped at 50. |
| lang | all | ja / en / all. Filter by paper language. |
| shelf | all | curated / japan_to_global / global_to_japan / all. |
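A hedged sketch of combining these parameters into a request query string (the defaults and the 50-result cap come from the table above; the helper itself is illustrative):

```python
from urllib.parse import urlencode

# Documented defaults; parameters left at these values can be omitted.
DEFAULTS = {"mode": "keyword", "min_score": 80, "limit": 20, "lang": "all", "shelf": "all"}


def build_query(q: str, **overrides) -> str:
    """Return a query string for /api/papers/search, dropping
    any parameter still at its documented default."""
    if len(q) < 2:
        raise ValueError("q must be at least 2 characters")
    if overrides.get("limit", 0) > 50:
        raise ValueError("limit is capped at 50")
    params = {"q": q}
    params.update({k: v for k, v in overrides.items() if DEFAULTS.get(k) != v})
    return urlencode(params)
```

For example, a Japanese-language hydrogen query in keyword mode needs only q, topic, and lang, since mode=keyword is the default.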
Response (mode=vector):

```json
{
  "query": "carbon pricing implementation",
  "mode": "vector",
  "count": 10,
  "papers": [
    {
      "id": "...",
      "title": "...",
      "title_en": "...",
      "ai_summary_en": "...",
      "primary_topic": "carbon_pricing",
      "sne_profile_hint": "S1_S2_mixed",
      "draft_score": 87,
      "vector_score": 0.842,
      "doi": "10.xxx/yyy",
      "source_url": "https://...",
      "published_at": "2026-03-15T...",
      "origin_country": "JP"
      // ...SNE flags, context notes, shelf
    }
  ]
}
```
vector_score is included only in mode=vector responses. For full schema documentation see llms-full.txt.
Each paper's embedding text is pipe-delimited and capped at ~2,000 characters.
The query text is embedded at query-time using the same model. Vectorize returns top-K matches; full paper metadata is then retrieved from D1 via an IN query and re-sorted by similarity score.
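The merge-and-re-sort step matters because a SQL IN query does not preserve the id order returned by Vectorize. A minimal sketch, assuming Vectorize matches arrive as id/score pairs and D1 rows as plain dicts (function and field names other than vector_score are illustrative; the actual Worker code is not shown here):

```python
def merge_and_sort(matches: list[dict], rows: list[dict]) -> list[dict]:
    """Re-attach Vectorize similarity scores to D1 rows, then sort.

    `matches` is the top-K list from the vector index, e.g.
    [{"id": "a", "score": 0.9}, ...]; `rows` is the unordered
    metadata result of the D1 IN query.
    """
    score_by_id = {m["id"]: m["score"] for m in matches}
    for row in rows:
        row["vector_score"] = score_by_id[row["id"]]
    return sorted(rows, key=lambda r: r["vector_score"], reverse=True)
```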
Pass any of these as topic=<value> in the search API. Topic filter is applied as metadata filter in Vectorize (vector mode) or as a SQL WHERE clause (keyword mode).
API access for academic research and non-commercial use is available upon request. Contact [email protected] with:
- Your institution / affiliation (if any)
- Intended use case (RAG, bibliometrics, corpus analysis, etc.)
- Estimated query volume
If you are a researcher submitting a paper that should be indexed in the corpus, use the same contact address.