A vectorized GX corpus
open to researchers and RAG workflows
gxceed maintains a continuously growing corpus of 1,227 GX research papers from 13 global sources — with multilingual AI summaries, SNE-axis classification, and Japan↔Global axis scoring. All papers are embedded with @cf/baai/bge-m3 (1,024-dim cosine) for semantic search.
The gxceed GX paper corpus is collected from 13 open scholarly metadata sources: arXiv, Jxiv (JST), Zenodo, SSRN, EarthArXiv, J-STAGE, CiNii Research, Research Square, OpenAlex, IEA, Carbon Brief, Nature Energy, and ChinaRxiv.
Each paper has: bilingual AI summaries (Japanese + English), GX relevance score (0–100), SNE-axis binary flags, sne_profile_hint, Japan↔Global axis scores, topic classification, country of origin, and — as of May 2026 — a 1,024-dim vector embedding for semantic search.
Dedicated resolvers for Jxiv (JST), J-STAGE, and CiNii Research bring Japanese-origin GX preprints and peer-reviewed papers into an English-accessible corpus. Most global databases under-represent Japanese-language GX research. gxceed translates titles and abstracts via DeepSeek AI and assigns Japan↔Global relevance scores.
ChinaRxiv (Chinese Academy of Sciences, ~23,000 papers) is accessible from Japan and provides Chinese-language GX preprints. Titles and abstracts are translated to English and Japanese. Chinese institutional papers via OpenAlex supplement coverage of peer-reviewed Chinese GX output.
Every published paper is tagged with SNE binary flags (S₁ / N / E / S₂ / W) and assigned an sne_profile_hint. This lets you distinguish measurement-heavy papers from policy-narrative-dominant papers from implementation-ready work — across 1,227 papers and 25 topic categories. No equivalent classification exists in OpenAlex or Semantic Scholar.
All papers are embedded with @cf/baai/bge-m3 — a multilingual model supporting Japanese, English, and Chinese in the same embedding space. You can query "GX transition risk in supply chains" in English and retrieve relevant Japanese-origin papers without keyword translation.
Retrieve the 5–10 most semantically relevant GX papers as context for an LLM. Use mode=vector&limit=10&min_score=80 to get high-quality, curated papers ordered by cosine similarity to your claim or question. The bilingual summaries (ai_summary_en) are compact enough to fit in context alongside original paper abstracts.
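As a sketch of this retrieval step (the /api/papers/search path, the parameters, and the ai_summary_en / title_en / primary_topic fields come from the API reference on this page; the base URL and helper names are placeholders):

```python
from urllib.parse import urlencode

BASE = "https://example.invalid"  # placeholder; use your issued endpoint


def search_url(query: str) -> str:
    """Build a vector-mode search request suitable for RAG retrieval."""
    params = {"q": query, "mode": "vector", "limit": 10, "min_score": 80}
    return f"{BASE}/api/papers/search?{urlencode(params)}"


def build_context(papers: list[dict]) -> str:
    """Concatenate the compact English summaries into one LLM context block."""
    lines = [
        f"[{p['primary_topic']}] {p['title_en']}: {p['ai_summary_en']}"
        for p in papers
    ]
    return "\n".join(lines)
```

The resulting context block stays small because ai_summary_en is deliberately compact; you can append the original abstracts afterwards if your context window allows.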
Search abstract concepts — "forest carbon sinks and corporate disclosure", "hydrogen supply chain cost reduction" — across 1,227 JP+EN+CN papers in a single query. Keyword search would miss papers where the concept is expressed differently in Japanese or where the abstract uses synonyms. Vector mode handles this natively.
Filter retrieved papers by sne_profile_hint to understand the knowledge production structure on a specific topic. For example: how does the renewable energy sub-corpus distribute across S₁ (measurement) vs N (narrative)? What fraction of hydrogen papers have S₂ (implementation substance) signals?
Find supporting papers for a specific technical claim across multilingual corpora. The Japan↔Global axis scores help prioritize papers that bridge Japanese implementation context with global scholarly discourse.
Query an underexplored topic to surface the closest existing research and identify gaps. Compare sne_profile_hint=N_heavy_S2_weak vs S2_capable distributions on a topic to locate where implementation substance is thin relative to policy narrative.
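Such a comparison can be sketched in a few lines, assuming you have already fetched a topic-filtered result set (the helper below is illustrative, not part of the API; the sne_profile_hint values are those used elsewhere on this page):

```python
from collections import Counter


def profile_distribution(papers: list[dict]) -> dict[str, float]:
    """Return the fraction of papers carrying each sne_profile_hint value."""
    counts = Counter(p["sne_profile_hint"] for p in papers)
    total = sum(counts.values())
    return {hint: n / total for hint, n in counts.items()}
```

Comparing the N_heavy_S2_weak fraction against the S2_capable fraction on the same topic then gives a quick signal of where policy narrative outpaces implementation substance.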
GET /api/papers/search

Authentication: x-api-key: <key> header required. Contact [email protected] for researcher API access.
| Parameter | Default | Description |
|---|---|---|
| q | required | Query string (2+ chars). Natural language or keyword. Embedded at query time in vector mode. |
| mode | keyword | vector = bge-m3 semantic similarity. keyword = LIKE-based AND across 11 text fields. |
| topic | — | Filter by primary_topic (see topic list below). Applied as a metadata filter in Vectorize (vector mode). |
| min_score | 80 | Minimum GX relevance score (0–100). Applied post-retrieval on D1 results. |
| limit | 20 | Maximum number of results. Capped at 50. |
| lang | all | ja / en / all. Filter by paper language. |
| shelf | all | curated / japan_to_global / global_to_japan / all. |
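A hedged sketch of combining these parameters into a request query string (the defaults and the 50-result cap come from the table above; the helper itself is illustrative):

```python
from urllib.parse import urlencode

# Documented defaults; parameters left at these values can be omitted.
DEFAULTS = {"mode": "keyword", "min_score": 80, "limit": 20, "lang": "all", "shelf": "all"}


def build_query(q: str, **overrides) -> str:
    """Return a query string for /api/papers/search, dropping
    any parameter still at its documented default."""
    if len(q) < 2:
        raise ValueError("q must be at least 2 characters")
    if overrides.get("limit", 0) > 50:
        raise ValueError("limit is capped at 50")
    params = {"q": q}
    params.update({k: v for k, v in overrides.items() if DEFAULTS.get(k) != v})
    return urlencode(params)
```

For example, a Japanese-language hydrogen query in keyword mode needs only q, topic, and lang, since mode=keyword is the default.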
Response (mode=vector):

```json
{
  "query": "carbon pricing implementation",
  "mode": "vector",
  "count": 10,
  "papers": [
    {
      "id": "...",
      "title": "...",
      "title_en": "...",
      "ai_summary_en": "...",
      "primary_topic": "carbon_pricing",
      "sne_profile_hint": "S1_S2_mixed",
      "draft_score": 87,
      "vector_score": 0.842,
      "doi": "10.xxx/yyy",
      "source_url": "https://...",
      "published_at": "2026-03-15T...",
      "origin_country": "JP"
      // ...SNE flags, context notes, shelf
    }
  ]
}
```
vector_score is included only in mode=vector responses. For full schema documentation see llms-full.txt.
Each paper's embedding text is pipe-delimited and capped at ~2,000 characters.
The query text is embedded at query-time using the same model. Vectorize returns top-K matches; full paper metadata is then retrieved from D1 via an IN query and re-sorted by similarity score.
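The merge-and-re-sort step matters because a SQL IN query does not preserve the id order returned by Vectorize. A minimal sketch, assuming Vectorize matches arrive as id/score pairs and D1 rows as plain dicts (function and field names other than vector_score are illustrative; the actual Worker code is not shown here):

```python
def merge_and_sort(matches: list[dict], rows: list[dict]) -> list[dict]:
    """Re-attach Vectorize similarity scores to D1 rows, then sort.

    `matches` is the top-K list from the vector index, e.g.
    [{"id": "a", "score": 0.9}, ...]; `rows` is the unordered
    metadata result of the D1 IN query.
    """
    score_by_id = {m["id"]: m["score"] for m in matches}
    for row in rows:
        row["vector_score"] = score_by_id[row["id"]]
    return sorted(rows, key=lambda r: r["vector_score"], reverse=True)
```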
Pass any of these as topic=<value> in the search API. Topic filter is applied as metadata filter in Vectorize (vector mode) or as a SQL WHERE clause (keyword mode).
API access for academic research and non-commercial use is available upon request. Contact [email protected] with:
- Your institution / affiliation (if any)
- Intended use case (RAG, bibliometrics, corpus analysis, etc.)
- Estimated query volume
If you are a researcher submitting a paper that should be indexed in the corpus, use the same contact address.