A live dashboard of frontier and open-weight AI model performance on capital-markets and ESG benchmarks: FinRetrieval, FinBen, ESGbench, ESG-Bench (AAAI 2026), and ESGenius. It also shows how APEXE3 Agents lift open-weight models.

| Configuration | Accuracy |
|---|---|
| APEXE3 agent + GLM-5.1 | 81.0% |
| GPT-5.2 + WebSearch + Reasoning | 70.8% |
| Gemini 3 Pro + WebSearch + Reasoning | 69.2% |
| Claude Opus 4.5 + WebSearch + Reasoning | 19.8% |
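
The spread between configurations is easier to compare as percentage-point gaps. The following is a minimal sketch that only redoes the arithmetic on the figures in the table above; the dictionary `ACCURACY` and the helper `gaps_vs` are illustrative names, not part of the dashboard or of any APEXE3 API.

```python
# Accuracy figures (percent) copied from the table above.
ACCURACY = {
    "APEXE3 agent + GLM-5.1": 81.0,
    "GPT-5.2 + WebSearch + Reasoning": 70.8,
    "Gemini 3 Pro + WebSearch + Reasoning": 69.2,
    "Claude Opus 4.5 + WebSearch + Reasoning": 19.8,
}


def gaps_vs(reference: str, scores: dict[str, float]) -> dict[str, float]:
    """Return the percentage-point gap between `reference` and every other entry.

    Hypothetical helper for illustration: a positive value means the
    reference configuration scored higher.
    """
    ref = scores[reference]
    return {
        name: round(ref - acc, 1)  # round to one decimal to hide float noise
        for name, acc in scores.items()
        if name != reference
    }


gaps = gaps_vs("APEXE3 agent + GLM-5.1", ACCURACY)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} pts")
```

These are absolute differences in accuracy, not relative improvements, so they should be read as "points ahead" rather than "percent better".
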

| Top issuer HQ | Share (%) | Question category | Share (%) |
|---|---|---|---|
| United States | 42 | Balance sheet | 20 |
| Japan | 10 | Cash flow | 19 |
| UK · Australia | 5 ea. | Operational KPIs | 18 |
| Brazil | 4 | Income statement | 16 |
| India · Canada | 3 ea. | Guidance / outlook | 15 |
| +34 others | 28 | Segments / geography | 13 |

| Benchmark | Domain | Scale | Status | Year | Why cite |
|---|---|---|---|---|---|
| Daloopa FinRetrieval | Finance · retrieval | 500 Qs · 14 configs | Fresh | 2026 | APEXE3 agent + GLM-5.1 scores 81.0%, ahead of GPT-5.2 (70.8%) and Opus 4.5 (19.8%) |
| FinBen | Finance · holistic | 42 datasets · 24 tasks | Peer-reviewed | 2024+ | FINOS / Linux Foundation; EU-credible |
| FinanceBench | Finance · QA | 150 Qs · 10-Ks | — | 2023 | Small but widely quoted |
| FinBench | Finance · reasoning | — | Peer-reviewed | 2026 | Reasoning-heavy complement to FinBen |
| Open FinLLM Leaderboard | Finance · live | living | — | 2026 | FINOS-hosted real-time FinBen runs |
| FinGPT | Finance · OSS | model family | — | 2024+ | OSS stack has overtaken BloombergGPT |
| ESGbench | ESG · pipeline | configurable | Fresh | 2026 | Fork & run on your own corpus tonight |
| ESG-Bench | ESG · hallucination | human-annotated QA | AAAI 2026 | 2026 | Hallucination labels — regulator gold |
| ESGenius | ESG · MCQ | 1,136 MCQs · 231 docs | EMNLP 2025 | 2025 | 50 models 0.5B–671B tested |
| ESGReveal | ESG · extraction | — | Elsevier | 2024 | Peer-reviewed, EU-friendly |
| GHG Emission Extraction | ESG · Scope 1/2/3 | benchmark dataset | Nature | 2025 | Nature-published — top credibility |
| ESG Report Completeness | ESG · quality | — | — | 2025 | Topic + quality classification |

| # | Model | Provider | Benchmark | Score | Status | Date |
|---|---|---|---|---|---|---|