Single Policy Evaluation
Evaluate one candidate policy against historical logs. Get detailed diagnostics, confidence intervals, and a comprehensive audit passport.
- Detailed single-policy diagnostics
- Gate & confidence-interval (CI) analysis
- Stress testing & robustness checks
- Full audit passport
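As a rough illustration of what evaluating one policy against historical logs with a confidence interval can look like, here is a minimal sketch using inverse propensity scoring (IPS) with a normal-approximation interval. The estimator choice, the function name `ips_estimate`, and its parameters are assumptions made for this example, not a description of the tool's actual method.

```python
import math

def ips_estimate(rewards, logging_propensities, target_probs, z=1.96):
    """Hypothetical helper: IPS value estimate with a normal-approximation CI.

    rewards[i]              observed reward for logged record i
    logging_propensities[i] probability the logging policy gave the logged action
    target_probs[i]         probability the candidate policy gives that same action
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two logged records")
    # Importance-weighted reward for each record: r * pi_target / pi_logging
    terms = [r * q / p
             for r, p, q in zip(rewards, logging_propensities, target_probs)]
    mean = sum(terms) / n
    # Sample variance of the weighted rewards (n - 1 in the denominator)
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)
    half_width = z * math.sqrt(var / n)
    return mean, (mean - half_width, mean + half_width)

value, (ci_low, ci_high) = ips_estimate(
    rewards=[1, 0, 1, 0],
    logging_propensities=[0.5, 0.5, 0.5, 0.5],
    target_probs=[0.5, 0.5, 0.5, 0.5],
)
```

With so few records the interval is very wide; the normal approximation is only reasonable for large logs with well-behaved importance weights.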
Multi-Policy Comparison (Leaderboard)
Compare multiple candidate policies side-by-side. Rank by performance, compare pairwise differences, and identify the best policy.
- Rank multiple policies by uplift
- Side-by-side comparison
- Pairwise difference analysis
- Per-policy diagnostics
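The ranking and pairwise steps listed above can be sketched as follows. This example assumes that "uplift" means a policy's estimated value minus a baseline value (for instance the mean logged reward); that definition, the function name `leaderboard`, and the sample numbers are illustrative assumptions only.

```python
from itertools import combinations

def leaderboard(policy_values, baseline_value):
    """Hypothetical helper: rank policies by uplift and list pairwise gaps.

    policy_values  dict mapping policy name -> estimated value
    baseline_value value of the reference (e.g. logging) policy
    """
    # Uplift is assumed here to be estimated value minus the baseline
    uplift = {name: v - baseline_value for name, v in policy_values.items()}
    # Highest uplift first
    ranking = sorted(uplift, key=uplift.get, reverse=True)
    # Difference in estimated value for every pair, better-ranked policy first
    pairwise = {(a, b): policy_values[a] - policy_values[b]
                for a, b in combinations(ranking, 2)}
    return ranking, uplift, pairwise

ranking, uplift, pairwise = leaderboard(
    {"A": 0.12, "B": 0.15, "C": 0.10},  # made-up values for illustration
    baseline_value=0.11,
)
```

A point-estimate gap alone does not say whether one policy is reliably better; in practice each pairwise difference would need its own uncertainty estimate.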
Candidate File Generation
Generate candidate policy files from your historical logs. Score items using uploaded scores, an external API, or an ONNX model, then download evaluation-ready files.
- Upload precomputed item scores
- Score via external API
- Score via ONNX model
- Download ready-to-evaluate files
Logging Readiness
Audit your historical logs before evaluation. Check whether your data supports reliable off-policy evaluation (OPE).
- Auto-detect scenario (bandit vs ranking)
- Propensity health & coverage checks
- Availability & candidate-set logging audit
- Actionable recommendations
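To make the idea of a propensity health and coverage check concrete, here is a minimal sketch that computes the effective sample size of the importance weights and the share of records with very small logging propensities. The function name `propensity_health`, the `floor` threshold of 0.01, and the choice of metrics are assumptions for illustration, not the tool's documented checks.

```python
def propensity_health(logging_propensities, target_probs, floor=0.01):
    """Hypothetical helper: summarize how well logs support reweighting."""
    if any(p <= 0 or p > 1 for p in logging_propensities):
        raise ValueError("logging propensities must lie in (0, 1]")
    # Importance weight per record: pi_target / pi_logging
    weights = [q / p for p, q in zip(logging_propensities, target_probs)]
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    # Effective sample size: (sum w)^2 / sum w^2, equal to n when weights are uniform
    ess = (sum_w ** 2) / sum_w2 if sum_w2 > 0 else 0.0
    return {
        "n": n,
        "ess": ess,
        "ess_ratio": ess / n,
        # Fraction of records logged with a propensity below the floor
        "share_below_floor": sum(p < floor for p in logging_propensities) / n,
        "max_weight": max(weights),
    }

report = propensity_health(
    logging_propensities=[0.005, 0.5],
    target_probs=[0.5, 0.5],
)
```

An effective sample size far below the number of logged records suggests that a few heavily weighted records dominate the estimate, which is one common sign that the logs may not support reliable evaluation of that candidate.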