CounterFact Lab

Single Policy Evaluation

Evaluate one candidate policy against historical logs

Evaluation Type: Single

Upload Data

CSV or Parquet format

Gating Thresholds (Advanced)

Minimum support overlap required between the logging policy and the candidate policy. Lower = more permissive. Default: 20%
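
As a rough illustration, support overlap can be sketched as the share of the candidate policy's probability mass that falls on actions the logging policy could also have chosen, averaged over contexts. This definition, the hypothetical `support_overlap` helper, and the dense contexts-by-actions probability layout are assumptions for illustration only, not CounterFact Lab's documented implementation.

```python
import numpy as np

def support_overlap(target_probs, logging_probs, eps=1e-12):
    # Assumed layout: rows are contexts, columns are actions.
    target = np.asarray(target_probs, dtype=float)
    logging = np.asarray(logging_probs, dtype=float)
    # Actions the logging policy could actually have taken in each context.
    supported = logging > eps
    # Candidate-policy mass that lands on supported actions, per context.
    per_context = (target * supported).sum(axis=1)
    return float(per_context.mean())

MIN_OVERLAP = 0.20  # default shown in the UI

target = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.2, 0.8]])
logging = np.array([[0.6, 0.4, 0.0],
                    [0.5, 0.5, 0.0]])
overlap = support_overlap(target, logging)
print(overlap, overlap >= MIN_OVERLAP)
```

In the second context the candidate puts 0.8 of its mass on an action the logger never takes, so that context contributes only 0.2 to the average.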

Minimum Effective Sample Size (ESS). Lower = more permissive. Default: 1000
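
One common way to compute an ESS for off-policy evaluation is Kish's formula over the importance weights w_i = target propensity / logging propensity of each logged action: ESS = (sum of w)^2 / (sum of w^2). Whether CounterFact Lab uses exactly this definition is an assumption; `effective_sample_size` is a hypothetical helper.

```python
import numpy as np

def effective_sample_size(weights):
    # Kish ESS: equals n when all weights are equal and shrinks as a few
    # large weights dominate the estimate.
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.square(w).sum())

MIN_ESS = 1000  # default shown in the UI

balanced = effective_sample_size([1.0] * 1500)
skewed = effective_sample_size([1.0] * 999 + [100.0])
print(balanced, balanced >= MIN_ESS)
print(skewed, skewed >= MIN_ESS)
```

The skewed case has 1000 rows, but a single weight of 100 pulls the ESS down to roughly 110, well under the default gate.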

Minimum uplift lower confidence bound (LCB) required to SHIP. Default: 0.005 (0.5%)

BLOCK threshold: an uplift LCB below this value triggers BLOCK. Default: 0.005 (0.5%)
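
A minimal sketch of how an uplift LCB could be computed, assuming uplift means the candidate policy's inverse-propensity-scored (IPS) value minus the logging policy's observed mean reward, with a normal-approximation bound at z = 1.96. The estimator, the confidence level, and the `uplift_lcb` helper are all assumptions; the tool may use a different estimator (for example doubly robust) or a bootstrap interval.

```python
import numpy as np

def uplift_lcb(rewards, target_probs, logging_probs, z=1.96):
    r = np.asarray(rewards, dtype=float)
    # Importance weight of each logged action under the candidate policy.
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    # Per-row uplift: IPS-weighted reward minus the reward actually observed.
    diff = w * r - r
    mean = diff.mean()
    se = diff.std(ddof=1) / np.sqrt(len(diff))  # standard error of the mean
    return float(mean - z * se)

SHIP_LCB = 0.005   # default shown in the UI
BLOCK_LCB = 0.005  # default shown in the UI

lcb = uplift_lcb([1, 1, 0, 0], [0.9, 0.9, 0.1, 0.1], [0.5, 0.5, 0.5, 0.5])
print(lcb, lcb >= SHIP_LCB, lcb < BLOCK_LCB)
```

With only four rows the point estimate is positive but the interval is wide, so the LCB falls below the BLOCK threshold.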

Stress test gate: maximum fraction of exposures that may be dropped before the result is marked INCONCLUSIVE. Default: 30%

Note: These thresholds control the SHIP, BLOCK, and INCONCLUSIVE decision gates. Adjust them to match your risk tolerance and use case.
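
The way these thresholds might combine into a single decision can be sketched as follows. The order of checks and the rule that a failed overlap or ESS gate yields INCONCLUSIVE are assumptions (the page only states that the stress test gate leads to INCONCLUSIVE); `GateThresholds` and `decide` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateThresholds:
    # Defaults mirror the values shown in the UI.
    min_overlap: float = 0.20
    min_ess: float = 1000
    ship_lcb: float = 0.005
    block_lcb: float = 0.005
    max_dropped_fraction: float = 0.30

def decide(overlap, ess, uplift_lcb, dropped_fraction, t=GateThresholds()):
    # Assumed ordering: diagnostic gates first, then the uplift gates.
    if overlap < t.min_overlap or ess < t.min_ess:
        return "INCONCLUSIVE"
    if dropped_fraction > t.max_dropped_fraction:
        return "INCONCLUSIVE"
    if uplift_lcb >= t.ship_lcb:
        return "SHIP"
    if uplift_lcb < t.block_lcb:
        return "BLOCK"
    # Reachable only when block_lcb < ship_lcb (a gap between the two gates).
    return "INCONCLUSIVE"

print(decide(overlap=0.6, ess=1500, uplift_lcb=0.01, dropped_fraction=0.1))
```

With the defaults, the SHIP and BLOCK thresholds are equal, so every run that passes the diagnostic gates is either shipped or blocked; lowering the BLOCK threshold opens an INCONCLUSIVE band between them.
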

Propensity Estimation (Advanced)

Warning: Estimated propensities are less reliable than the true propensities recorded by your logging policy. Interpret confidence interval (CI) and ESS results with caution, and use estimation only when true propensities are unavailable.

Methods: softmax over a score or rating column (if one exists), otherwise uniform propensities per user. The method is chosen automatically based on the available columns.
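
A minimal sketch of the two estimation methods, assuming each user's candidate set is simply the rows logged for that user. The grouping, the hypothetical `temperature` parameter, and the helper names are illustrative assumptions rather than the tool's actual code.

```python
import numpy as np

def softmax_propensities(user_ids, scores, temperature=1.0):
    users = np.asarray(user_ids)
    s = np.asarray(scores, dtype=float) / temperature
    _, group = np.unique(users, return_inverse=True)
    n_groups = group.max() + 1
    # Subtract each user's maximum score for numerical stability.
    group_max = np.full(n_groups, -np.inf)
    np.maximum.at(group_max, group, s)
    exp = np.exp(s - group_max[group])
    # Normalize within each user so that user's propensities sum to 1.
    group_sum = np.zeros(n_groups)
    np.add.at(group_sum, group, exp)
    return exp / group_sum[group]

def uniform_propensities(user_ids):
    # Each logged item gets 1 / (number of items logged for that user).
    _, group, counts = np.unique(np.asarray(user_ids), return_inverse=True, return_counts=True)
    return 1.0 / counts[group]

def estimate_propensities(user_ids, scores=None):
    # Mirrors the auto-detection: softmax if a score column exists, else uniform.
    if scores is not None:
        return softmax_propensities(user_ids, scores)
    return uniform_propensities(user_ids)

print(estimate_propensities(["u", "u"], [0.0, np.log(3.0)]))
print(estimate_propensities(["a", "b", "b", "b"]))
```

Softmax keeps more of the logged ranking signal, while the uniform fallback ignores it entirely, which is one reason the CI and ESS warning above applies.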

Gate & Diagnostics

Run evaluation to see diagnostics