SETUP, THRESHOLDS AND ENVIRONMENT
What this section does:
Before looking at a single row of data, we do three things:
First, we load every library the audit will need. This ensures every subsequent section runs without import errors.
Second, we set pass/fail thresholds for every metric we will measure. This is non-negotiable. Setting thresholds after seeing results is how bias gets hidden in audits. We define what counts as passing before we know whether we pass.
Third, we set a reproducibility seed. Anyone who runs this notebook with the same seed, the same data, and the same library versions should get identical results. This is what makes the audit independently verifiable.
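As a minimal sketch of what the seed buys us, the snippet below draws from a NumPy generator twice with the same seed and once with a different one. The helper `draw_sample` is hypothetical and not part of the audit; it uses `np.random.default_rng` rather than the global `np.random.seed` call in the setup cell, purely to keep the example self-contained.

```python
import numpy as np

RANDOM_STATE = 42  # same value as the seed used in the setup cell


def draw_sample(seed, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = np.random.default_rng(seed)  # local generator, independent of global state
    return rng.random(n)


first = draw_sample(RANDOM_STATE)
second = draw_sample(RANDOM_STATE)      # same seed: repeats the first draw
other = draw_sample(RANDOM_STATE + 1)   # different seed: a different sequence

print('same seed matches      :', np.array_equal(first, second))
print('different seed matches :', np.array_equal(first, other))
```

Libraries such as scikit-learn, imbalanced-learn and XGBoost also accept a `random_state` argument. Passing `RANDOM_STATE` there explicitly is usually more robust than relying on global state, which depends on what ran earlier in the session.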
In [1]:
# ── SECTION 1: SETUP ─────────────────────────────────────────────────────────
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
from datetime import datetime
from scipy import stats
from scipy.stats import ks_2samp
# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import (
    matthews_corrcoef, log_loss,
    precision_recall_curve, auc,
    confusion_matrix, classification_report
)
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE
# Fairness
from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference
)
# Explainability
import shap
# ── REPRODUCIBILITY ───────────────────────────────────────────────────────────
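# A fixed seed makes the random steps repeatable: same seed, same data and same
# library versions should give the same outputs. np.random.seed only sets NumPy's
# global generator, so pass random_state=RANDOM_STATE to each estimator as well.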
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)
# ── PASS/FAIL THRESHOLDS ──────────────────────────────────────────────────────
# Set before any results are seen.
# [ClearBoxAI Standard] = internal rule
# [BoG CISD 2026] = Bank of Ghana binding requirement
THRESHOLDS = {
    'SPD': 0.10,       # Fairness    [ClearBoxAI Standard]
    'EOD': 0.10,       # Fairness    [ClearBoxAI Standard]
    'MCC': 0.50,       # Performance [ClearBoxAI Standard]
    'LOG_LOSS': 0.40,  # Performance [ClearBoxAI Standard]
    'PR_AUC': 0.70,    # Performance [ClearBoxAI Standard]
    'KS': 0.30,        # Performance [Ghana banking industry standard]
}
print('=' * 58)
print(' ClearBoxAI Fraud Detection Audit Thresholds')
print(' Audit ID : CBA-2026-002')
print(f' Run Date : {datetime.now().strftime("%Y-%m-%d %H:%M")}')
print('=' * 58)
print('\nPass/Fail Thresholds (defined before any results are seen):')
for k, v in THRESHOLDS.items():
    print(f' {k:<12} threshold = {v}')
print('\nSetup complete. Ready to begin audit.')
==========================================================
 ClearBoxAI Fraud Detection Audit Thresholds
 Audit ID : CBA-2026-002
 Run Date : 2026-04-20 19:09
==========================================================

Pass/Fail Thresholds (defined before any results are seen):
 SPD          threshold = 0.1
 EOD          threshold = 0.1
 MCC          threshold = 0.5
 LOG_LOSS     threshold = 0.4
 PR_AUC       threshold = 0.7
 KS           threshold = 0.3

Setup complete. Ready to begin audit.
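The thresholds dictionary does not record which direction counts as passing. One possible reading, sketched below, treats SPD, EOD and LOG_LOSS as upper limits and MCC, PR_AUC and KS as lower limits. That split, the helper `passes_threshold`, and the example values are assumptions for illustration, not results or rules from this audit.

```python
THRESHOLDS = {
    'SPD': 0.10, 'EOD': 0.10, 'MCC': 0.50,
    'LOG_LOSS': 0.40, 'PR_AUC': 0.70, 'KS': 0.30,
}

# Assumption: these metrics pass when they are at or below the threshold;
# every other metric passes when it is at or above the threshold.
LOWER_IS_BETTER = {'SPD', 'EOD', 'LOG_LOSS'}


def passes_threshold(metric, value, thresholds=THRESHOLDS):
    """Return True if `value` meets the threshold for `metric`."""
    limit = thresholds[metric]  # raises KeyError for an unknown metric name
    if metric in LOWER_IS_BETTER:
        return value <= limit
    return value >= limit


# Made-up values, only to exercise the helper.
example = {'SPD': 0.04, 'MCC': 0.45}
for name, value in example.items():
    status = 'PASS' if passes_threshold(name, value) else 'FAIL'
    print(f' {name:<12} value = {value:<6} threshold = {THRESHOLDS[name]:<6} {status}')
```

Keeping the direction next to the thresholds, and fixing it before results exist, avoids having to decide later whether a value of exactly 0.10 passes or fails.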