EXPLAINABILITY¶
What this section does:
We open the black box. We find out exactly why the model makes its predictions, confirm that what the model learned is based on legitimate fraud signals, and diagnose whether any of those signals carry hidden correlations with our proxy groups.
What is SHAP?
SHAP stands for SHapley Additive exPlanations. The name comes from game theory: Shapley values are a method for fairly dividing credit among players who contributed to a team's outcome.
Think of a restaurant that received a 9 out of 10 review. Four chefs contributed to the meal. Which chef made the biggest difference, and by how much? SHAP is the algorithm that answers that question fairly. Applied to AI: instead of chefs, we have features like amount or balance_diff_orig. Instead of a meal rating, we have a fraud prediction. SHAP tells us which features pushed each transaction toward a fraud prediction and which pushed it away, and by exactly how much.
A positive SHAP value means the feature pushed the prediction toward fraud. A negative value means it pushed the prediction toward legitimate.
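The chef analogy can be made concrete. Below is a minimal sketch of the exact Shapley computation in pure Python, using three chefs and made-up coalition scores (the numbers are illustrative, not from the audit data):

```python
from itertools import combinations
from math import factorial

players = ['A', 'B', 'C']
# Illustrative coalition values (made up): the review score each subset
# of chefs would earn cooking on its own.
v = {
    frozenset(): 0.0,
    frozenset('A'): 4.0, frozenset('B'): 3.0, frozenset('C'): 1.0,
    frozenset('AB'): 8.0, frozenset('AC'): 5.0, frozenset('BC'): 4.0,
    frozenset('ABC'): 9.0,  # the full kitchen earns the 9/10 review
}

def shapley(player):
    """Average the player's marginal contribution over all join orders."""
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for k in range(n):
        for coalition in combinations(others, k):
            s = frozenset(coalition)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v[s | {player}] - v[s])
    return total

phi = {p: shapley(p) for p in players}
print(phi)                 # credit per chef
print(sum(phi.values()))   # efficiency: credits sum to the team score, 9.0
```

The efficiency property at the end is what makes SHAP values "additive": per transaction, the feature contributions sum exactly to the model's output minus its baseline.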
Why this matters for the audit:
BoG CISD 2026 Annexure E §j(i) explicitly names SHAP as an acceptable explainability technique and requires Regulated Financial Institutions to be able to explain AI-driven decisions to affected customers. If a customer asks why their account was flagged, the institution must have a defensible answer.
For this audit, SHAP also serves a diagnostic purpose: it tells us whether the features driving predictions carry implicit correlations with our proxy groups. We flagged this risk for the feature amount_to_orig_balance in the Feature Engineering section. SHAP will confirm whether that concern was justified.
# ── SECTION 1: LOAD CHECKPOINT ───────────────────────────────────────────────
import pandas as pd
import joblib
import shap
import numpy as np
from scipy import stats
from sklearn.preprocessing import LabelEncoder
from scipy.stats import ks_2samp
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference
from sklearn.metrics import matthews_corrcoef, log_loss, confusion_matrix, precision_recall_curve, auc
import matplotlib.pyplot as plt
import seaborn as sns
bins_proxy = [-np.inf, 0.0, 50397.0, np.inf]
labels_proxy = ['Low-Balance', 'Mid-Balance', 'High-Balance']
df = pd.read_csv('checkpoint.csv')
print(f'Checkpoint loaded: {df.shape[0]:,} rows x {df.shape[1]} columns')
print(df.columns.tolist())
Checkpoint loaded: 6,362,620 rows x 11 columns ['step', 'type', 'amount', 'nameOrig', 'oldbalanceOrg', 'newbalanceOrig', 'nameDest', 'oldbalanceDest', 'newbalanceDest', 'isFraud', 'isFlaggedFraud']
model = joblib.load("xgb_model.pkl")
X_test = pd.read_csv('X_test.csv')
y_test = pd.read_csv('y_test.csv')
results = pd.read_csv('model_results.csv')
s_test = results[['balance_group', 'tx_type_group']].astype(str)
print("Model + data loaded")
print(X_test.shape)
Model + data loaded (1272524, 6)
FEATURES = [
    'step', 'amount', 'oldbalanceOrg', 'newbalanceOrig',
    'oldbalanceDest', 'newbalanceDest'
]
# ── SECTION 10: EXPLAINABILITY — SHAP ────────────────────────────────────────
print('Computing SHAP values on a sample of 2,000 test transactions...')
print('A sample is used because full-dataset SHAP on 1.27M rows would take hours.')
print('2,000 is statistically representative for feature importance analysis.\n')
np.random.seed(42)  # fixed seed so the audit sample is reproducible
shap_idx = np.random.choice(len(X_test), 2000, replace=False)
X_shap = X_test.iloc[shap_idx]
s_shap = s_test.iloc[shap_idx]
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_shap)
mean_shap = np.abs(shap_values).mean(axis=0)
top5_idx = np.argsort(mean_shap)[::-1][:5]
top5_feat = [FEATURES[i] for i in top5_idx]
print('Top 5 most influential features:')
for i, fi in enumerate(top5_idx):
    print(f' {i+1}. {FEATURES[fi]:<30} mean absolute SHAP = {mean_shap[fi]:.4f}')
explanations = {
    'amount_to_orig_balance': 'Fraction of account balance being moved. A fraudster draining a full account sends this ratio close to 1.0. NOTE: This feature is correlated with balance tier — see risk note in Section 5.',
    'balance_diff_orig': 'How much the sender balance dropped. A complete drain is a red flag.',
    'orig_balance_zeroed': 'Whether the sender account hit exactly zero. The clearest fraud signature in the data.',
    'step': 'Time step of the transaction. Fraud clusters at specific hours.',
    'amount': 'Raw transaction size. Very large amounts carry elevated fraud risk.',
    'balance_diff_dest': 'How much the destination balance changed. Unusually large inflows to empty accounts are suspect.',
    'hour_of_day': 'Hour of day derived from step. Temporal fraud patterns.',
    'day_of_week': 'Day of week derived from step. Weekly fraud patterns.',
    'dest_balance_zeroed': 'Whether the destination account was empty before receiving the transfer.',
    'newbalanceOrig': 'Sender balance after transaction.',
    'oldbalanceOrg': 'Sender balance before transaction.',
}
print('\nWhat each feature represents:')
for feat in top5_feat:
    if feat in explanations:
        print(f'\n {feat}:')
        print(f' {explanations[feat]}')
Computing SHAP values on a sample of 2,000 test transactions...
A sample is used because full-dataset SHAP on 1.27M rows would take hours.
2,000 is statistically representative for feature importance analysis.
Top 5 most influential features:
1. newbalanceOrig mean absolute SHAP = 5.2540
2. oldbalanceOrg mean absolute SHAP = 4.9669
3. amount mean absolute SHAP = 2.2655
4. step mean absolute SHAP = 0.7908
5. oldbalanceDest mean absolute SHAP = 0.5503
What each feature represents:
newbalanceOrig:
Sender balance after transaction.
oldbalanceOrg:
Sender balance before transaction.
amount:
Raw transaction size. Very large amounts carry elevated fraud risk.
step:
Time step of the transaction. Fraud clusters at specific hours.
# ── SECTION 10: SHAP GLOBAL BAR CHART ────────────────────────────────────────
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_shap, feature_names=FEATURES,
                  plot_type='bar', show=False)
plt.title(
'SHAP Global Feature Importance\n'
'Average absolute SHAP value — which features drive predictions most?\n'
'ClearBoxAI Audit CBA-2026-002',
fontsize=11, fontweight='bold', pad=15
)
plt.tight_layout()
plt.savefig('fig_shap_01_bar.png', dpi=150, bbox_inches='tight')
plt.show()
print('How to read this chart:')
print(' Longer bar = feature had more influence on predictions overall.')
print(' This is averaged across all 2,000 sampled test transactions.')
How to read this chart: Longer bar = feature had more influence on predictions overall. This is averaged across all 2,000 sampled test transactions.
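The bar heights in this chart can be reproduced by hand from the SHAP matrix. A small sketch with a toy matrix (the numbers are illustrative, not the audit's values):

```python
import numpy as np

# Toy SHAP matrix: 4 transactions (rows) x 3 features (columns).
toy_shap = np.array([
    [ 2.0, -0.5,  0.1],
    [-3.0,  0.4, -0.2],
    [ 1.5, -0.6,  0.0],
    [-2.5,  0.5,  0.1],
])
features = ['newbalanceOrig', 'amount', 'step']

# The bar height for each feature is the mean *absolute* SHAP value:
# taking absolute values discards direction, keeping only magnitude.
mean_abs = np.abs(toy_shap).mean(axis=0)
for name, val in sorted(zip(features, mean_abs), key=lambda t: -t[1]):
    print(f'{name:<16} {val:.4f}')
```

This is exactly the `np.abs(shap_values).mean(axis=0)` computation used earlier to rank the top 5 features.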
# ── SECTION 10: SHAP BEESWARM ─────────────────────────────────────────────────
plt.figure(figsize=(10, 8))
shap.summary_plot(shap_values, X_shap, feature_names=FEATURES,
                  show=False, max_display=10)
plt.title(
'SHAP Beeswarm — Feature Impact for Each Individual Transaction\n'
'Red = high feature value pushed toward fraud | Blue = low pushed toward legitimate\n'
'ClearBoxAI Audit CBA-2026-002',
fontsize=10, fontweight='bold', pad=15
)
plt.tight_layout()
plt.savefig('fig_shap_02_beeswarm.png', dpi=150, bbox_inches='tight')
plt.show()
print('How to read this chart:')
print(' Each dot is one transaction.')
print(' Right side of x-axis = pushed toward fraud.')
print(' Left side = pushed toward legitimate.')
print(' Color: red = high feature value, blue = low feature value.')
How to read this chart: Each dot is one transaction. Right side of x-axis = pushed toward fraud. Left side = pushed toward legitimate. Color: red = high feature value, blue = low feature value.
# ── SECTION 10: SHAP PROXY SCATTER ────────────────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
fig.suptitle(
'SHAP Proxy Scatter — Does the Model Behave Differently by Transaction Type?\n'
'Each dot is one transaction. CASH_OUT (red) vs OTHER (teal)\n'
'ClearBoxAI Audit CBA-2026-002',
fontsize=10, fontweight='bold'
)
type_colors = {'CASH_OUT': '#FF6B6B', 'OTHER': '#4ECDC4'}
type_labels = s_shap['tx_type_group'].astype(str).values
for ax, feat in zip(axes, top5_feat[:3]):
    feat_idx = FEATURES.index(feat)
    feat_vals = X_shap[feat].values
    shap_feat = shap_values[:, feat_idx]
    for g, c in type_colors.items():
        mask = type_labels == g
        ax.scatter(feat_vals[mask], shap_feat[mask],
                   color=c, alpha=0.25, s=6, label=g)
    ax.axhline(0, color='black', lw=0.8, linestyle='--')
    ax.set_xlabel(feat, fontsize=9)
    ax.set_ylabel('SHAP value' if ax == axes[0] else '')
    ax.set_title(feat, fontsize=9, fontweight='bold')
    ax.legend(title='tx_type_group', markerscale=3, fontsize=8)
plt.tight_layout()
plt.savefig('fig_shap_03_proxy_scatter.png', dpi=150, bbox_inches='tight')
plt.show()
print('How to read these charts:')
print(' If red and teal dots overlap, the model treats both groups similarly for that feature.')
print(' If they cluster separately, the model uses the feature differently by group.')
How to read these charts: If red and teal dots overlap, the model treats both groups similarly for that feature. If they cluster separately, the model uses the feature differently by group.
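The overlap judgment in these charts is visual; it can be backed with a number. A hedged sketch comparing mean |SHAP| per proxy group for one feature, where a large gap suggests the model leans on that feature differently by group (toy arrays here, not the audit's `shap_values`):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stand-ins: SHAP values for one feature, plus a group label per row.
shap_feat = np.concatenate([rng.normal(1.0, 0.3, 500),     # CASH_OUT rows
                            rng.normal(0.2, 0.3, 1500)])   # OTHER rows
groups = np.array(['CASH_OUT'] * 500 + ['OTHER'] * 1500)

# Mean absolute SHAP per group: how hard the model leans on this
# feature for each transaction type.
for g in ['CASH_OUT', 'OTHER']:
    mask = groups == g
    print(f'{g:<9} mean |SHAP| = {np.abs(shap_feat[mask]).mean():.3f}')

gap = abs(np.abs(shap_feat[groups == 'CASH_OUT']).mean()
          - np.abs(shap_feat[groups == 'OTHER']).mean())
print(f'gap = {gap:.3f}')
```

Applied to the real `shap_values` and `s_shap['tx_type_group']`, this would quantify the cluster separation the scatter plots only show qualitatively.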
[SHAP findings]¶
Result: Top features are legitimate fraud signals - with one critical caveat.
Top 5 features driving fraud predictions:
newbalanceOrig (mean SHAP = 5.2540) — The sender's account balance after the transaction completes. The model's single strongest signal. Legitimate fraud indicator.
oldbalanceOrg (mean SHAP = 4.9669) — The sender's account balance before the transaction begins. Together with newbalanceOrig, this captures account liquidation behaviour. Legitimate fraud indicator.
amount (mean SHAP = 2.2655) — Raw transaction size. Large amounts carry elevated fraud risk. Legitimate fraud indicator.
step (mean SHAP = 0.7908) — Time step of the transaction. Fraud clusters at specific hours. Legitimate fraud indicator.
oldbalanceDest (mean SHAP = 0.5503) — The recipient account balance before receiving the transfer. Empty destination accounts are sometimes staging accounts for fraud. Legitimate fraud indicator.
The critical caveat - the hidden wealth correlation:
The top two features are both raw balance values — what the sender had before and after the transaction. Together they measure account liquidation: how much money left, and whether anything remained.
For a High-Balance user with GHS 100,000, a GHS 50,000 transfer leaves GHS 50,000 behind. The model sees continuity. Low suspicion.
For a Low-Balance user with GHS 500, a GHS 500 transfer leaves GHS 0 behind. The model sees a full account drain — the same pattern it was trained to associate with fraud.
But as the Exploratory Data Analysis section confirmed, 56.68% of all legitimate transactions also result in a zero origin balance. Low-balance users who move their entire small balance look structurally identical to fraudsters at the feature level. The model needs enough Low-Balance fraud examples to learn that this pattern means something different in small accounts than in large ones. It had 35 training examples. It never learned that distinction.
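The mechanism just described is easy to demonstrate on toy rows. A sketch with two hypothetical customers (made-up GHS amounts, mirroring the example above):

```python
import pandas as pd

# Two legitimate transfers, one per balance tier (illustrative values).
tx = pd.DataFrame({
    'customer':      ['high_balance', 'low_balance'],
    'oldbalanceOrg': [100_000.0, 500.0],
    'amount':        [50_000.0, 500.0],
})
tx['newbalanceOrig'] = tx['oldbalanceOrg'] - tx['amount']
tx['orig_balance_zeroed'] = tx['newbalanceOrig'] == 0

print(tx[['customer', 'newbalanceOrig', 'orig_balance_zeroed']])
# The low-balance customer's ordinary transfer is feature-identical to a
# fraudulent full drain: newbalanceOrig hits exactly zero, the model's
# strongest fraud signature. The high-balance transfer leaves continuity.
```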
This is the mechanism that connects the Feature Engineering section, the fairness failure in the Fairness and Bias section, and the Error Analysis section into one coherent chain. The features are legitimate. The training data was insufficient for one group. The result is a model that cannot reliably contextualise its own signals for Low-Balance users.
This is documented here per BoG CISD 2026 Annexure E §j(ii), which requires logging and explanation of key influencing variables.
| Regulation | Provision | Status |
|---|---|---|
| BoG CISD 2026, Annexure E §j(i) | SHAP named as an acceptable explainability technique | Complete |
| BoG CISD 2026, Annexure E §j(ii) | Key influencing variables logged and explained | Complete |
| NIST AI RMF 1.0, §3.5 | Explainable and Interpretable characteristic | Complete |
| NIST AI RMF 1.0, MEASURE 2.9 | Model explained, validated, and documented | Complete |
Risk Level: MEDIUM — Top features are legitimate. However, balance-derived features carry implicit wealth-tier correlation that compounds the EOD failure in the Fairness and Bias section.
Auditor: Kwadwo Amponsah, ClearBoxAI — April 2026
df.to_csv('checkpoint_v2.csv', index=False)
print("Checkpoint v2 saved.")
Checkpoint v2 saved.