🔍 ClearBoxAI Bias Audit

Mobile Money Fraud Detection — Is the AI System Fair to Everyone?


Audit ID: CBA-2026-002
Auditor: Kwadwo Amponsah (ClearBoxAI)
Dataset: PaySim Mobile Money Dataset
Model: XGBoost Binary Classifier (Target: isFraud)
Date: April 2026

Scope and Regulatory Alignment

Ghana lost GHS 346 million to mobile money fraud in 2023. Banks and fintechs are now deploying AI systems to catch fraudsters automatically. That sounds like progress. But it raises a question that most organisations are not asking:

When the AI learns how to spot fraud, whose fraud is it actually learning from?

This audit investigates that question.

Mobile money in Ghana serves a wide range of users. On one end, you have institutional accounts, large businesses, and high-value customers moving significant amounts regularly. On the other end, you have market traders, smallholder farmers, kayayei, and gig workers. Many of them have little or no stored balance. They receive money and move it quickly. Their transactions look very different from those of a corporate account.

The problem is this. Fraud detection models learn by studying examples. If the training data contains thousands of fraud cases from wealthy, high-balance accounts and almost none from low-balance informal users, the model becomes an expert at protecting wealthy, high-balance accounts. For low-balance informal users, it has barely any examples to learn from. It gets deployed on them anyway.

This is not a dramatic failure. It does not show up in an accuracy report. The model will still look like it is working. But quietly, one segment of users receives a much weaker level of protection, and nobody is measuring it.

A market trader in Kumasi using mobile money to collect daily payments deserves the same quality of fraud protection as a corporate customer making large transfers. If the AI is delivering different levels of protection to different groups, that is a fairness problem, even if nobody intended it.

That is exactly what this audit is designed to find.


About the Dataset

Real mobile money transaction data is almost impossible to obtain. Operators and banks protect it under strict confidentiality, privacy, and commercial agreements. This means independent researchers and auditors working outside of those institutions have no access to the actual transaction logs needed to run this kind of audit.

PaySim is the best available alternative. It is a synthetic mobile money dataset generated from a sample of real transactions extracted from a mobile financial service operating across 14 African countries. The dataset was made available by researchers to study the statistical properties and fraud patterns of real African mobile money systems.

The dataset used in this audit contains 6.3 million transactions with transaction types, balance mechanics, and fraud signatures analogous to those found on platforms like MTN Mobile Money, Telecel Cash, and AirtelTigo Money. The structural problems this audit identifies reflect the real-world data distribution that any Ghana-based mobile money fraud model would be trained on: fraud is rare, it concentrates in high-value accounts, and low-balance informal users are systematically underrepresented in the fraud signal.

The methodology demonstrated here is directly transferable to any live Ghanaian dataset the moment access becomes available.
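
For context, a baseline of the kind of model named in the audit metadata (an XGBoost binary classifier targeting isFraud) could be trained along the following lines. This is a minimal sketch: the file path, feature selection, and parameters are illustrative assumptions, not the audited model's actual configuration. Column names follow the public PaySim schema.

```python
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

# Load the PaySim transaction log (illustrative path).
df = pd.read_csv("paysim.csv")

# Minimal illustrative feature set: transaction type (one-hot encoded)
# plus amount and originator balances.
X = pd.get_dummies(
    df[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]],
    columns=["type"],
)
y = df["isFraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# scale_pos_weight offsets the extreme class imbalance: fraud is rare.
model = XGBClassifier(
    n_estimators=200,
    scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
    eval_metric="logloss",
)
model.fit(X_train, y_train)
```

A model trained this way will report strong aggregate metrics on held-out data, which is precisely why the group-level analysis in the following sections is necessary.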


Note on Structural Underrepresentation

The low fraud count for low-balance users in this dataset is not simply a matter of those users experiencing less fraud. Several structural factors reduce how many low-balance fraud cases appear in any training dataset:

| Driver | What Happens in Practice |
| --- | --- |
| Underreporting | Informal users may not report fraud due to low trust in resolution, cost of following up, or belief small amounts will not be recovered |
| Attacker Incentives | Fraudsters prioritise higher-balance accounts where potential returns are greater |
| Detection and Labelling Gaps | Fraud must be detected and confirmed to enter training data, and existing systems are often weaker at identifying low-value patterns |
| Operational Prioritisation | Institutions focus investigation resources on high-value cases, leaving smaller losses under-analysed |
| Data Capture Limitations | Informal users often have thinner digital footprints, making patterns harder to trace and label |

These factors mean that the 41 low-balance fraud cases found in this dataset do not represent the true number of fraud events affecting low-balance users. They are only the number that was successfully detected, labelled, and recorded. The actual number is almost certainly higher, and unknown.

A model trained on this data is not just missing low-balance fraud patterns. It is missing them by a margin that is structurally biased against the most economically vulnerable users.


Auditor Decision Log: Proxy Variables

PaySim does not contain demographic information. There is no column for income, occupation, location, or social group. We cannot look at a row and know whether the person behind it is a market trader or a business executive.

So we construct proxy variables. These are observable patterns in the data that stand in for the types of users we care about. They are not perfect substitutes. But they are the best available tool for a fairness analysis on data that does not include protected attributes directly.

We build two proxies:

Proxy 1: Transaction Type as a proxy for economic role

| Category | Transaction Types | Who This Represents |
| --- | --- | --- |
| CASH_OUT | CASH_OUT | Informal economy users. Market traders, smallholder farmers, gig workers. Withdrawal-focused behaviour. |
| OTHER | PAYMENT, TRANSFER, CASH_IN, DEBIT | Broader mix of formal and informal activity. Payments, transfers, and account funding. |

Proxy 2: Account Balance as a proxy for wealth

| Tier | Balance Range | Who This Represents |
| --- | --- | --- |
| Low-Balance | GHS 0.00 | Hand-to-mouth users. No stored balance. Often receive money and move it immediately. |
| Mid-Balance | GHS 0.01 to GHS 50,397 | Small traders and everyday users with moderate activity. |
| High-Balance | Above GHS 50,397 | Institutional and high-value accounts with large, stable balances. |

Rationale: BoG CISD 2026, Annexure E §e(i)(3) requires assessment of risks of bias, discrimination, and unfair outcomes based on protected or proxy attributes. NIST AI RMF §3.7 notes systemic bias can be present in AI datasets even when protected class variables are not present. This proxy construction is methodologically consistent with SR 11-7 model validation practice.
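
To make the construction concrete, here is a minimal sketch of how the two proxies might be derived from the PaySim columns, and how labelled fraud cases per group (such as the 41 low-balance cases cited above) can be tallied. Column names follow the public PaySim schema; the file path and the GHS 50,397 cut-off constant are audit-specific choices, not general standards.

```python
import pandas as pd

df = pd.read_csv("paysim.csv")  # public PaySim schema (illustrative path)

# Proxy 1: transaction type as a stand-in for economic role.
# CASH_OUT rows keep their label; everything else is grouped as OTHER.
df["role_proxy"] = df["type"].where(df["type"] == "CASH_OUT", "OTHER")

# Proxy 2: originator's pre-transaction balance as a stand-in for wealth.
# Bins: exactly 0.00 -> low; 0.01 to 50,397 -> mid; above 50,397 -> high.
HIGH_BALANCE_CUTOFF = 50_397
df["wealth_proxy"] = pd.cut(
    df["oldbalanceOrg"],
    bins=[-0.01, 0.0, HIGH_BALANCE_CUTOFF, float("inf")],
    labels=["low", "mid", "high"],
)

# Labelled fraud cases per wealth tier -- the imbalance this audit measures.
print(df.groupby("wealth_proxy", observed=True)["isFraud"].sum())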


Regulatory Framework

| Instrument | Provision | Relevance |
| --- | --- | --- |
| BoG CISD 2026, Annexure E §e(i)(3) | AI risk frameworks must explicitly assess fairness and ethics risks, including bias based on proxy attributes | Defines the exact risk category this audit measures |
| BoG CISD 2026, Annexure E §g(iii)(3) | Prior to deployment, AI models must undergo fairness testing to detect and measure bias across relevant demographic groups | Directly mandates the testing this audit performs |
| BoG CISD 2026, Annexure E §l(i) | Bias Mitigation: fairness metrics must be defined, monitored, and reported; if material bias is detected, model use must be suspended until mitigation is applied | Establishes the regulatory consequence of a HIGH-RISK finding |
| BoG CISD 2026, §114(1)(i) | Ethics, Fairness, and Regulatory Compliance is a mandatory area of every RFI AI governance framework | Fairness is not optional; it is a required governance component |
| BoG CISD 2026, §115(2)(b) | RFIs must notify BoG of AI incidents resulting in systemic bias or widespread customer harm | Notification threshold triggered by this audit's findings |
| BoG CISD 2026, §122(1) | AI/ML model testing must include evaluations for adversarial robustness, model drift, bias, and data poisoning | Scope of required testing this audit fulfils |
| EU AI Act 2024/1689, Annex III §5(b) | AI systems evaluating creditworthiness are High-Risk, requiring fairness testing and bias controls | Reference standard; credit scoring and fraud scoring share the same underlying methodology |
| EU AI Act 2024/1689, Article 6(3) | Any AI system performing profiling of natural persons is always considered High-Risk | A fraud detection model that scores users based on behavioural patterns meets this definition |
| NIST AI RMF 1.0, MEASURE 2.11 | Fairness and bias are evaluated and results are documented | Measurement framework |
| NIST AI RMF 1.0, §3.7 | Fair, with Harmful Bias Managed as a trustworthiness characteristic | Normative standard |
ClearBoxAI Internal Standard: threshold for unfair impact. An absolute Statistical Parity Difference (|SPD|) ≥ 0.10 or an absolute Equal Opportunity Difference (|EOD|) ≥ 0.10 triggers a HIGH-RISK finding.
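
For reference, both metrics can be computed directly from model outputs. The sketch below is a minimal illustration assuming binary predictions and a binary group label as NumPy arrays; the function names are ours for this sketch, not from a specific fairness library or the ClearBoxAI toolchain.

```python
import numpy as np

def statistical_parity_difference(y_pred, group, unprivileged, privileged):
    """SPD = P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged)."""
    return (y_pred[group == unprivileged].mean()
            - y_pred[group == privileged].mean())

def equal_opportunity_difference(y_true, y_pred, group, unprivileged, privileged):
    """EOD = TPR(unprivileged) - TPR(privileged), on true fraud cases only."""
    def tpr(g):
        actual_fraud = (group == g) & (y_true == 1)
        return y_pred[actual_fraud].mean()
    return tpr(unprivileged) - tpr(privileged)

# ClearBoxAI internal threshold: |SPD| >= 0.10 or |EOD| >= 0.10 -> HIGH-RISK.
def is_high_risk(spd, eod, threshold=0.10):
    return abs(spd) >= threshold or abs(eod) >= threshold
```

In the fraud-detection setting, a strongly negative EOD for the low-balance group would mean fraud against those accounts is caught at a lower rate: the weaker-protection pattern this audit is designed to surface.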


The Regulatory Gap This Audit Addresses

The BoG CISD 2026 is clear. Every Regulated Financial Institution deploying an AI fraud detection system is required to conduct fairness testing before deployment and at periodic intervals thereafter. Fairness metrics must be defined, monitored, and reported. If material bias is found, model use must be suspended.

These are not aspirational guidelines. They are binding obligations under Annexure E of the CISD, effective now.

However, according to the Ghana AI Summit newsletter published on April 16th, 2026 on its website (www.ghanaaisummit.com), titled “3 Reasons African Mobile Money Programs Have a $1.7 Trillion AI Problem,” the challenges are far more structural than they appear. No independent published audit of false positive rates and fairness in detection, broken down by user type, exists for any major African mobile money AI fraud system. The required testing is either not being conducted, or, if it is, the results are not being disclosed.

This audit exists to demonstrate what that required testing looks like, what it finds, and why it matters for Ghana's mobile money users.