Draft:Cumulative Accuracy Profile
Submission declined on 31 May 2026 by EatingCarBatteries (talk). This draft appears to be a duplicate of an existing article. Wikipedia does not permit multiple articles on the same topic.
The Cumulative Accuracy Profile (CAP) is a graphical tool used in statistics and machine learning to evaluate the discriminatory power of binary classification models. It is widely used in fields such as credit risk modeling, fraud detection, and direct marketing to assess how effectively a predictive model distinguishes between two classes (e.g., "default" vs. "non-default", "fraud" vs. "legitimate").[1]
The CAP curve shows the cumulative percentage of positive outcomes captured as a function of the cumulative percentage of the total population, with cases sorted from the highest to the lowest model-predicted risk score.
Core components
A standard CAP diagram plots three curves on a coordinate system in which both the X and Y axes range from 0 to 100%. The axes and curves are:
- The X-Axis (Population): Represents the cumulative percentage of the population, sorted from the highest predicted probability of the target outcome to the lowest.
- The Y-Axis (Positives): Represents the cumulative percentage of actual positive outcomes (e.g., actual defaults) captured by selecting the given percentage of the population.
- The Random Model: A diagonal line from (0,0) to (100,100). It represents a model with zero predictive power: selecting any x% of the population captures, on average, x% of the positive outcomes, the same as picking at random.
- The Perfect Model: A line that rises linearly from (0,0) until it reaches 100% of the positive outcomes at the point where the X value equals the share of positives in the population, and then runs horizontally to (100,100). For example, if 5% of cases are positive, the line reaches 100% at X = 5%. It represents a hypothetical model that ranks every positive outcome ahead of every negative one, capturing 100% of the events using the minimum possible number of cases.
- The Proposed Model: The actual performance curve of the model being evaluated. For a model with some predictive power it lies between the Random Model and the Perfect Model, and a model with higher predictive power has a curve that bows further toward the top-left corner of the graph.[2]
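The construction of the three curves can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `cap_curve` and `perfect_curve` and the toy scores and labels are assumptions made for this example, and tied scores receive no special treatment.

```python
def cap_curve(scores, labels):
    """Return CAP points (x, y) as fractions in [0, 1].

    scores: model-predicted risk scores (higher = more likely positive).
    labels: 1 for an actual positive outcome, 0 otherwise.
    """
    n = len(scores)
    total_pos = sum(labels)
    # Sort case indices from highest to lowest predicted score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    xs, ys = [0.0], [0.0]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        xs.append(rank / n)              # cumulative share of population
        ys.append(captured / total_pos)  # cumulative share of positives
    return xs, ys


def perfect_curve(labels):
    """Perfect model: reaches 100% of positives at x = share of positives."""
    prevalence = sum(labels) / len(labels)
    return [0.0, prevalence, 1.0], [0.0, 1.0, 1.0]


# Hypothetical toy data: 10 cases, 3 actual positives.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

xs, ys = cap_curve(scores, labels)               # proposed model
random_xs, random_ys = [0.0, 1.0], [0.0, 1.0]    # random model (diagonal)
perfect_xs, perfect_ys = perfect_curve(labels)   # perfect model
```

In this toy data the proposed model captures two of the three positives within the top 20% of cases and all three within the top 40%, so its curve sits between the diagonal and the perfect model.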
The Accuracy Ratio (AR)
The effectiveness of a model is quantified by the Accuracy Ratio (AR), often referred to in the context of credit risk as the Gini Coefficient. The AR measures the area between the proposed model curve and the random model line, normalized by the maximum possible area (the area between the perfect model and the random model).[3]
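The Accuracy Ratio is defined as:
$$\mathrm{AR} = \frac{a_R}{a_P}$$
where $a_R$ is the area between the proposed model curve and the random model line, and $a_P$ is the area between the perfect model curve and the random model line. Two reference values anchor the scale: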
- AR = 0: The model has no discriminatory power (the model curve coincides with the random diagonal).
- AR = 1: The model is perfect (the model curve coincides with the perfect model line).
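As a rough sketch of how the two areas might be computed in practice, the following Python snippet approximates the area under the CAP curve with the trapezoidal rule and divides the two differences from the diagonal. The helper names (`trapezoid_area`, `accuracy_ratio`) and the toy data are assumptions for illustration; axes are expressed as fractions from 0 to 1 rather than percentages, and tied scores are not handled specially.

```python
def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve through the points (xs, ys)."""
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2
               for k in range(1, len(xs)))


def accuracy_ratio(scores, labels):
    """AR = (area between model and diagonal) / (area between perfect and diagonal)."""
    n = len(scores)
    total_pos = sum(labels)
    # Build the CAP curve: sort from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    xs, ys, captured = [0.0], [0.0], 0
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        xs.append(rank / n)
        ys.append(captured / total_pos)
    # The diagonal has area 0.5 on the unit square.
    area_model_vs_random = trapezoid_area(xs, ys) - 0.5
    # The perfect curve has area 1 - prevalence / 2, so its excess is (1 - prevalence) / 2.
    prevalence = total_pos / n
    area_perfect_vs_random = (1 - prevalence) / 2
    return area_model_vs_random / area_perfect_vs_random


# Hypothetical toy data: 10 cases, 3 actual positives.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
perfect_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # every positive ranked first

ar = accuracy_ratio(scores, labels)                   # about 0.905 (19/21)
ar_perfect = accuracy_ratio(scores, perfect_labels)   # 1.0 up to rounding
```

The single misranked case (one negative scored above one positive) is what pulls the toy model's AR below 1.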
Comparison with ROC curves
The CAP curve is frequently compared to the Receiver operating characteristic (ROC) curve. While both are used for model validation, they differ in their construction and utility:
- ROC Curve: Plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity). It is generally independent of the prevalence (the proportion of positive cases in the population).
- CAP Curve: Plots the cumulative percentage of positive outcomes against the cumulative percentage of the population. Because its X axis is the share of the whole population (positives and negatives together), the shape of the curve depends on the prevalence of positive cases. It is often more intuitive for business applications where the question is "how many positive cases do we catch if we target the top X% of our population?"[4]
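Despite the different axes, the two summary statistics are closely linked: a commonly cited result, derived in Engelmann, Hayden and Tasche, is that AR = 2 × AUC − 1, where AUC is the area under the ROC curve. The sketch below checks this numerically on toy data. The function names and data are assumptions for this example; the AUC is computed by the pairwise (Mann-Whitney) comparison of positive and negative scores, and the data contain no tied scores.

```python
def auc_pairwise(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy_ratio(scores, labels):
    """AR from the CAP curve, using the trapezoidal rule on fractions in [0, 1]."""
    n = len(scores)
    total_pos = sum(labels)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    area, prev_y, captured = 0.0, 0.0, 0
    for i in order:
        captured += labels[i]
        y = captured / total_pos
        area += (1 / n) * (prev_y + y) / 2   # one trapezoid per case
        prev_y = y
    prevalence = total_pos / n
    return (area - 0.5) / ((1 - prevalence) / 2)


# Hypothetical toy data: 10 cases, 3 actual positives, no tied scores.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = auc_pairwise(scores, labels)   # 20 of 21 pairs ranked correctly
ar = accuracy_ratio(scores, labels)
gap = abs(ar - (2 * auc - 1))        # expected to be zero up to rounding
```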
Applications
- Credit Scoring: Banks use CAP curves to determine the percentage of loan defaults captured by reviewing a specific percentage of their highest-risk-score applicants.
- Fraud Detection: Financial institutions use the curve to decide what proportion of transactions to investigate to catch the majority of fraudulent activities.
- Direct Marketing: Businesses use the curve to choose what percentage of their customer base to target, reading off the share of all likely buyers that a campaign of a given size would capture.
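All three applications come down to reading one point off the CAP curve: the share of positives captured when only the top-scored fraction of cases is acted on. A minimal Python sketch of that lookup follows; the function name `capture_rate`, its `top_fraction` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def capture_rate(scores, labels, top_fraction):
    """Share of all positives found among the top-scored fraction of cases."""
    n = len(scores)
    # Number of cases selected, rounded to a whole number of cases.
    k = round(top_fraction * n)
    # Rank cases from highest to lowest score and keep the first k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:k])
    return captured / sum(labels)


# Hypothetical toy data: 10 cases, 3 actual positives.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top_20 = capture_rate(scores, labels, 0.20)   # 2 of 3 positives
top_40 = capture_rate(scores, labels, 0.40)   # all 3 positives
```

In a credit or fraud setting the selected cases would be the applications or transactions sent for review; in marketing they would be the customers contacted.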
References
- [1] Engelmann, Bernd; Hayden, Evelyn; Tasche, Dirk. "Measuring the Discriminative Power of Rating Systems", Discussion Paper, Series 2: Banking and Financial Supervision, Deutsche Bundesbank, 2003.
- [2] Sobehart, Jorge; Keenan, Sean; Stein, Roger. "Validation methodologies for default risk models", Moody's Risk Management Services, May 2000.
- [3] Irwin, R. John; Irwin, Timothy C. "Appraising Credit Ratings: Does the CAP Fit Better than the ROC?", IMF Working Paper, 2012.
- [4] Engelmann, Bernd; Hayden, Evelyn; Tasche, Dirk. "Measuring the Discriminative Power of Rating Systems", Discussion Paper, Series 2: Banking and Financial Supervision, Deutsche Bundesbank, 2003.
