www.fiddler.ai
34.251.201.224
Public Scan
Submitted URL: http://go.fiddler.ai/NTEzLVJQUS02OTkAAAGFeHjfW-PsyUJTh0ftduGIw2R82Dw6hmRonqeTl0f5bUfClcNmQpzvc9wFkZNX2N1UPecztWM=
Effective URL: https://www.fiddler.ai/blog/measuring-data-drift-population-stability-index?utm_medium=email&utm_source=mtko&utm_campai...
Submission: On July 08 via api from CH — Scanned from DE
Text Content
MEASURING DATA DRIFT: POPULATION STABILITY INDEX

Murtuza Shergadwala
May 16, 2022

What do you know about the Population Stability Index (PSI), its historical usage, and its connection to other mathematical drift measures such as KL divergence? If you're left scratching your head, don't worry: we've got you covered!

POPULATION STABILITY INDEX: WHAT IS IT?

PSI is a measure commonly used in financial services to quantify the shift in the distribution of a variable over time. Several resources give an overview of PSI, such as this visual blog by Matthew Burke and this paper summary [3], but they often do not discuss the connection between PSI as a drift metric and other popular measures such as KL divergence.

Figures from Matthew Burke's blog: The left figure shows two distributions. The right figure shows the conversion of the distributions into ten bins that represent the proportion of the population within each bin.

Briefly, PSI is calculated based on the multinomial classification of a variable into bins or categories. Consider the two distributions shown in the left figure above. These distributions can be converted into their respective histograms with an appropriately chosen binning strategy. There are several binning strategies, and each can yield a different PSI value. For the figure on the right, data is collected in equi-width bins. This produces a histogram that resembles a discretized version of the respective distribution. Another possible binning strategy is equi-quantile, or equi-depth, binning. In this case, each bin holds the same proportion of samples from the reference (expected) distribution.
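The two binning strategies described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the method used in any particular tool: the helper names (`equal_width_edges`, `equal_depth_edges`, `bin_proportions`), the choice of ten bins, the synthetic normal samples, and the decision to clip out-of-range target values into the end bins are all assumptions made for this example.

```python
import numpy as np

def equal_width_edges(reference, n_bins=10):
    """Edges that split the reference range into n_bins intervals of equal width."""
    return np.linspace(np.min(reference), np.max(reference), n_bins + 1)

def equal_depth_edges(reference, n_bins=10):
    """Edges at reference quantiles, so each bin holds roughly the same
    share of the reference samples (equi-depth binning)."""
    return np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))

def bin_proportions(values, edges):
    """Proportion of values falling in each bin.

    Assumption for this sketch: values outside the reference range are
    clipped so they are counted in the first or last bin.
    """
    clipped = np.clip(values, edges[0], edges[-1])
    counts, _ = np.histogram(clipped, bins=edges)
    return counts / counts.sum()

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)   # expected distribution
target = rng.normal(0.5, 1.2, 10_000)      # actual distribution

width_edges = equal_width_edges(reference)
depth_edges = equal_depth_edges(reference)

ref_width = bin_proportions(reference, width_edges)  # bell-shaped histogram
ref_depth = bin_proportions(reference, depth_edges)  # roughly flat histogram
tgt_depth = bin_proportions(target, depth_edges)     # target on the same edges
```

Note that the bin edges are always derived from the reference sample and then reused for the target sample, so that the two sets of proportions are comparable bin by bin.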
The choice of strategy is context-specific and requires domain knowledge. For example, in credit score monitoring, credit scores are already binned into ranges representing a client's credit risk. In such cases, it may be desirable to use consistent binning throughout the analysis.

The differences in each bin between the expected distribution (also known as the reference or initial distribution) and the target distribution (also known as the new or actual distribution) are then used to calculate PSI as follows:

$$\mathrm{PSI} = \sum_{b=1}^{B} \bigl(\mathrm{ActualProp}(b) - \mathrm{ExpectedProp}(b)\bigr) \cdot \ln\frac{\mathrm{ActualProp}(b)}{\mathrm{ExpectedProp}(b)}$$

where B is the total number of bins, ActualProp(b) is the proportion of counts within bin b from the target distribution, and ExpectedProp(b) is the proportion of counts within bin b from the reference distribution. Thus, PSI ranges from zero to infinity and is zero when the two distributions exactly match.

Practical Notes: The rules of thumb in practice regarding PSI thresholds are that if:
(1) PSI is less than 0.1, then the actual and the expected distributions are considered similar,
(2) PSI is between 0.1 and 0.2, then the actual distribution is considered moderately different from the expected distribution, and
(3) PSI is beyond 0.2, then it is highly advised to develop a new model on a more recent sample [1,2].

Also, since a particular bin may be empty, PSI can be numerically undefined or unbounded. To avoid this, in practice, a small value such as 0.01 can be added to each bin proportion value. Alternatively, a base count of 1 can be added to each bin to ensure non-zero proportion values.

PSI USAGE HISTORY

PSI is typically used in financial services as a guidepost to compare current to baseline populations for which some financial tool or service was developed. For example, credit scoring tools have proliferated in the banking industry to evaluate the level of credit risk associated with applicants or customers.
Such tools provide statistical odds or probabilities that an applicant with a given credit score will repay their credit. In the context of credit scoring, it is crucial to study the effects of changing populations or irregular trends in application approval rates. Similarly, it is important to account for abnormal periods in which the population under- or over-applies relative to regular business cycles. PSI helps quantify such changes and gives decision-makers a basis for judging whether the development sample is representative of future expected applicants. Identifying distributional change is therefore important for maintaining tools that support accurate lending decisions.

While we found no explicit resources on the rationale for using PSI, we conjecture that its usage stems from several factors:

1. Regulations such as the Basel Accords and the International Financial Reporting Standards (IFRS 9) discuss assessing the risk of loans with three components: the probability of default (PD), exposure at default (EAD), and loss given default (LGD). Since PSI measures shifts in probability distributions, its usage for measuring shifts in PD seems likely due to such regulations.
2. PSI relies on binning variables, including numerical ones, into categories. Despite being a numerical quantity, credit scores are typically categorized into bins in the financial sector. This practice also points towards the ease of using PSI within the industry.
3. The PSI metric may have been widely adopted due to its inclusion in popular software such as SAS® Enterprise Miner™.

With the ongoing adoption of machine learning models and systems in financial services, PSI has gained popularity as a model monitoring metric. We expect this trend to continue as model portfolios grow and the MLOps lifecycle becomes standardized within organizations.
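To make the monitoring use concrete, the PSI calculation and the rule-of-thumb thresholds described earlier can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function names `psi` and `interpret_psi`, the label strings, and the example proportions are hypothetical. The `eps` default of 0.01 follows the small additive value mentioned in the practical notes, and the sketch adds it to every bin without renormalizing.

```python
import numpy as np

def psi(expected_prop, actual_prop, eps=0.01):
    """Population Stability Index over matching bins.

    expected_prop / actual_prop: per-bin proportions of the reference and
    target distributions, computed with the same bin edges.
    eps: small value added to every bin proportion so that empty bins do
    not make the logarithm undefined (an assumption of this sketch).
    """
    expected = np.asarray(expected_prop, dtype=float) + eps
    actual = np.asarray(actual_prop, dtype=float) + eps
    if expected.shape != actual.shape:
        raise ValueError("expected and actual must have the same number of bins")
    # Sum over bins of (actual - expected) * ln(actual / expected).
    return float(np.sum((actual - expected) * np.log(actual / expected)))

def interpret_psi(value):
    """Map a PSI value to the rule-of-thumb categories (labels are illustrative)."""
    if value < 0.1:
        return "similar"
    if value <= 0.2:
        return "moderately different"
    return "consider developing a new model"

# Illustrative bin proportions (made up for this example).
expected = [0.25, 0.25, 0.25, 0.25]
shifted = [0.50, 0.50, 0.00, 0.00]   # two bins are empty in the target

psi_same = psi(expected, expected)
psi_shifted = psi(expected, shifted)
label_same = interpret_psi(psi_same)
label_shifted = interpret_psi(psi_shifted)
```

Each term in the sum is non-negative, because the difference and the logarithm of the ratio always share the same sign, so the result is zero only when every bin matches. The formula is also symmetric: swapping the expected and actual proportions gives the same value.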
UNPACKING PSI FORMULA AS A FUNCTION OF KL DIVERGENCE

The Kullback-Leibler (KL) divergence, or relative entropy, is a statistical distance measure that describes how one probability distribution differs from another. Given two discrete probability distributions A (actual) and E (expected) defined on the same probability space, KL divergence is defined as:

$$D_{KL}(A \,\|\, E) = \sum_{x} P_A(x) \ln\frac{P_A(x)}{P_E(x)}$$

where the sum runs over all outcomes x. One interpretation of KL divergence is that it measures the expected excess surprise incurred by using the expected distribution as a model when the data actually follow the actual distribution. This sounds a lot like the reasoning behind using PSI! While KL divergence is well studied in mathematical statistics [4] and has a lot of references to academic work [1,2], PSI is domain-specific and lacks concrete literature on the history of its usage within financial services. In the following, we illustrate how PSI can be viewed as a special form of KL divergence.

Consider the PSI formula, and treat the proportion of counts within a bin b for the actual distribution, ActualProp(b), as the frequentist probability P_A(b) of the variable appearing in that bin. The same applies to the expected distribution, giving P_E(b). Then, we can rewrite the PSI formula as:

$$\mathrm{PSI} = \sum_{b=1}^{B} \bigl(P_A(b) - P_E(b)\bigr) \ln\frac{P_A(b)}{P_E(b)}$$

On expanding further,

$$\mathrm{PSI} = \sum_{b=1}^{B} P_A(b) \ln\frac{P_A(b)}{P_E(b)} - \sum_{b=1}^{B} P_E(b) \ln\frac{P_A(b)}{P_E(b)} = \sum_{b=1}^{B} P_A(b) \ln\frac{P_A(b)}{P_E(b)} + \sum_{b=1}^{B} P_E(b) \ln\frac{P_E(b)}{P_A(b)}$$

Thus, PSI can be rewritten as:

$$\mathrm{PSI} = D_{KL}(A \,\|\, E) + D_{KL}(E \,\|\, A)$$

which is the symmetrized KL divergence!

We hope you enjoyed this overview of PSI. Don't forget to check out our blog on detecting intersectional unfairness in AI!

References

1. Siddiqi, N. (2017). Intelligent credit scoring: Building and implementing better credit risk scorecards. John Wiley & Sons.
2. Yurdakul, B. (2018). Statistical properties of population stability index. Western Michigan University.
3. Lin, A. Z. (2017). Examining Distributional Shifts by Using Population Stability Index (PSI) for Model Validation and Diagnosis. SAS Conference Proceedings: Western Users of SAS Software 2017, September 20-22, 2017, Long Beach, California. https://www.lexjansen.com/wuss/2017/47_Final_Paper_PDF.pdf
4. Kullback, S., & Leibler, R. A. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86. http://www.jstor.org/stable/2236703

© 2022 Fiddler AI. All rights reserved.