www.fiddler.ai Open in urlscan Pro
34.251.201.224  Public Scan

Submitted URL: http://go.fiddler.ai/NTEzLVJQUS02OTkAAAGFeHjfW-PsyUJTh0ftduGIw2R82Dw6hmRonqeTl0f5bUfClcNmQpzvc9wFkZNX2N1UPecztWM=
Effective URL: https://www.fiddler.ai/blog/measuring-data-drift-population-stability-index?utm_medium=email&utm_source=mtko&utm_campai...
Submission: On July 08 via api from CH — Scanned from DE


Data Science


MEASURING DATA DRIFT: POPULATION STABILITY INDEX

Murtuza Shergadwala
May 16, 2022

What do you know about the Population Stability Index (PSI) measure, its
historical usage, and its connection to other mathematical drift measures such
as KL divergence? If you’re left scratching your head, don’t worry — we’ve got
you covered!


POPULATION STABILITY INDEX: WHAT IS IT?

PSI is a commonly used measure in the financial services domain to quantify the
shift in the distribution of a variable over time. While several resources give
an overview of PSI, such as a visual blog post by Matthew Burke and a paper
summary [3], they often do not discuss the connection between PSI as a drift
metric and other popular measures such as KL divergence.


Figures from Matthew Burke’s blog: The left figure shows two distributions. The
right figure shows the conversion of the distributions into ten bins that
represent the proportion of the population within each bin.

Briefly, PSI is calculated based on the multinomial classification of a variable
into bins or categories. Consider two distributions shown in the left figure
above. These distributions can be converted into their respective histograms
with an appropriately chosen binning strategy. There are several binning
strategies, and each strategy can yield varying PSI values. For the figure on
the right, data is collected in equi-width bins. This produces a histogram that
resembles a discretized version of the respective distribution. Another possible
binning strategy is equi-quantile (or equi-depth) binning. In this case, each
bin would have the same proportion of samples in the reference (expected)
distribution. The choice of strategy is context-specific and requires domain
knowledge. For example, in credit score monitoring, credit scores are already
binned into ranges representing a client's credit risk. In such cases, it may be
desirable to use consistent binning throughout the analysis.
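
As a minimal sketch of the two binning strategies described above, the snippet below derives bin edges from the reference sample and then computes the proportion of samples per bin. The function names (`equal_width_edges`, `equal_depth_edges`, `bin_proportions`) and the synthetic normal data are our own illustrative assumptions, not part of any particular library. Clipping out-of-range values into the outermost bins is one possible convention among several.

```python
import numpy as np


def equal_width_edges(reference, n_bins=10):
    """Bin edges that split the reference range into equally wide intervals."""
    reference = np.asarray(reference, dtype=float)
    return np.linspace(reference.min(), reference.max(), n_bins + 1)


def equal_depth_edges(reference, n_bins=10):
    """Bin edges at reference quantiles, so each bin holds about the same share."""
    reference = np.asarray(reference, dtype=float)
    return np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))


def bin_proportions(values, edges):
    """Proportion of values that fall into each bin defined by `edges`."""
    values = np.asarray(values, dtype=float)
    # Assumption: values outside the reference range are clipped so that
    # they are counted in the first or last bin instead of being dropped.
    clipped = np.clip(values, edges[0], edges[-1])
    counts, _ = np.histogram(clipped, bins=edges)
    return counts / counts.sum()


# Synthetic reference sample, for illustration only.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=10_000)

width_props = bin_proportions(reference, equal_width_edges(reference))
depth_props = bin_proportions(reference, equal_depth_edges(reference))
```

With equi-width bins the proportions follow the shape of the distribution, while equi-depth bins give roughly equal proportions for the reference sample by construction. In either case, the same edges should be reused when binning the target sample so that the two histograms are comparable.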

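The differences in each bin between the expected distribution (AKA reference or
initial distribution) and the target distribution (AKA new or actual
distribution) are then used to calculate PSI as follows:

$$
\mathrm{PSI} = \sum_{b=1}^{B} \left( \mathrm{ActualProp}(b) - \mathrm{ExpectedProp}(b) \right) \cdot \ln\left( \frac{\mathrm{ActualProp}(b)}{\mathrm{ExpectedProp}(b)} \right)
$$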

where B is the total number of bins, ActualProp(b) is the proportion of counts
within bin b from the target distribution and ExpectedProp(b) is the proportion
of counts within bin b from the reference distribution. Thus, PSI is a number
that ranges from zero to infinity and has a value of zero when the two
distributions exactly match.

Practical Notes: The rules of thumb in practice regarding PSI thresholds are
that if: (1) PSI is less than 0.1, then the actual and the expected
distributions are considered similar, (2) PSI is between 0.1 and 0.2, then the
actual distribution is considered moderately different from the expected
distribution, and (3) PSI is beyond 0.2, then it is highly advised to develop a
new model on a more recent sample [1,2]. Also, since there is a possibility that
a particular bin may be empty, PSI can be numerically undefined or unbounded. To
avoid this, in practice, a small value such as 0.01 can be added to each bin
proportion value. Alternatively, a base count of 1 can be added to each bin to
ensure non-zero proportion values.
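
The practical notes above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not a reference one: the names `psi` and `interpret_psi` are hypothetical, the default `epsilon=0.01` follows the small value mentioned above, and the handling of the exact boundary values 0.1 and 0.2 is our own choice since the rules of thumb do not specify it.

```python
import numpy as np


def psi(expected_props, actual_props, epsilon=0.01):
    """Population Stability Index between two vectors of bin proportions.

    A small constant `epsilon` is added to every bin proportion so that
    empty bins do not make the logarithm undefined or unbounded.
    """
    expected = np.asarray(expected_props, dtype=float) + epsilon
    actual = np.asarray(actual_props, dtype=float) + epsilon
    return float(np.sum((actual - expected) * np.log(actual / expected)))


def interpret_psi(value):
    """Map a PSI value to the rule-of-thumb categories.

    Assumption: a value of exactly 0.1 or 0.2 is treated as 'moderate'.
    """
    if value < 0.1:
        return "similar"
    if value <= 0.2:
        return "moderately different"
    return "consider developing a new model"


# Illustrative proportions over four bins (made-up numbers).
expected = [0.25, 0.25, 0.25, 0.25]
shifted = [0.10, 0.20, 0.30, 0.40]
with_empty_bin = [0.50, 0.50, 0.00, 0.00]

psi_same = psi(expected, expected)
psi_shifted = psi(expected, shifted)
psi_empty = psi(expected, with_empty_bin)
```

Identical distributions give a PSI of zero, and the smoothing keeps the value finite even when a bin in one of the distributions is empty. Note that the size of `epsilon` affects the result, so it should be held fixed when comparing PSI values over time.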


PSI USAGE HISTORY

PSI is typically used in financial services as a guidepost to compare current to
baseline populations for which some financial tool or service was developed. For
example, the use of credit scoring tools has proliferated in the banking
industry to evaluate the level of credit risk associated with applicants or
customers. Such tools provide statistical odds or probabilities that an
applicant with a given credit score will repay their credit. In the context of
credit scoring, it is crucial to study the effects of changing populations or
irregular trends in application approval rates. Similarly, abnormal periods in
which the population may under- or over-apply relative to regular business
cycles are also important. PSI helps quantify such changes and gives
decision-makers a basis for judging whether the development sample is
representative of expected future applicants. Identifying distributional change
can significantly impact the maintenance of tools capable of accurate lending
decisions.

While we found no resources that explicitly explain the rationale for using
PSI, we conjecture that its usage stems from multiple factors, as listed below:

 1. Regulations such as Basel Accords and the International Financial Reporting
    Standards (IFRS 9) discuss assessing the risk of loans with three
    components: the probability of default (PD), exposure at default (EAD), and
    loss given default (LGD). Since PSI measures shifts in probability
    distributions, its usage for measuring shifts in PD seems likely due to such
    regulations.
 2. PSI uses binning of variables, including numerical variables, which implies
    categorizing variables into bins. Despite being a numerical quantity, credit
    scores are typically categorized into bins in the financial sector. Such a
    practice also points towards the ease of usage of PSI within the industry.
 3. The PSI metric may have been widely adopted due to its inclusion in popular
    software such as SAS® Enterprise Miner™.

With the ongoing adoption of machine learning models and systems in financial
services, PSI has gained popularity as a model monitoring metric — we only
expect this trend to continue as model portfolios grow and the MLOps lifecycle
becomes standardized within organizations.


UNPACKING PSI FORMULA AS A FUNCTION OF KL DIVERGENCE

The Kullback-Leibler divergence or relative entropy is a statistical distance
measure that describes how one probability distribution is different from
another.

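Given two discrete probability distributions A (actual) and E (expected)
defined on the same probability space, KL divergence is defined as:

$$
D_{\mathrm{KL}}(A \,\|\, E) = \sum_{b} P_A(b) \, \ln\left( \frac{P_A(b)}{P_E(b)} \right)
$$

where $P_A(b)$ and $P_E(b)$ are the probabilities that A and E assign to
outcome b.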

An interpretation of KL divergence is that it measures the expected excess
surprise from using the expected distribution as a model when the data actually
follow the actual distribution. This sounds a lot like the reasoning behind
using PSI! While KL divergence is well studied in mathematical statistics [4]
and is widely referenced in academic work [1,2], PSI is domain-specific and
lacks concrete literature on the history of its usage within financial
services. In the following, we illustrate how PSI can be viewed as a special
form of KL divergence.
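
To make the definition concrete, here is a small sketch of KL divergence for discrete distributions, using the natural logarithm. The function name `kl_divergence` and the example proportions are illustrative assumptions. The sketch also assumes that the second distribution is non-zero wherever the first one is, since otherwise the divergence is unbounded.

```python
import numpy as np


def kl_divergence(p, q):
    """D_KL(p || q) = sum_b p(b) * ln(p(b) / q(b)), measured in nats.

    Assumes q(b) > 0 wherever p(b) > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Bins with p(b) == 0 contribute zero by the usual convention.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


# Illustrative proportions over four bins (made-up numbers).
actual = np.array([0.10, 0.20, 0.30, 0.40])
expected = np.array([0.25, 0.25, 0.25, 0.25])

kl_actual_vs_expected = kl_divergence(actual, expected)
kl_expected_vs_actual = kl_divergence(expected, actual)
```

The two values differ, which shows that KL divergence is not symmetric: the result depends on which distribution is treated as the actual one.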

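Consider the PSI formula and let us look at the proportion of counts within a
bin b for the actual distribution, ActualProp(b), as the frequentist probability
$P_A(b)$ of the variable appearing in that bin. The same applies to the expected
distribution, which gives $P_E(b)$.

Then, we can rewrite the PSI formula as:

$$
\mathrm{PSI} = \sum_{b=1}^{B} \left( P_A(b) - P_E(b) \right) \ln\left( \frac{P_A(b)}{P_E(b)} \right)
$$

On expanding further,

$$
\mathrm{PSI} = \sum_{b=1}^{B} P_A(b) \ln\left( \frac{P_A(b)}{P_E(b)} \right) - \sum_{b=1}^{B} P_E(b) \ln\left( \frac{P_A(b)}{P_E(b)} \right)
= \sum_{b=1}^{B} P_A(b) \ln\left( \frac{P_A(b)}{P_E(b)} \right) + \sum_{b=1}^{B} P_E(b) \ln\left( \frac{P_E(b)}{P_A(b)} \right)
$$

Thus, PSI can be rewritten as:

$$
\mathrm{PSI} = D_{\mathrm{KL}}(A \,\|\, E) + D_{\mathrm{KL}}(E \,\|\, A)
$$

which is the symmetrized KL divergence!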
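
As a quick numerical sanity check of this identity, the sketch below compares PSI with the sum of the two KL divergences on randomly generated bin proportions. The helper names and the Dirichlet-sampled proportions are illustrative assumptions. No smoothing is applied here, which is safe only because the sampled proportions are strictly positive.

```python
import numpy as np


def kl_divergence(p, q):
    """D_KL(p || q) for strictly positive discrete distributions, in nats."""
    return float(np.sum(p * np.log(p / q)))


def psi(actual, expected):
    """PSI for strictly positive bin proportions (no smoothing applied)."""
    return float(np.sum((actual - expected) * np.log(actual / expected)))


# Two random sets of proportions over ten bins, for illustration only.
rng = np.random.default_rng(42)
actual = rng.dirichlet(np.ones(10))
expected = rng.dirichlet(np.ones(10))

psi_value = psi(actual, expected)
symmetrized_kl = kl_divergence(actual, expected) + kl_divergence(expected, actual)

# The two quantities agree up to floating-point error.
identity_holds = bool(np.isclose(psi_value, symmetrized_kl))
```

Because PSI is a sum of two KL divergences, it is symmetric in its arguments: swapping the actual and expected distributions leaves the value unchanged.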

We hope you enjoyed this overview of PSI. Don’t forget to check out our blog on
detecting intersectional unfairness in AI!

———

References

 1. Siddiqi, N. (2017). Intelligent credit scoring: Building and implementing
    better credit risk scorecards. John Wiley & Sons.
 2. Yurdakul, B. (2018). Statistical properties of population stability index.
    Western Michigan University.
 3. Lin, A. Z. (2017). Examining Distributional Shifts by Using Population
    Stability Index (PSI) for Model Validation and Diagnosis. SAS Conference
    Proceedings: Western Users of SAS Software 2017, September 20-22, 2017,
    Long Beach, California.
    https://www.lexjansen.com/wuss/2017/47_Final_Paper_PDF.pdf
 4. Kullback, S., & Leibler, R. A. (1951). On Information and Sufficiency. The
    Annals of Mathematical Statistics, 22(1), 79–86.
    http://www.jstor.org/stable/2236703


© 2022 Fiddler AI. All rights reserved.