USING BIDIRECTIONAL GENERATIVE ADVERSARIAL NETWORKS TO ESTIMATE VALUE-AT-RISK
FOR MARKET RISK MANAGEMENT

Hamaad Shah

Aug 18, 2018



We will explore the use of Bidirectional Generative Adversarial Networks
(BiGAN) for market risk management: the estimation of portfolio risk measures
such as Value-at-Risk (VaR). Generative Adversarial Networks (GAN) allow us to
implicitly maximize the likelihood of complex distributions and thereby to
generate samples from them. The key point is the implicit maximum likelihood
estimation principle: we never specify how this complex distribution is
parameterized. Dealing with high dimensional data that potentially comes from
a complex distribution is a key aspect of market risk management, among many
other financial services use cases. GAN, and specifically BiGAN for the
purposes of this article, allow us to handle potentially complex financial
services data without having to explicitly specify a distribution such as a
multidimensional Gaussian distribution.




MARKET RISK MANAGEMENT: VALUE-AT-RISK (VAR)

VaR is a measure of portfolio risk. For instance, a 1% VaR of -5% means that
there is a 1% chance of earning a portfolio return of less than -5%. Think of
it as a (lower) percentile or quantile of the portfolio returns distribution,
i.e., we are concerned with tail risk: the small chance of losing a remarkably
large portfolio value. Such a large loss is funded by our own funds, i.e.,
capital, which is an expensive source of funding compared to other people's
funds, i.e., debt. Therefore the estimation of VaR and similar market risk
measures informs banks and insurance firms about the levels of capital they
need to hold in order to have a buffer against unexpected downturns, i.e.,
market risk.

For our purpose, let us begin by fetching a data set of 5 stocks from Yahoo
Finance: Apple, Google, Microsoft, Intel and Box, at a daily frequency for the
year 2016. We use each stock's daily closing prices to compute the
continuously compounded returns:

$$r_{i,t} = \ln\left(\frac{P_{i,t}}{P_{i,t-1}}\right)$$

Let’s assume that we have an equal weight for each of the 5 assets in our
portfolio. Based on this portfolio weights assumption we can calculate the
portfolio returns.

$$r_{p,t} = \mathbf{w}^\top \mathbf{r}_t, \qquad w_i = \frac{1}{5}$$

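These steps are straightforward in Python. A minimal sketch, assuming the
yfinance package is available and guessing the ticker symbols for the five
stocks named above (the original analysis fetched the data from Yahoo
directly):

    import numpy as np
    import yfinance as yf

    # Tickers are assumptions for the five stocks named in the text.
    tickers = ["AAPL", "GOOGL", "MSFT", "INTC", "BOX"]
    prices = yf.download(tickers, start="2016-01-01", end="2016-12-31")["Close"]

    # Continuously compounded (log) returns: r_t = ln(P_t / P_{t-1}).
    returns = np.log(prices / prices.shift(1)).dropna()

    # Equal-weight portfolio returns: r_p = w' r with w_i = 1/5.
    weights = np.full(len(tickers), 1.0 / len(tickers))
    portfolio_returns = returns @ weights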
Let’s estimate the expected returns vector, volatilities vector, correlation and
variance-covariance matrices. The variance-covariance matrix is recovered from
the estimated volatilities vector and correlation matrix:



where



is the Hadamard product.
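
Continuing the sketch, these estimates are one-liners with numpy and pandas:

    # Moment estimates and the covariance matrix via the Hadamard product.
    mu = returns.mean().values     # expected returns vector
    sigma = returns.std().values   # volatilities vector
    corr = returns.corr().values   # correlation matrix
    cov = np.outer(sigma, sigma) * corr   # Sigma = (sigma sigma') o C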



Portfolio volatility is estimated as:

$$\sigma_p = \sqrt{\mathbf{w}^\top \Sigma \mathbf{w}}$$

We consider the 3 major methods used in market risk management, specifically
for the estimation of VaR. Please note that there are many other methods for
estimating VaR, as well as coherent risk measures such as Conditional
Value-at-Risk (CVaR); however, we consider only the major ones here.


VAR: VARIANCE-COVARIANCE METHOD

The first is the variance-covariance method, which uses the estimated
portfolio volatility $\sigma_p$ under the Gaussian assumption to estimate VaR.
Let's assume we are attempting to estimate the 1% VaR: this means that there
is a 1% probability of obtaining a portfolio return of less than the VaR
value. Using the variance-covariance approach the calculation is:

$$\text{VaR}_{1\%} = \mu_p + z_{0.01}\,\sigma_p, \qquad \mu_p = \mathbf{w}^\top \boldsymbol{\mu}, \quad z_{0.01} = \Phi^{-1}(0.01) \approx -2.33$$

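A sketch of this calculation, reusing the names from the snippets above; the
exact convention (e.g., whether the expected portfolio return is included) may
differ from the author's code:

    from scipy import stats

    # Gaussian (variance-covariance) 1% VaR.
    portfolio_vol = np.sqrt(weights @ cov @ weights)   # sigma_p
    z = stats.norm.ppf(0.01)                           # ~ -2.33
    var_gaussian = weights @ mu + z * portfolio_vol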

VAR: HISTORICAL SIMULATION METHOD

The second method is a non-parametric approach where we sample with replacement
from the historical data to estimate a portfolio returns distribution. The 1%
VaR is simply the appropriate quantile from this sampled portfolio returns
distribution.
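
A bootstrap sketch, with the number of resamples an arbitrary choice:

    # Historical simulation: bootstrap the observed portfolio returns.
    rng = np.random.default_rng(seed=0)
    samples = rng.choice(portfolio_returns.to_numpy(), size=100_000,
                         replace=True)
    var_historical = np.quantile(samples, 0.01)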


VAR: MONTE CARLO METHOD

The third method is Monte Carlo sampling from a multidimensional Gaussian
distribution using the aforementioned expected returns vector
$\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$. Finally, the 1%
VaR is simply the appropriate quantile from this sampled portfolio returns
distribution.
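
Again as a sketch, reusing the estimated moments from above:

    # Monte Carlo: simulate asset returns, form portfolio returns and take
    # the 1% quantile.
    sim = rng.multivariate_normal(mu, cov, size=100_000)
    var_monte_carlo = np.quantile(sim @ weights, 0.01)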


VAR: ESTIMATES

For our stock returns data, the 1% VaR estimates from the aforementioned 3
market risk management methods commonly used in banking are as follows:

> Variance-covariance: -2.87%
> Historical simulation: -3.65%
> Monte Carlo simulation: -2.63%
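
With the sketches above, these would be printed as follows; note that the
figures quoted here come from the author's own run, so a fresh run on
re-downloaded data will not match them exactly:

    print(f"Variance-covariance:   {var_gaussian:.2%}")
    print(f"Historical simulation: {var_historical:.2%}")
    print(f"Monte Carlo:           {var_monte_carlo:.2%}")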

Now we move towards using Bidirectional Generative Adversarial Networks (BiGAN)
for VaR market risk management.


BIDIRECTIONAL GENERATIVE ADVERSARIAL NETWORK (BIGAN)

The 2 main components of a Generative Adversarial Network (GAN) are the
generator and the discriminator. These 2 components play an adversarial game
against each other. In doing so the generator learns how to create realistic
synthetic samples from noise, i.e., from the latent space z, while the
discriminator learns how to distinguish between real and synthetic samples.
See the following article of mine for a detailed explanation of GAN:
https://goo.gl/ZWYngw

BiGAN extends GAN by adding a third component: the encoder, which learns to
map from the data space x to the latent space z. The objective of the
generator remains the same, while the discriminator's objective is altered: it
must now distinguish not only between a real sample and a synthetic sample but
also between a real encoding, i.e., one given by the encoder, and a synthetic
encoding, i.e., a sample from the latent space z.


GENERATOR

Assume that we have a prior belief on where the latent space z lies:

$$\mathbf{z} \sim p(\mathbf{z})$$

Given a draw from this latent space the generator G, a deep learner, outputs a
synthetic sample.

$$\hat{\mathbf{x}} = G(\mathbf{z})$$


ENCODER

This can be shown to be an inverse of the generator. Given a draw from the data
space the encoder E, a deep learner, outputs a real encoding.

$$\hat{\mathbf{z}} = E(\mathbf{x})$$


DISCRIMINATOR

The discriminator D, a deep learner, aims to classify whether a sample is real
or synthetic, i.e., whether it comes from the real data distribution
$p_{\mathbf{x}}(\mathbf{x})$ or the synthetic data distribution
$p_G(\mathbf{x})$. Additionally, it aims to classify whether an encoding is
real, i.e., $\hat{\mathbf{z}} = E(\mathbf{x})$, or synthetic, i.e.,
$\mathbf{z} \sim p(\mathbf{z})$.

Let us denote the discriminator as

$$D(\mathbf{x}, \mathbf{z}) \in [0, 1].$$

We assume that the positive examples are real pairs, i.e.,
$(\mathbf{x}, E(\mathbf{x}))$ with $\mathbf{x} \sim p_{\mathbf{x}}(\mathbf{x})$,
while the negative examples are synthetic pairs, i.e.,
$(G(\mathbf{z}), \mathbf{z})$ with $\mathbf{z} \sim p(\mathbf{z})$.

OPTIMAL DISCRIMINATOR, ENCODER AND GENERATOR

The BiGAN has the following objective function, similar to that of the GAN:

$$\min_{G, E} \max_D \; V(D, E, G) = \mathbb{E}_{\mathbf{x} \sim p_{\mathbf{x}}}\left[\log D\left(\mathbf{x}, E(\mathbf{x})\right)\right] + \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})}\left[\log\left(1 - D\left(G(\mathbf{z}), \mathbf{z}\right)\right)\right]$$

Let us take a closer look at the discriminator’s objective function.

Writing both expectations as integrals over the joint space of
$(\mathbf{x}, \mathbf{z})$ and maximizing pointwise, as in the GAN case, the
optimal discriminator given the generator and encoder is

$$D^*(\mathbf{x}, \mathbf{z}) = \frac{p_E(\mathbf{x}, \mathbf{z})}{p_E(\mathbf{x}, \mathbf{z}) + p_G(\mathbf{x}, \mathbf{z})}$$

where $p_E(\mathbf{x}, \mathbf{z})$ is the joint distribution induced by the
encoder and $p_G(\mathbf{x}, \mathbf{z})$ is the joint distribution induced by
the generator.

We have found the optimal discriminator given a generator and an encoder. Let
us focus now on the generator and encoder's objective function, which is
essentially to minimize the discriminator's objective function:

$$C(G, E) = \mathbb{E}_{(\mathbf{x}, \mathbf{z}) \sim p_E}\left[\log D^*(\mathbf{x}, \mathbf{z})\right] + \mathbb{E}_{(\mathbf{x}, \mathbf{z}) \sim p_G}\left[\log\left(1 - D^*(\mathbf{x}, \mathbf{z})\right)\right]$$

We can rewrite the above objective function for the generator and encoder in
terms of Kullback–Leibler (KL) divergences:

$$C(G, E) = -\log 4 + \mathrm{KL}\left(p_E \,\middle\|\, \frac{p_E + p_G}{2}\right) + \mathrm{KL}\left(p_G \,\middle\|\, \frac{p_E + p_G}{2}\right)$$

Recall the definition of a lambda divergence:

$$D_\lambda\left(p \,\|\, q\right) = \lambda\, \mathrm{KL}\left(p \,\middle\|\, \lambda p + (1 - \lambda) q\right) + (1 - \lambda)\, \mathrm{KL}\left(q \,\middle\|\, \lambda p + (1 - \lambda) q\right)$$

If lambda takes the value of 0.5 this is then called the Jensen-Shannon (JS)
divergence. This divergence is symmetric and non-negative.

$$\mathrm{JS}\left(p \,\|\, q\right) = \frac{1}{2}\, \mathrm{KL}\left(p \,\middle\|\, \frac{p + q}{2}\right) + \frac{1}{2}\, \mathrm{KL}\left(q \,\middle\|\, \frac{p + q}{2}\right)$$

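As a quick numeric illustration of these two properties (this snippet is mine,
not part of the original article):

    import numpy as np

    # Jensen-Shannon divergence between two discrete distributions.
    def kl(p, q):
        return np.sum(p * np.log(p / q))

    def js(p, q):
        m = 0.5 * (p + q)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    p = np.array([0.6, 0.4])
    q = np.array([0.4, 0.6])
    print(js(p, q), js(q, p))   # symmetric and non-negative
    print(js(p, p))             # exactly 0 when p = q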
Keeping this in mind let us take a look again at the objective function of the
generator and the encoder.

$$C(G, E) = -\log 4 + 2\, \mathrm{JS}\left(p_E \,\|\, p_G\right)$$

It is clear from the objective function of the generator and encoder above that
the global minimum value attained is -log(4) which occurs when the following
holds.

$$p_E(\mathbf{x}, \mathbf{z}) = p_G(\mathbf{x}, \mathbf{z})$$

When the above holds the Jensen-Shannon divergence, i.e.,

$$\mathrm{JS}\left(p_E \,\|\, p_G\right)$$

will be zero. Hence we have shown that the optimal solution is as follows.

$$p_E(\mathbf{x}, \mathbf{z}) = p_G(\mathbf{x}, \mathbf{z}), \qquad C^*(G, E) = -\log 4$$

Given the above result we can prove that the optimal discriminator will be 0.5.

$$D^*(\mathbf{x}, \mathbf{z}) = \frac{p_E(\mathbf{x}, \mathbf{z})}{p_E(\mathbf{x}, \mathbf{z}) + p_G(\mathbf{x}, \mathbf{z})} = \frac{1}{2}$$


OPTIMAL ENCODER AND GENERATOR ARE INVERSE FUNCTIONS OF EACH OTHER

At the optimal generator and encoder we can show that the generator and encoder
are inverse functions of each other. Recall from earlier the definitions of the
generator and the encoder.

$$\hat{\mathbf{x}} = G(\mathbf{z}), \qquad \hat{\mathbf{z}} = E(\mathbf{x})$$


At this point the optimal discriminator is 0.5, i.e., the discriminator cannot
effectively differentiate between real and synthetic data as the synthetic data
is realistic. Remember that at this point the likelihood would have been
implicitly maximized such that any samples taken from the synthetic distribution
should be similar to those taken from the real distribution. In short, if
optimality of the generator, encoder and discriminator holds, then the
synthetic data should look similar to, or rather be the same as, the real
data. Keeping this important point in mind, let's slightly re-write the
optimal generator and encoder functions.

$$p_E(\mathbf{x}, \mathbf{z}) = p_{\mathbf{x}}(\mathbf{x})\, \mathbb{1}\left[\mathbf{z} = E(\mathbf{x})\right], \qquad p_G(\mathbf{x}, \mathbf{z}) = p(\mathbf{z})\, \mathbb{1}\left[\mathbf{x} = G(\mathbf{z})\right]$$


Recall further that the following holds at the optimal generator and encoder.

$$p_E(\mathbf{x}, \mathbf{z}) = p_G(\mathbf{x}, \mathbf{z})$$

In the above, note the following; we also assume, for a proof by
contradiction, that the generator is not an inverse function of the encoder.



Recall the optimality condition of the generator and encoder:

$$p_E(\mathbf{x}, \mathbf{z}) = p_G(\mathbf{x}, \mathbf{z})$$

In the above please note the following.



If optimality holds then the following holds as shown above.



However, since we assumed that the generator is not an inverse function of the
encoder, the above conditions cannot hold, thereby violating the optimality
condition.



Therefore we have shown by contradiction that under optimality of the generator
and encoder the generator is an inverse function of the encoder.



The same argument shows that the encoder is the inverse of the generator.



In the above, note the following; we also assume, for a proof by
contradiction, that the encoder is not an inverse function of the generator.



Recall the optimality condition of the generator and encoder:

$$p_E(\mathbf{x}, \mathbf{z}) = p_G(\mathbf{x}, \mathbf{z})$$

In the above please note the following.



If optimality holds then the following holds as shown above.



However, since we assumed that the encoder is not an inverse function of the
generator, the above conditions cannot hold, thereby violating the optimality
condition.



Therefore we have shown by contradiction that under optimality of the generator
and encoder the encoder is an inverse function of the generator.



Therefore we have shown, via proof by contradiction, that the optimal encoder
and generator are inverse functions of each other: if they were not inverse
functions of each other, the optimality condition for the encoder and
generator would be violated, i.e.,

$$p_E(\mathbf{x}, \mathbf{z}) = p_G(\mathbf{x}, \mathbf{z})$$


BIGAN RELATION TO AUTOENCODERS

At this point it might be a good idea to review my previous article on
autoencoders here: https://goo.gl/qWqbbv

Note that given an optimal discriminator, the objective function of the
generator and encoder can be thought of as that of an autoencoder, where the
generator plays the role of a decoder. The objective function of the generator
and encoder is simply to minimize the objective function of the discriminator,
i.e., we have not explicitly specified the structure of the reconstruction
loss as one might do with an autoencoder. This implicit minimization of the
reconstruction loss is yet another great advantage of BiGAN: one does not need
to explicitly define a reconstruction loss.

Let’s remind ourselves of the objective function of the generator and encoder.

$$C(G, E) = \mathbb{E}_{(\mathbf{x}, \mathbf{z}) \sim p_E}\left[\log D^*(\mathbf{x}, \mathbf{z})\right] + \mathbb{E}_{(\mathbf{x}, \mathbf{z}) \sim p_G}\left[\log\left(1 - D^*(\mathbf{x}, \mathbf{z})\right)\right]$$

Let’s deal with



first and then with



second in a similar manner. These are defined as follows.



Briefly recall the definition of a Radon-Nikodym derivative:

For measures $\mu$ and $\nu$ with $\mu$ absolutely continuous with respect to
$\nu$, there exists a density $\frac{d\mu}{d\nu}$ such that

$$\mu(A) = \int_A \frac{d\mu}{d\nu}\, d\nu$$

for any measurable set $A$.

It follows then.



Now for the second term:




It follows then.



Note also that.



Now we shall prove that



To prove this, assume that




Therefore,



with



almost everywhere. This means that



is well defined.

Now we shall prove that



To prove this, assume that




Therefore,



with



almost everywhere. This means that



is well defined.

The



outside the support of



is 0. Similarly, the



outside the support of



is 0. We start with the former.



Note that because of



this implies




Assuming this holds we clearly show that



which is a contradiction.

Hence,



and



almost everywhere in



This implies that



almost everywhere in



Hence



in the support



is 0, i.e.,



outside the support of



is 0.

The



outside the support of



is 0.



Note that because of



this implies




Assuming this holds we clearly show that



which is a contradiction.

Hence,



and



almost everywhere in



This implies that



almost everywhere in



Hence



in the support



is 0, i.e.,



outside the support of



is 0.

Therefore the aforementioned KL divergences are most likely non-zero in



In this space we show that



Let's assume that in the region



and that this set



This implies that



and



Therefore the following holds.



Clearly the above implies that



and contradicts the earlier definition of support, i.e.,



Hence



is an empty set and hence



Note finally that this implies that



Let's assume that in the region



and that this set



This implies that



and



Therefore the following holds.



Clearly the above implies that



and contradicts the earlier definition of support, i.e.,



Hence



is an empty set and hence



Note finally that this implies that



We have clearly shown that



and



This implies that



is the only region where



and



might be non-zero.

Therefore.




We have clearly shown that



Therefore,



We have clearly shown that



Therefore,



In the extensive proof above we have shown the relationship of the BiGAN with
autoencoders. The BiGAN can thus be useful for representation learning or
automatic feature engineering.

Having defined the BiGAN, we shall now implement it and use it for our
particular use case, as mentioned earlier.


IMPLEMENTATION

My work done here is maintained on the following git repo with Python and R
code: https://github.com/hamaadshah/market_risk_gan_keras
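
For a feel of the moving parts without opening the repo, here is a minimal
BiGAN sketch in Keras. This is not the author's architecture: the layer sizes,
optimizers and training loop are illustrative assumptions only.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    n_features, latent_dim = 5, 2

    # Generator G: z -> x.
    generator = keras.Sequential([
        layers.Dense(32, activation="relu", input_shape=(latent_dim,)),
        layers.Dense(n_features),
    ])

    # Encoder E: x -> z.
    encoder = keras.Sequential([
        layers.Dense(32, activation="relu", input_shape=(n_features,)),
        layers.Dense(latent_dim),
    ])

    # Discriminator D: (x, z) -> probability that the pair is real.
    x_in = keras.Input(shape=(n_features,))
    z_in = keras.Input(shape=(latent_dim,))
    h = layers.Dense(32, activation="relu")(layers.Concatenate()([x_in, z_in]))
    discriminator = keras.Model([x_in, z_in],
                                layers.Dense(1, activation="sigmoid")(h))
    discriminator.compile(optimizer="adam", loss="binary_crossentropy")

    # Adversarial model: train G and E against a frozen discriminator.
    discriminator.trainable = False
    x_real = keras.Input(shape=(n_features,))
    z_noise = keras.Input(shape=(latent_dim,))
    bigan = keras.Model(
        [x_real, z_noise],
        [discriminator([x_real, encoder(x_real)]),       # score of real pair
         discriminator([generator(z_noise), z_noise])],  # score of fake pair
    )
    bigan.compile(optimizer="adam", loss="binary_crossentropy")

    def train_step(x_batch):
        n = len(x_batch)
        z = np.random.normal(size=(n, latent_dim)).astype("float32")
        # Discriminator: (x, E(x)) labelled 1, (G(z), z) labelled 0.
        discriminator.train_on_batch(
            [x_batch, encoder.predict(x_batch, verbose=0)], np.ones((n, 1)))
        discriminator.train_on_batch(
            [generator.predict(z, verbose=0), z], np.zeros((n, 1)))
        # Generator and encoder: labels swapped to fool the discriminator.
        bigan.train_on_batch([x_batch, z],
                             [np.zeros((n, 1)), np.ones((n, 1))])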


CONCLUSION

Before we conclude, let's have a look at the portfolio returns distribution
sampled from an untrained BiGAN.



It is clear from the above graph that the untrained BiGAN's sampled portfolio
returns distribution is remarkably different from the actual portfolio returns
distribution. This is, of course, to be expected.

Contrast this with a trained BiGAN: the following graph clearly shows the
value of GAN-type models for market risk management, as we arrive at this
learnt portfolio returns distribution without having to rely on a possibly
incorrect assumption about the actual portfolio returns distribution, such as
a multidimensional Gaussian.
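
With the sketch from the implementation section, sampling from a trained
generator and reading off the 1% quantile would look as follows; again an
illustrative assumption rather than the repo's actual code:

    # Sample synthetic asset returns from the trained generator and compute
    # the implied 1% VaR of the equal-weight portfolio.
    z = np.random.normal(size=(100_000, latent_dim)).astype("float32")
    synthetic_returns = generator.predict(z, verbose=0)
    var_bigan = np.quantile(synthetic_returns @ weights, 0.01)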

Note that we should perhaps use an evolutionary algorithm or a reinforcement
learner to automatically learn an appropriate GAN or BiGAN architecture:
perhaps a topic for a future article.



Finally, we update the table of VaR estimates from the different market risk
management methods below. The VaR estimate provided by the BiGAN is similar,
though not identical, to those provided by the other methods. This is a good
sanity check for using the BiGAN in market risk management: it delivers
competitive results relative to well-established existing methods.

> Variance-covariance: -2.87%
> Historical simulation: -3.65%
> Monte Carlo simulation: -2.63%
> 
> Bidirectional Generative Adversarial Network: -4.42%

The portfolio of 5 stocks we worked with was not particularly complicated
compared to portfolios that might contain derivatives or other components.
Arriving at the correct portfolio returns distribution of a potentially
complicated portfolio is a problem that, as shown here, can be addressed via
deep learning, specifically the BiGAN. This result can be useful for market
risk management and for any other problem space where we need to generate
samples from a potentially complex, and perhaps unknown, distribution.

There will potentially be a follow-up article where we look at a complicated
backtesting scenario, i.e., validating that VaR-type market risk estimates
provided by the BiGAN are appropriate for future portfolio returns
distributions that we have not yet seen, perhaps using more complicated
portfolios.

The aim of this article was to show that a trained BiGAN can be used for
market risk management VaR estimation for a given portfolio.


REFERENCES

1. Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning (MIT
Press).
2. Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow
(O'Reilly).
3. Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes
(https://arxiv.org/abs/1312.6114).
4. http://scikit-learn.org/stable/#
5. https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1
6. https://stackoverflow.com/questions/42177658/how-to-switch-backend-with-keras-from-tensorflow-to-theano
7. https://blog.keras.io/building-autoencoders-in-keras.html
8. https://keras.io
9. Chollet, F. (2018). Deep Learning with Python (Manning).
10. Hull, J. C. (2010). Risk Management and Financial Institutions (Pearson).
11. https://towardsdatascience.com/automatic-feature-engineering-using-deep-learning-and-bayesian-inference-application-to-computer-7b2bb8dc7351
12. https://towardsdatascience.com/automatic-feature-engineering-using-generative-adversarial-networks-8e24b3c16bf3
13. Donahue, J., Krähenbühl, P. and Darrell, T. (2017). Adversarial Feature
Learning (https://arxiv.org/pdf/1605.09782).
14. https://github.com/eriklindernoren/Keras-GAN


HAMAAD SHAH

VP data science at Columbia Threadneedle Investments and guest speaker at the
University of Oxford DCE.
