USING BIDIRECTIONAL GENERATIVE ADVERSARIAL NETWORKS TO ESTIMATE VALUE-AT-RISK FOR MARKET RISK MANAGEMENT

Hamaad Shah · Aug 18, 2018 · 18 min read

We will explore the use of Bidirectional Generative Adversarial Networks (BiGAN) for market risk management: the estimation of portfolio risk measures such as Value-at-Risk (VaR). Generative Adversarial Networks (GAN) allow us to implicitly maximize the likelihood of complex distributions, and thereby to generate samples from such distributions. The key point here is the implicit maximum likelihood estimation principle: we do not specify how this complex distribution is parameterized. Dealing with high dimensional data that potentially comes from a complex distribution is a key aspect of market risk management, among many other financial services use cases. GAN, and specifically BiGAN for the purpose of this article, allows us to deal with potentially complex financial services data without having to explicitly specify a distribution such as a multidimensional Gaussian distribution.

MARKET RISK MANAGEMENT: VALUE-AT-RISK (VAR)

VaR is a measure of portfolio risk. For instance, a 1% VaR of -5% means that there is a 1% chance of earning a portfolio return of less than -5%. Think of it as a (lower) percentile or quantile of a portfolio returns distribution, i.e., we are concerned about tail risk: the small chance of losing a remarkably large portfolio value. Such a large loss is funded by our own funds, i.e., capital, which is an expensive source of funding compared to other people's funds, i.e., debt. Therefore the estimation of VaR and similar market risk measures informs banks and insurance firms about the levels of capital they need to hold in order to have a buffer against unexpected downturns, i.e., market risk.

For our purpose, let us begin by fetching a data set of 5 stocks from Yahoo: Apple, Google, Microsoft, Intel and Box, at a daily frequency for the year 2016. We use each stock's daily closing prices $P_t$ to compute the continuously compounded returns:

$r_t = \ln(P_t / P_{t-1})$

Let's assume that we have an equal weight for each of the 5 assets in our portfolio. Based on this portfolio weights assumption we can calculate the portfolio returns. Let's also estimate the expected returns vector, the volatilities vector, and the correlation and variance-covariance matrices. The variance-covariance matrix is recovered from the estimated volatilities vector $\sigma$ and correlation matrix $R$ as

$\Sigma = (\sigma \sigma^\top) \circ R$

where $\circ$ is the Hadamard product. Portfolio volatility is then estimated from the weights vector $w$ as

$\sigma_p = \sqrt{w^\top \Sigma \, w}$
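To make the calculations above concrete, here is a minimal sketch of these portfolio statistics in Python. It assumes the daily closing prices have already been downloaded into a pandas DataFrame `prices` with one column per stock; the variable names and the equal-weights setup are illustrative assumptions, not the author's original code.

```python
import numpy as np
import pandas as pd

# prices: DataFrame of daily closing prices for the 5 stocks (assumed already fetched).
# Continuously compounded (log) returns.
returns = np.log(prices / prices.shift(1)).dropna()

# Equal portfolio weights.
n_assets = returns.shape[1]
weights = np.repeat(1.0 / n_assets, n_assets)

# Portfolio returns time series.
portfolio_returns = returns.values @ weights

# Expected returns vector and volatilities vector.
mu = returns.mean().values
sigma = returns.std().values

# Correlation matrix and variance-covariance matrix via the Hadamard product:
# Sigma = (sigma sigma^T) o R.
corr = returns.corr().values
cov = np.outer(sigma, sigma) * corr

# Portfolio expected return and volatility: sigma_p = sqrt(w^T Sigma w).
mu_p = weights @ mu
sigma_p = np.sqrt(weights @ cov @ weights)
```

The Hadamard product is simply element-wise multiplication, so `np.outer(sigma, sigma) * corr` rebuilds the variance-covariance matrix from the volatilities and correlations.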
We consider the 3 major methods used in market risk management for the estimation of VaR. Please note that there are many other methods for estimating VaR, as well as other, more coherent, risk measures such as Conditional Value-at-Risk (CVaR); here we only consider the major ones.

VAR: VARIANCE-COVARIANCE METHOD

The first is the variance-covariance method, which uses the estimated portfolio volatility under a Gaussian assumption to estimate VaR. Let's assume we are attempting to estimate the 1% VaR, i.e., the value such that there is a 1% probability of obtaining a portfolio return of less than the VaR value. Using the variance-covariance approach the calculation is

$\mathrm{VaR}_{1\%} = \mu_p + z_{0.01} \, \sigma_p$

where $z_{0.01}$ is the 1% quantile of the standard Gaussian distribution.

VAR: HISTORICAL SIMULATION METHOD

The second method is a non-parametric approach where we sample with replacement from the historical data to estimate a portfolio returns distribution. The 1% VaR is simply the appropriate quantile of this sampled portfolio returns distribution.

VAR: MONTE CARLO METHOD

The third method is Monte Carlo sampling from a multidimensional Gaussian distribution parameterized by the aforementioned expected returns vector and variance-covariance matrix. The 1% VaR is again the appropriate quantile of the resulting sampled portfolio returns distribution.

VAR: ESTIMATES

For our stock returns data, the 1% VaR estimates from the 3 market risk management methods commonly used in banking are as follows:

> Variance-covariance: -2.87%
> Historical simulation: -3.65%
> Monte Carlo simulation: -2.63%
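Continuing the sketch above, the three classical VaR estimates might be computed along the following lines. This is again a minimal illustration rather than the author's original code; `mu`, `cov`, `mu_p`, `sigma_p`, `weights` and `portfolio_returns` are the quantities defined in the previous snippet, and the number of simulations is an arbitrary choice.

```python
import numpy as np
from scipy import stats

alpha = 0.01
n_sims = 100_000
rng = np.random.default_rng(seed=42)

# 1) Variance-covariance method: Gaussian quantile scaled by portfolio volatility.
var_gaussian = mu_p + stats.norm.ppf(alpha) * sigma_p

# 2) Historical simulation: bootstrap the observed portfolio returns, then take the quantile.
bootstrap_sample = rng.choice(portfolio_returns, size=n_sims, replace=True)
var_historical = np.quantile(bootstrap_sample, alpha)

# 3) Monte Carlo: sample asset returns from a multivariate Gaussian with the
#    estimated mean vector and variance-covariance matrix, then take the quantile.
simulated_assets = rng.multivariate_normal(mu, cov, size=n_sims)
simulated_portfolio = simulated_assets @ weights
var_monte_carlo = np.quantile(simulated_portfolio, alpha)

print(f"Variance-covariance VaR: {var_gaussian:.2%}")
print(f"Historical simulation VaR: {var_historical:.2%}")
print(f"Monte Carlo VaR: {var_monte_carlo:.2%}")
```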
Now we move towards using Bidirectional Generative Adversarial Networks (BiGAN) for VaR estimation in market risk management.

BIDIRECTIONAL GENERATIVE ADVERSARIAL NETWORK (BIGAN)

The 2 main components of a Generative Adversarial Network (GAN) are the generator and the discriminator. These 2 components play an adversarial game against each other. In doing so the generator learns how to create realistic synthetic samples from noise, i.e., from the latent space z, while the discriminator learns how to distinguish between a real sample and a synthetic sample. See the following article of mine for a detailed explanation of GAN: https://goo.gl/ZWYngw

BiGAN extends GAN by adding a third component: the encoder, which learns to map from the data space x to the latent space z. The objective of the generator remains the same, while the objective of the discriminator is altered so that it classifies not only between a real sample and a synthetic sample, but additionally between a real encoding, i.e., one given by the encoder, and a synthetic encoding, i.e., a sample drawn from the latent space z.

GENERATOR

Assume that we have a prior belief on where the latent space z lies, i.e., a prior distribution $p_z(z)$. Given a draw $z \sim p_z(z)$ from this latent space, the generator G, a deep learner, outputs a synthetic sample $G(z)$.

ENCODER

The encoder can be shown to be an inverse of the generator. Given a draw $x \sim p_x(x)$ from the data space, the encoder E, a deep learner, outputs a real encoding $E(x)$.

DISCRIMINATOR

The discriminator D, a deep learner, aims to classify whether a sample is real or synthetic, i.e., whether a sample is from the real data distribution or the synthetic data distribution, and additionally whether an encoding is real or synthetic. The discriminator therefore takes a pair as input: the positive examples are real, i.e., pairs $(x, E(x))$, while the negative examples are synthetic, i.e., pairs $(G(z), z)$.
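Before turning to the theory, here is a rough sketch of how the three BiGAN components might look in Keras for our 5-dimensional daily returns data. The latent dimension, layer sizes and activations are illustrative assumptions and not taken from the author's repository (see the Implementation section below for the actual code).

```python
from tensorflow.keras import layers, models

latent_dim = 2   # dimension of the latent space z (assumed for illustration)
data_dim = 5     # one dimension per stock return

def build_generator():
    # Maps a latent draw z to a synthetic vector of asset returns G(z).
    z = layers.Input(shape=(latent_dim,))
    h = layers.Dense(32, activation="relu")(z)
    x_hat = layers.Dense(data_dim, activation="linear")(h)
    return models.Model(z, x_hat, name="generator")

def build_encoder():
    # Maps a real vector of asset returns x to an encoding E(x) in the latent space.
    x = layers.Input(shape=(data_dim,))
    h = layers.Dense(32, activation="relu")(x)
    z_hat = layers.Dense(latent_dim, activation="linear")(h)
    return models.Model(x, z_hat, name="encoder")

def build_discriminator():
    # Scores a (sample, encoding) pair: (x, E(x)) should be classified as real,
    # (G(z), z) as synthetic.
    x = layers.Input(shape=(data_dim,))
    z = layers.Input(shape=(latent_dim,))
    h = layers.Concatenate()([x, z])
    h = layers.Dense(32, activation="relu")(h)
    out = layers.Dense(1, activation="sigmoid")(h)
    return models.Model([x, z], out, name="discriminator")
```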
OPTIMAL DISCRIMINATOR, ENCODER AND GENERATOR

The BiGAN has an objective function similar to that of the GAN:

$\min_{G, E} \max_{D} V(D, E, G) = \mathbb{E}_{x \sim p_x}\big[\log D(x, E(x))\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z), z)\big)\big]$

Let us take a closer look at the discriminator's objective function. Writing $p_{EX}(x, z)$ for the joint distribution of real samples and their encodings, and $p_{GZ}(x, z)$ for the joint distribution of synthetic samples and their latent draws, and maximizing the value function pointwise, we have found the optimal discriminator given a generator and an encoder:

$D^{*}(x, z) = \frac{p_{EX}(x, z)}{p_{EX}(x, z) + p_{GZ}(x, z)}$

Let us focus now on the generator and encoder's objective function, which is essentially to minimize the discriminator's objective function. Substituting the optimal discriminator into the value function, we will note the Kullback-Leibler (KL) divergences that appear in the resulting objective function for the generator and encoder. Recall the definition of a lambda divergence:

$D_{\lambda}(P \,\|\, Q) = \lambda \, \mathrm{KL}\big(P \,\|\, \lambda P + (1 - \lambda) Q\big) + (1 - \lambda) \, \mathrm{KL}\big(Q \,\|\, \lambda P + (1 - \lambda) Q\big)$

If lambda takes the value 0.5 this is called the Jensen-Shannon (JS) divergence, which is symmetric and non-negative. Keeping this in mind, let us take another look at the objective function of the generator and the encoder:

$C(E, G) = -\log 4 + 2 \, \mathrm{JS}\big(p_{EX} \,\|\, p_{GZ}\big)$

It is clear from the objective function of the generator and encoder above that the global minimum value attained is $-\log 4$, which occurs when $p_{EX} = p_{GZ}$. When this holds, the Jensen-Shannon divergence $\mathrm{JS}(p_{EX} \,\|\, p_{GZ})$ is zero. Hence we have shown that the optimal solution is $p_{EX} = p_{GZ}$, and given this result we can prove that the optimal discriminator will be 0.5.

OPTIMAL ENCODER AND GENERATOR ARE INVERSE FUNCTIONS OF EACH OTHER

At the optimal generator and encoder we can show that the generator and encoder are inverse functions of each other. Recall from earlier the definitions of the generator and the encoder. At this point the optimal discriminator is 0.5, i.e., the discriminator cannot effectively differentiate between real and synthetic data, because the synthetic data is realistic. Remember that at this point the likelihood has implicitly been maximized, such that any samples taken from the synthetic distribution should be similar to those taken from the real distribution. In short, if optimality of the generator, encoder and discriminator holds, then the synthetic data should look similar to, or rather be the same as, the real data.

Keeping this important point in mind, let's slightly re-write the optimal generator and encoder functions, and recall that the optimality condition $p_{EX} = p_{GZ}$ holds at the optimal generator and encoder. First assume, for a proof by contradiction, that the generator is not an inverse function of the encoder. If optimality holds then the joint distribution of real samples and their encodings agrees with the joint distribution of synthetic samples and their latent draws, which requires $G(E(x)) = x$ almost everywhere. Since we assumed that the generator is not an inverse function of the encoder, this condition cannot hold, thereby violating the optimality condition. Therefore we have shown by contradiction that, under optimality of the generator and encoder, the generator is an inverse function of the encoder.

The same argument can be made for the encoder being the inverse of the generator. Assume, again for a proof by contradiction, that the encoder is not an inverse function of the generator. If optimality holds then the two joint distributions agree, which requires $E(G(z)) = z$ almost everywhere. Since we assumed that the encoder is not an inverse function of the generator, this condition cannot hold, thereby violating the optimality condition. Therefore, under optimality of the generator and encoder, the encoder is an inverse function of the generator.

We have therefore shown, via proof by contradiction, that the optimal encoder and generator are inverse functions of each other: if they were not inverse functions of each other, this would violate the optimality condition $p_{EX} = p_{GZ}$ for the encoder and generator.

BIGAN RELATION TO AUTOENCODERS

At this point it might be a good idea to review my previous article on autoencoders here: https://goo.gl/qWqbbv

Note that, given an optimal discriminator, the objective function of the generator and encoder can be thought of as that of an autoencoder, where the generator plays the role of a decoder. The objective function of the generator and encoder is simply to minimize the objective function of the discriminator, i.e., we have not explicitly specified the structure of the reconstruction loss as one might do with an autoencoder. This implicit minimization of the reconstruction loss is yet another great advantage of the BiGAN: one does not need to explicitly define a reconstruction loss.

Let's remind ourselves of the objective function of the generator and encoder, which up to constants consists of the two KL divergence terms noted above: one between $p_{EX}$ and the mixture $\tfrac{1}{2}(p_{EX} + p_{GZ})$, and one between $p_{GZ}$ and the same mixture. We deal with the first term and then with the second in a similar manner. Briefly recalling the definition of a Radon-Nikodym derivative, each KL term can be written as the expectation of the log of the Radon-Nikodym derivative of the corresponding joint distribution with respect to the mixture, and these derivatives are well defined on the supports of $p_{EX}$ and $p_{GZ}$ respectively. One then shows, by contradiction with the definition of the supports, that each of these Radon-Nikodym derivatives is zero outside the support of the other joint distribution, so the aforementioned KL divergences can only be non-zero on the intersection of the supports of $p_{EX}$ and $p_{GZ}$. A further pair of contradiction arguments shows that certain sub-regions of this intersection must be empty, so that this intersection is the only region where the relevant terms might be non-zero, and that the two Radon-Nikodym derivatives sum to one there.

Putting these pieces together, given an optimal discriminator the generator and encoder objective reduces to a reconstruction-style objective that is minimized exactly when $G(E(x)) = x$ and $E(G(z)) = z$ almost everywhere, i.e., it behaves like an autoencoder reconstruction loss; Donahue, Krähenbühl and Darrell (2017) show that it is related to an $\ell_0$-type autoencoder loss. In the extensive argument above we have shown the relationship of the BiGAN with autoencoders. The BiGAN can therefore be useful for representation learning or automatic feature engineering.

At this point we have defined the BiGAN, and now we shall implement it and use it for our particular use case as mentioned earlier.

IMPLEMENTATION

My work here is maintained in the following git repo, with Python and R code: https://github.com/hamaadshah/market_risk_gan_keras
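For readers who only want the overall shape of the training and VaR estimation steps, here is a heavily simplified sketch built on the `build_generator`, `build_encoder` and `build_discriminator` functions and the `returns`, `weights`, `latent_dim` and `data_dim` variables from the earlier snippets. The standard Gaussian latent prior, optimizer settings, batch size and number of steps are illustrative assumptions; the repository above contains the actual implementation.

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers

# Build the three BiGAN components.
generator = build_generator()
encoder = build_encoder()
discriminator = build_discriminator()
discriminator.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

# Combined model for the generator/encoder update: the discriminator is frozen here,
# and G and E are trained to fool it (labels flipped relative to the discriminator's update).
discriminator.trainable = False
x_in = layers.Input(shape=(data_dim,))
z_in = layers.Input(shape=(latent_dim,))
fake_score = discriminator([generator(z_in), z_in])   # (G(z), z): G wants this scored as real
real_score = discriminator([x_in, encoder(x_in)])     # (x, E(x)): E wants this scored as synthetic
bigan = models.Model([x_in, z_in], [fake_score, real_score])
bigan.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

batch_size, n_steps = 128, 5000
for step in range(n_steps):
    # Minibatch of real asset returns and latent draws from the assumed Gaussian prior.
    idx = np.random.randint(0, returns.shape[0], batch_size)
    x_real = returns.values[idx]
    z_noise = np.random.normal(size=(batch_size, latent_dim))

    # Discriminator update: (x, E(x)) labelled real (1), (G(z), z) labelled synthetic (0).
    discriminator.train_on_batch([x_real, encoder.predict(x_real)], np.ones((batch_size, 1)))
    discriminator.train_on_batch([generator.predict(z_noise), z_noise], np.zeros((batch_size, 1)))

    # Generator/encoder update with flipped labels.
    bigan.train_on_batch([x_real, z_noise],
                         [np.ones((batch_size, 1)), np.zeros((batch_size, 1))])

# VaR from the trained generator: sample latent draws, generate synthetic asset returns,
# form the equally weighted portfolio returns and take the 1% quantile.
z_samples = np.random.normal(size=(100_000, latent_dim))
synthetic_portfolio = generator.predict(z_samples) @ weights
var_bigan = np.quantile(synthetic_portfolio, 0.01)
```

The key design point mirrored from the theory above is that the discriminator always sees pairs: real samples with their encodings, and synthetic samples with their latent draws.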
CONCLUSION

Before we conclude the article, let's have a look at the portfolio returns distribution sampled from an untrained BiGAN. It is clear from that graph that the untrained BiGAN's sampled portfolio returns distribution is remarkably different from the actual portfolio returns distribution. This is, as we can imagine, to be expected. Contrast this with a trained BiGAN: the corresponding graph clearly shows the value of GAN-type models for market risk management, as we have arrived at this learnt portfolio returns distribution without having to rely on a possibly incorrect assumption about the actual portfolio returns distribution, such as a multidimensional Gaussian distribution.

Note that we should perhaps use an evolutionary algorithm or a reinforcement learner to automatically learn an appropriate GAN or BiGAN architecture: perhaps that shall be a topic for a future article.

Finally, we update the VaR estimate table with the different market risk management methods as below. We can see that the VaR estimate provided by the BiGAN is similar, if not exactly the same, to the ones provided by the other market risk management methods. This provides us with a good sanity check on using the BiGAN for market risk management, in that it delivers competitive results with respect to well established existing methods.

> Variance-covariance: -2.87%
> Historical simulation: -3.65%
> Monte Carlo simulation: -2.63%
> Bidirectional Generative Adversarial Network: -4.42%

The portfolio of 5 stocks we had to work with was not particularly complicated compared to portfolios that might contain derivatives or other components. Arriving at the correct portfolio returns distribution of a potentially complicated portfolio is a problem that, as shown here, can be addressed via deep learning, specifically the BiGAN. This result can be useful for market risk management and for any other problem space where we need to generate samples from a potentially complex, and perhaps unknown, distribution.

There will potentially be a follow-up article of mine where we look at a complicated backtesting scenario, i.e., validating that market risk management VaR-type estimates provided by the BiGAN are appropriate for future portfolio returns distributions that we have not yet seen, perhaps using more complicated portfolios. The aim of this article was to show that a trained BiGAN can be used for market risk management VaR estimation for a given portfolio.

REFERENCES

1. Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning (MIT Press).
2. Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow (O'Reilly).
3. Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes (https://arxiv.org/abs/1312.6114).
4. http://scikit-learn.org/stable/#
5. https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1
6. https://stackoverflow.com/questions/42177658/how-to-switch-backend-with-keras-from-tensorflow-to-theano
7. https://blog.keras.io/building-autoencoders-in-keras.html
8. https://keras.io
9. Chollet, F. (2018). Deep Learning with Python (Manning).
10. Hull, J. C. (2010). Risk Management and Financial Institutions (Pearson).
11. https://towardsdatascience.com/automatic-feature-engineering-using-deep-learning-and-bayesian-inference-application-to-computer-7b2bb8dc7351
12. https://towardsdatascience.com/automatic-feature-engineering-using-generative-adversarial-networks-8e24b3c16bf3
13. Donahue, J., Krähenbühl, P. and Darrell, T. (2017). Adversarial Feature Learning (https://arxiv.org/pdf/1605.09782).
14. https://github.com/eriklindernoren/Keras-GAN