Generating realistic data is a challenge often encountered in model development, testing and validation, and there are many relevant examples. In valuation modelling, we need market data such as interest rate curves and volatility surfaces. These objects have an intricate structure and are strongly constrained by no-arbitrage conditions. Hence, generating them randomly in a naive fashion is bound to fail. Random curves and surfaces could indeed expose issues with a model, but since such configurations are extremely unlikely to occur in practice, the added value of such a test is rather low.
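To make the point about naive generation concrete, here is a small sketch that fills an implied-volatility grid with independent uniform draws and applies one necessary no-arbitrage check: at fixed (forward) log-moneyness, total implied variance σ²T must not decrease with maturity, otherwise a calendar-spread arbitrage exists. The grid, the vol range and the function name `has_calendar_arbitrage` are illustrative assumptions, and passing this single check does not make a surface arbitrage-free (butterfly arbitrage, for instance, is not tested).

```python
import numpy as np

def has_calendar_arbitrage(vols, maturities):
    """vols[i, j]: implied vol at maturities[i] and a fixed log-moneyness column j (assumed layout)."""
    total_variance = vols**2 * maturities[:, None]
    # Calendar-spread condition: total variance must not decrease with maturity.
    return bool(np.any(np.diff(total_variance, axis=0) < 0))

rng = np.random.default_rng(42)
maturities = np.array([0.25, 0.5, 1.0, 2.0, 5.0])  # hypothetical maturity grid (years)
n_strikes = 9                                       # hypothetical number of moneyness points
trials = 10_000

# Naive generation: every grid point drawn independently from U(5%, 60%).
violations = sum(
    has_calendar_arbitrage(rng.uniform(0.05, 0.60, size=(len(maturities), n_strikes)), maturities)
    for _ in range(trials)
)
print(f"{violations / trials:.1%} of naive random surfaces violate calendar arbitrage")
```

Almost every naively drawn surface fails even this one check, which is why such samples say little about how a model behaves on realistic inputs.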
Another scenario where data generation is needed is that of sparse data. When studying low-default portfolios, for example, there is by definition little data to train the models on. Hence, proper data generation techniques are extremely important.
One traditional approach, especially in the context of time series, is to perform a principal component analysis (PCA) on a set of data samples (such as interest rate curves). Once the principal components are determined, we can construct the distribution of the strength (score) of every component. Generating a new sample then simply means sampling from these distributions and reconstructing the data from the components. Such an approach works fine for datasets that are not too large. However, for big datasets, or when the data does not have a time dimension, we need other techniques.
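A minimal NumPy sketch of this PCA approach follows. It assumes that the strength (score) of each component is modelled as an independent Gaussian; this is a simplification, since PCA scores are uncorrelated but not necessarily independent, and empirical or fat-tailed distributions could be used instead. The "historical" curves are synthetic placeholders built from hypothetical level and slope moves, and the function names are illustrative.

```python
import numpy as np

def fit_pca_generator(samples, n_components):
    """Fit a PCA-based generator to samples of shape (n_samples, n_points)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal components (orthonormal directions).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]            # shape (k, n_points)
    scores = centered @ components.T          # shape (n_samples, k): strength of each component
    # Assumption: each component's strength follows an independent Gaussian.
    score_mean = scores.mean(axis=0)
    score_std = scores.std(axis=0, ddof=1)
    return mean, components, score_mean, score_std

def generate(mean, components, score_mean, score_std, n_new, rng):
    """Sample new component strengths and reconstruct curves from the components."""
    new_scores = rng.normal(score_mean, score_std, size=(n_new, len(score_mean)))
    return mean + new_scores @ components

rng = np.random.default_rng(0)
tenors = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30])
# Hypothetical history: a base curve plus random level and slope moves and small noise.
base = 0.02 + 0.01 * (1 - np.exp(-tenors / 5))
level = rng.normal(0, 0.005, size=(500, 1))
slope = rng.normal(0, 0.003, size=(500, 1)) * (tenors / 30)
history = base + level + slope + rng.normal(0, 1e-4, size=(500, len(tenors)))

params = fit_pca_generator(history, n_components=3)
new_curves = generate(*params, n_new=5, rng=rng)
print(new_curves.shape)  # (5, 10)
```

Keeping only the first few components filters out noise, so generated curves inherit the dominant shapes (here level and slope) of the history.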
Generative Adversarial Networks (GANs; see Goodfellow et al.) are a very powerful technique for building extremely realistic datasets. The algorithm consists of two neural networks. The first network, the generator, creates candidates, while the second, the discriminator, attempts to identify whether a candidate originated from the real dataset or is a synthetic sample created by the generator. By repeating this procedure, the generator becomes increasingly accurate at creating realistic-looking candidates, while the discriminator becomes better at identifying deviations from the real dataset. Once the system is trained, one can use the generator to create very realistic samples.
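The adversarial loop can be illustrated with a deliberately tiny, one-dimensional toy in plain NumPy. This is a sketch under strong assumptions, not how GANs are built in practice: instead of neural networks, the generator is linear, G(z) = a·z + b, and the discriminator is logistic in the features (x, x²); the gradients are derived by hand; the target distribution N(3, 0.5²), learning rates and step counts are arbitrary choices. The discriminator minimises the standard cross-entropy loss and the generator uses the common "non-saturating" loss −log D(G(z)), with the two players updated in alternation.

```python
import numpy as np

def sigmoid(f):
    # Numerically stable logistic function: exp(-log(1 + exp(-f))).
    return np.exp(-np.logaddexp(0.0, -f))

def train_toy_gan(real_mean=3.0, real_std=0.5, steps=3000, batch=256,
                  lr_d=0.01, lr_g=0.01, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0              # generator: G(z) = a * z + b
    w1, w2, c = 0.0, 0.0, 0.0    # discriminator logit: f(x) = w1*x + w2*x**2 + c
    for _ in range(steps):
        # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
        x_r = rng.normal(real_mean, real_std, batch)
        x_f = a * rng.normal(size=batch) + b
        s_r = sigmoid(w1 * x_r + w2 * x_r**2 + c)
        s_f = sigmoid(w1 * x_f + w2 * x_f**2 + c)
        # Derivatives of -log D(x_r) and -log(1 - D(x_f)) w.r.t. the logit f.
        g_r, g_f = s_r - 1.0, s_f
        w1 -= lr_d * (np.mean(g_r * x_r) + np.mean(g_f * x_f))
        w2 -= lr_d * (np.mean(g_r * x_r**2) + np.mean(g_f * x_f**2))
        c -= lr_d * (np.mean(g_r) + np.mean(g_f))
        # Generator step (non-saturating loss -log D(G(z))): push D(G(z)) towards 1.
        z = rng.normal(size=batch)
        x_f = a * z + b
        s_f = sigmoid(w1 * x_f + w2 * x_f**2 + c)
        # Chain rule: dL/dx = (s_f - 1) * df/dx, with df/dx = w1 + 2*w2*x.
        dx = (s_f - 1.0) * (w1 + 2.0 * w2 * x_f)
        a -= lr_g * np.mean(dx * z)
        b -= lr_g * np.mean(dx)
    return a, b, rng

a, b, rng = train_toy_gan()
samples = a * rng.normal(size=1000) + b   # use the trained generator to create samples
print(f"generated mean {samples.mean():.2f}, std {samples.std():.2f}")
```

Even in this toy setting, plain gradient updates of the two players can oscillate around the target rather than settle, and results depend on the learning rates and seed; practical GANs use deep networks, a framework with automatic differentiation, and additional stabilisation techniques.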
Apart from generating realistic datasets, GANs have many more applications in finance. To end with one interesting use case: in this paper, Hadad et al. use GANs to decompose stock price time series into a market and an idiosyncratic component.