I would like to thank Ashish and Cyril for their help and guidance on the library. Thanks to Matej for the tweaks that optimised the Beam Python runs on Dataflow. We look forward to your continued guidance on this journey.
Unless you have been living in a cave, it's unlikely that you have missed the recent hype around ML and AI. If you have dabbled in it, you will have come across Tensorflow, an open source framework used in machine learning.
You are probably less aware of a quantitative finance library (TF Quant Finance) under development, based on Tensorflow. The promise of untold riches lies in the eye of the beholder. I would like to share some of those riches here.
TF Quant Finance brings the promise of unprecedented scale without the need for special secret sauces. Traditional C++ based quant finance libraries have had to adopt CUDA as the language to exploit the speed of GPU technologies. Exploiting parallelism by running data pipelines through FPGAs requires the use of OpenCL and other constructs. Special sauces lead to special programmers, and in turn to extra special costs of ownership over the long term. That is not to say that TF Quant Finance does not support GPUs; it does. All that messy CUDA stuff is simply abstracted away for you.
TF Quant Finance is Python-based and is built on Tensorflow. The secret ingredient here is that there is no secret ingredient. The simple example here does not really demonstrate the true extent of the scale; it is meant to get across some basic constructs of using this library. As we move on to solving PDEs, quadrature or Brownian bridges in the future, the scale should become more obvious.
I am going to use simple Black Scholes on plain vanilla options. We will use 50 million options, a mixture of calls and puts, defined in JSON format. They will be priced with both TF Quant Finance and Quantlib (Python). I am using Apache Beam Python and Google Dataflow to run my examples. However, there may be simpler routes to running this in the future.
The vanilla_prices.py is the pricer in TF Quant Finance.
In Quantlib, I am using the python methodology outlined here.
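For reference, both pricers are being asked to evaluate the closed-form Black Scholes formula for a European option. The snippet below is a minimal pure-Python sketch of that formula, not the code in vanilla_prices.py or in Quantlib. The function names and sample inputs are illustrative assumptions, and dividends are ignored.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_price(spot, strike, expiry, volatility, rate, is_call):
    """Closed-form Black Scholes price of a European option (no dividends)."""
    sqrt_t = math.sqrt(expiry)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * expiry) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    discounted_strike = strike * math.exp(-rate * expiry)
    if is_call:
        return spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    return discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)


# Illustrative inputs only: at-the-money, one year, 20% vol, 5% rate.
call = black_scholes_price(100.0, 100.0, 1.0, 0.2, 0.05, is_call=True)
put = black_scholes_price(100.0, 100.0, 1.0, 0.2, 0.05, is_call=False)

# Put-call parity: call - put should equal spot - strike * exp(-rate * expiry).
parity_gap = (call - put) - (100.0 - 100.0 * math.exp(-0.05))
```

The put-call parity gap at the end is a cheap sanity check on any vanilla pricer: it should be zero up to floating point error.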
A typical JSON trade is defined as follows.
A quick numbers check between TF Quant Finance and Quantlib was carried out using a Jupyter notebook (executing the Beam pipeline in Jupyter).
With TF Quant Finance, the inputs into 'option_price' in vanilla_prices.py are tensors (or objects that can be readily converted to tensors, such as NumPy arrays). So we need to take our 50 million JSON trades and convert them into (batches of) NumPy arrays. Two methods were used.
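Whichever method is used, the core step is the same: a set of per-trade JSON records has to become one column per pricing input. The snippet below is a minimal sketch of that step in plain Python. The field names (spot, strike, expiry, volatility, rate, option_type) are hypothetical and may not match the actual trade schema, and trades_to_columns is an illustrative helper, not part of the pipeline described here.

```python
import json

# Hypothetical field names; the real trade schema may differ.
FIELDS = ("spot", "strike", "expiry", "volatility", "rate")


def trades_to_columns(json_lines):
    """Turn an iterable of JSON trade strings into one list per pricing input."""
    columns = {name: [] for name in FIELDS}
    columns["is_call"] = []
    for line in json_lines:
        trade = json.loads(line)
        for name in FIELDS:
            columns[name].append(float(trade[name]))
        # Calls and puts are mixed, so carry a boolean flag per trade.
        columns["is_call"].append(trade["option_type"] == "call")
    return columns


# Two made-up trades, for illustration only.
sample = [
    '{"spot": 100, "strike": 95, "expiry": 0.5, "volatility": 0.2, '
    '"rate": 0.01, "option_type": "call"}',
    '{"spot": 100, "strike": 105, "expiry": 1.0, "volatility": 0.25, '
    '"rate": 0.01, "option_type": "put"}',
]
columns = trades_to_columns(sample)
```

Each list in columns can then be passed through numpy.asarray before being handed to the pricer, so that a whole batch of trades is priced in one call.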
With Quantlib, it was a simpler affair as it required no grouping or array builds. Each trade only needed to be parsed and supplied to the pricing engine.
The Dataflow jobs were run with c2-standard-60 machines (us-east1), num_workers set to 2.
SSDs were used via the --worker_disk_type flag because of the number of keys and shuffles involved, sized with --disk_size_gb=4096 (4 TB).
The Quantlib job took around 200 seconds (wall clock time) to compute, with a total (wall clock) time of just under six minutes including startup and shutdown.
The TF Quant Finance jobs took around 170 seconds (wall clock time) to compute, but the timings varied, mainly with the size of the array supplied to the pricing API. Again, total time was just over five minutes including startup and shutdown.
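Since the timings depend on the size of the array handed to the pricer, batch size is a natural knob to tune. The snippet below is a minimal sketch of splitting a stream of trades into fixed-size batches using only the standard library. The helper name chunked and the batch size are illustrative assumptions, not the grouping logic used in the Dataflow jobs.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items from an iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Ten stand-in "trades" split into batches of four; the last batch is shorter.
batches = list(chunked(range(10), batch_size=4))
```

Larger batches mean fewer calls into the pricer but more memory per call, so the best size has to be found by measurement.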
[Figure: comparative plot of the timings of the various runs under different scenarios. Panels: "tff: Group by scenarios" and "tff: No group by - 20m & 10m read split".]
The quickest compute run for Quantlib was 190 seconds.
The quickest compute run for TF Quant Finance was 146 seconds.
The time spent in each of the read/parse/make array/compute/write operations was
(please note that these are wall clock times as discerned from the Dataflow logs)
TF Quant Finance provides a competitive alternative for generating Risk and P&L metrics. The library is still under development and complex analytics are constantly being added. The fundamental difference lies in the way large tensor-based parameters are fed into the API, which returns a tensor of results.
The Black Scholes formula for European option pricing does not tax either Quantlib or TF Quant Finance enough to demonstrate the true scale on offer. The computational graph is simple: there is no recursion and there are no path dependencies. Even so, the compute times show that the efficiency achievable with TF Quant Finance is promising.
Work is being undertaken to facilitate the use of Tensorflow native records in the API. This will allow I/O to occupy a smaller proportion of the total elapsed time to run a batch. The differentials should then begin to shake out, particularly in more complex algorithms. The simplicity of Python allows the code base to remain simple and readable. Though I have used Beam on Dataflow to orchestrate the data, this is complexity that may not be required in the future. Simple Kubernetes-based orchestration should suffice. More on that later.
Time permitting, the next example will involve something more complex (perhaps solving a PDE). Please stay tuned.