Python for Financial Machine Learning at Union Investment
Introduction
Union Investment is one of Germany’s largest asset managers, managing a total of over US$ 350 billion for its customers in Germany and other European countries. As an active fundamental asset manager, we are always working on further improving our investment process. Employing novel data sources and information for our investment process in a world where more and more data is created and made available each day is key contributor to success in our business.
Machine Learning with Python
A promising way to integrate novel data in asset management is machine learning (ML), which allows to uncover patterns found within financial time series data and leverage these patterns for making even better investment decisions. Machine learning allows us to:
- Identify possible return drivers, either on the level of individual stocks or for a whole asset class such as the stock or the bond market,
- Predict key performance indicators, such as revenue, on a company level and
- Determine whether we can benefit from a specific data set and derive a value from it for our investment process.
In order to create machine learning models for these tasks, we have developed our own machine learning platform MALINA – MAchine Learning for INvestment Applications – which is a tailor-made solution to create interpretable machine learning models for financial time series data. MALINA is developed solely using Python and consists of more than 30k lines of Python code.
Within our MALINA framework, we created four decoupled modules:
- a machine learning module which allows us to define and benchmark models for financial time series data using different ML algorithms,
- a back testing module, which allows us to run back tests for trading strategies based on the developed ML models,
- a model interpretation module, which integrates our own interpretation methods for some of the machine learning algorithms available in the ML module and
- a web-based user interface that allows the user to define and benchmark models without the necessity of actually writing code.
Especially module 3, the development of novel approaches for interpreting our machine learning models, is key to us. This module allows us to open the black box of machine learning and to understand our models and their predictions, which further helps us to uncover the patterns these models have learnt.
Python’s Outstanding Ecosystem
In addition to the Python language itself, using Python allows us to rely heavily on proven open source libraries from the Python ecosystem such as
- Pandas, which provides a robust and powerful framework for managing and analyzing data,
- Scikit-Learn, which is the "go-to" package for machine learning in Python and by many considered to be the industry standard for machine learning at all,
- statsmodels and XGBoost, which extend the feature set of scikit-learn to provide advanced statistical models and gradient boosting,
- Django, an excellent and comprehensive framework to develop web applications which we built upon to create the web-based user interface and
- Sphinx, which is an excellent package to create documentation from Python DocStrings and allows us to automatically keep our documentation up with the development of our code.
All these packages have the advantage of being well-known and widely established, which means that extensive amounts of documentation and discussion can be found online for each of them. In addition, each of these packages being available as open source software means that we can easily dive into the existing code. For, e.g., the machine learning algorithms provided by Scikit-Learn that means that we are able to develop our own extensions, such as custom-tailored interpretation methods – something that would not be possible when using a proprietary machine learning framework.
In addition, many of the smaller packages available within Python ecosystem features have been very useful and saved us a lot of time when developing MALINA. One particular example here might be joblib, which makes it extremely easy to parallelize computations in a platform-agnostic manner without hassling about the details of the underlying OS.
Finally, the cross-platform capabilities mean that we can easily port MALINA to a different platform. So, while for development we can stick to the Windows machines that are commonly used in our company, we can readily switch to a Linux system for using our MALINA framework in a production environment.
Conclusion
Machine learning offers exciting new possibilities in analyzing and predicting financial time series. Due to the nature of financial markets, out-of-the-box machine learning tools do not always work as intended. More customized and adapted approaches offer greater insights.
Therefore, Union Investment developed the proprietary machine learning tool MALINA using Python. This tool allows us to develop machine learning models for financial applications and use proprietary interpretation methods to better understand these models.
Union Investment finds the application of Python and its broad variety of libraries very well suited to develop customized machine learning tools which tackle the complex challenges posed by financial time series.
About the Authors
Dr. Christian Mandery currently works as a Data Scientist / Portfolio Manager at Union Investment. As a member of the Quant & Smart data team within the portfolio management, he is primarily responsible for the application of machine learning models in investment decision making and the interpretation of such models. Christian holds a Diploma degree in computer science and an Engineering Ph.D. from the Karlsruhe Institute of Technology (KIT), Germany.
Nikolas Gerlich currently works as a Senior Data Scientist at Union Investment. He is responsible for building the data science capabilities within the portfolio management division. This includes integrating new data sources and employing novel quantitative methods, such as machine learning, to enhance investment decision making. Nikolas studied economics and statistics at the University of Tübingen and the University of Oxford.