How should long-term investors form portfolios? How should they evaluate securities, portfolios, and managers? How should they adapt to time-varying expected returns, volatilities, and correlations, and to a multitude of factors, signals, and strategies?
Traditional asset management techniques provide very little guidance on these questions. Academic studies, like standard investment textbooks, are often rooted in portfolio theories that are 50 to 70 years old. Yet these theories have very little to do with real-life portfolio practice.
Today, asset managers widely use alternative data to improve their investment decision processes. This shift is enabled by machine learning (ML) and artificial intelligence (AI) techniques, which make it possible to process these data, reduce their noise, and turn Big Data into Smart Data. According to the 2021 Refinitiv survey on the rise of data scientists in the finance industry, 72% of respondents state that AI/ML is a core component of their business strategy, and 80% state that they are making significant investments in AI/ML technologies and techniques. 32% of respondents use AI/ML in portfolio management, and 59% in any investment. Another striking result of this survey is that 75% of firms use deep learning, which was previously seen as more of an academic niche.
The practical motivation for a long-horizon perspective in portfolio management has become more visible with the rise of climate finance and climate-risk investing, which also have long-term objectives. Yet long-term investors such as pension funds and educational endowments underperformed passive investment by approximately 1% and 1.6% a year, respectively, for the ten years ending June 30, 2018, and this underperformance is expected to persist in the years ahead. This is even before incorporating long-term climate risks into portfolio construction and asset allocation decisions, which adds a further constraint.
In our new article, we suggest that long-term investor underperformance does not need to become the new norm, and that long-term investors such as pension funds, who often diversify among a handful of asset classes, can outperform both their passive benchmarks and their more active, short-term-oriented peers. This is possible because of large data sets and the application of modern technologies and deep learning to the long-term portfolio construction problem (these practices are now common in contemporary asset management). Most importantly, we first teach machines to learn finance, and only then do we rely on them to guide us in the portfolio construction process.
Investable Asset Classes
To stay close to the factor-investing practices of many pension funds, our approach is quite conservative in terms of investable assets, yet also quite generic, as it aims at a “proof of concept” rather than at specific use cases. We use nine factor portfolios as investable asset classes. These factors are the most commonly used in the finance literature over the last thirty years: gross returns on the market (MKT); the small-minus-big (SMB), high-minus-low (HML), robust-minus-weak (RMW), and conservative-minus-aggressive (CMA) factors; the momentum (MOM) factor; the profitability (ROE) and investment (IA) factors; and the betting-against-beta (BAB) factor. These portfolios capture the most common investment styles, such as market indexing, growth, value, and momentum, as well as the risk characteristics of leverage-constrained institutional portfolios like pension and mutual funds.
Big Data: Macro and Portfolio Characteristics
Using multiple asset- or firm-specific characteristics when optimizing asset allocation choices has been shown to outperform traditional approaches, which rely solely on the historical series of asset returns. For each portfolio, we construct 153 characteristics from publicly available data sets. Because asset class performance often depends on macro-economic regimes, business cycles, and overall market volatility, we add to these asset-specific characteristics a set of sixteen macro-indicators, which include, among others: market-wide dividend-price ratio; dividend yield; earnings-price ratio; stock variance; book-to-market ratio; net equity expansion; Treasury-bill rate; long-term rate of returns; term spread; default spread; and Consumer Price Index (CPI).
Deep Reinforcement Learning (RL) Approach to Long Investment Horizon Portfolio Construction
In building our model architecture, we pursue three objectives. First, asset allocation decisions among the nine assets should be conditioned on all 153 asset-specific characteristics and sixteen macro-economic indicators, and should account for the diversification effect among these assets at the overall portfolio level.
Second, the model should be explicitly trained from a forward-looking, long-horizon holding period perspective. We set the longest investment holding period to ten years and allow the portfolio to be rebalanced once a year.
Portfolio rebalancing, especially for large institutional investors, can be very costly. For example, “Stanford pays $800 million a year in fees on a $30 billion endowment.” Our third objective is thus to minimize these fees: we impose explicit penalties on asset rebalancing and its trading costs while maximizing the portfolio's expected return and minimizing its risk, i.e., its volatility.
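To make the trade-off among return, risk, and rebalancing costs concrete, here is a minimal sketch of a reward that scores one multi-year sequence of allocations. It is an illustration under stated assumptions, not the reward used in the paper: the functional form (mean net return minus a volatility penalty), the names `episode_reward`, `risk_aversion`, and `cost_rate`, and the choice to treat the initial allocation as cost-free are all hypothetical.

```python
from statistics import mean, pstdev


def turnover(old_weights, new_weights):
    """One-way turnover: half the sum of absolute weight changes."""
    return 0.5 * sum(abs(n - o) for o, n in zip(old_weights, new_weights))


def episode_reward(weights_by_year, returns_by_year,
                   risk_aversion=1.0, cost_rate=0.01):
    """Score a whole sequence of annual allocations (hypothetical form).

    weights_by_year: one weight vector per year.
    returns_by_year: one vector of asset returns per year.
    risk_aversion:   assumed penalty on the volatility of net returns.
    cost_rate:       assumed trading cost per unit of turnover.
    """
    net_returns = []
    previous = None
    for weights, asset_returns in zip(weights_by_year, returns_by_year):
        gross = sum(w * r for w, r in zip(weights, asset_returns))
        # Assumption: the first allocation is free; later changes are charged.
        cost = 0.0 if previous is None else cost_rate * turnover(previous, weights)
        net_returns.append(gross - cost)
        previous = weights
    return mean(net_returns) - risk_aversion * pstdev(net_returns)


# Two assets, two years: a buy-and-hold path versus a full switch in year two.
asset_returns = [[0.10, 0.00], [0.00, 0.10]]
hold = episode_reward([[0.5, 0.5], [0.5, 0.5]], asset_returns)
switch = episode_reward([[1.0, 0.0], [0.0, 1.0]], asset_returns)
print(hold, switch)
```

Because the reward is computed over the whole sequence rather than year by year, an allocation that looks attractive in a single year is still penalized if it forces costly rebalancing later.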
These three crucial objectives, which are the key to any successful multi-asset long horizon portfolio strategy, can only be accomplished via reinforcement learning (RL). RL is particularly well-suited to problems characterized by trade-offs between long-term and short-term rewards. It has been applied successfully in robot control, AlphaGo, and self-driving cars.
To reduce Big Data to Smart Data and to extract predictive asset-specific signals for future portfolio performance, our model architecture uses a Transformer, a recent AI tool commonly used in natural language processing and computer vision. We also use a long short-term memory (LSTM) network to identify hidden macro-economic states in real time and to condition our asset allocation decisions on the current and expected macro-economic environment.
Our agent, a robot, aims to maximize the return of a portfolio of nine asset classes over the next ten years while keeping its volatility to a minimum and, importantly, keeping changes in asset weights between rebalancing dates to a minimum, given these assets’ trading costs. We allow the agent to re-allocate and rebalance once a year while aiming for the best overall ten-year performance.
As such, our approach is best described by a Wayne Gretzky quote: “A great hockey player skates to where the puck is going to be, not where it is.”
While we train the model, our agent does exactly that: it allocates assets based on the net benefits over the whole ten-year period and evaluates the outcome at the end of year ten. The model explicitly considers all ten consecutive annual rebalancing decisions before deciding on the asset allocation for the first year. Once the model is trained, we allow it to invest outside of the training data in the first out-of-sample year, which in our data begins in 2005. We then “track” the portfolio through the end of that year to measure ex-post performance. After that, we roll our training sample forward by one year to incorporate the realized data for 2005, retrain the model with the updated data, and allow it to invest at the beginning of 2006, tracking portfolio performance through the end of 2006. We repeat this procedure every year through the end of 2020. We therefore have sixteen years of portfolio performance to analyze. Note that none of these sixteen years is “shown” to the model before it makes its ex-ante investment decision for that year.
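The retrain-and-invest procedure described above can be sketched as a simple schedule of training windows and test years. This is a minimal sketch: the function name `walk_forward_schedule` is hypothetical, and the assumption that every training window starts at the beginning of the sample in 1980 (an expanding rather than fixed-length window) is ours, since the text only says the sample is rolled forward by one year.

```python
def walk_forward_schedule(sample_start=1980, first_test_year=2005,
                          last_test_year=2020):
    """Return (train_start, train_end, test_year) tuples, one per test year.

    Assumption: the training window expands, always starting at sample_start
    and ending the year before the out-of-sample test year.
    """
    schedule = []
    for test_year in range(first_test_year, last_test_year + 1):
        schedule.append((sample_start, test_year - 1, test_year))
    return schedule


schedule = walk_forward_schedule()
for train_start, train_end, test_year in schedule[:2]:
    print(f"train on {train_start}-{train_end}, invest during {test_year}")
print(len(schedule), "out-of-sample years")
```

In every tuple the test year lies strictly after the training window, which is what keeps each of the sixteen evaluation years out of sample.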
Out-of-Sample RL Portfolio Performance
What drives the portfolio performance in our model? By design, using a plethora of factors, signals, and strategies, we allow for timing of each factor based not only on its past and current realizations but also on its expected future performance and its diversification effect relative to the other factors at the portfolio level. Timing strategies normally involve a lot of portfolio turnover. We address this explicitly by training the model to minimize trading costs across all ten annual rebalancing dates.
During our “test” period, 2005 to 2020, the RL portfolio achieved an annualized Sharpe ratio between 2.7 and 3. For comparison, the Sharpe ratio of the S&P 500 for the same period is 0.55, which suggests that, with the help of extra leverage that we currently do not use, our agent could outperform the stock market by a factor of roughly 5.
The average annual return of the RL portfolio is 7.4%, and its standard deviation is 2.73%. For comparison, the average annual return of the S&P 500 for the same period is 8.7%, and its standard deviation is 16.3%. Our RL model therefore achieves an average annual return comparable to that of the S&P 500 with substantially lower risk, as measured by the standard deviation of portfolio returns. Institutional investors, especially pension funds and educational endowments, are often constrained in the amount of risk they can take at the portfolio level. Our RL approach provides a methodology for achieving performance close to that of a common benchmark, the S&P 500, with very low risk exposure.
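The return and volatility figures above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes a zero risk-free rate and costless leverage, both simplifications of ours; the reported Sharpe ratios may rest on a different convention, so small differences from the figures in the text are expected.

```python
# Figures quoted in the text (percent per year, 2005 to 2020).
rl_return, rl_vol = 7.4, 2.73
sp_return, sp_vol = 8.7, 16.3

# Assumption: zero risk-free rate, so Sharpe = mean return / volatility.
rl_sharpe = rl_return / rl_vol
sp_sharpe = sp_return / sp_vol
sharpe_ratio_multiple = rl_sharpe / sp_sharpe

# Assumption: costless leverage. Scale the RL portfolio up to S&P 500 risk.
leverage = sp_vol / rl_vol
levered_return = leverage * rl_return

print(f"RL Sharpe:       {rl_sharpe:.2f}")
print(f"S&P 500 Sharpe:  {sp_sharpe:.2f}")
print(f"Sharpe multiple: {sharpe_ratio_multiple:.1f}x")
print(f"Leverage needed to match S&P 500 volatility: {leverage:.1f}x")
print(f"Levered RL return at that volatility: {levered_return:.1f}%")
```

Since leverage scales return and volatility by the same factor, the ratio of Sharpe ratios is also the ratio of returns once the two portfolios are brought to equal volatility.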
Another common measure of portfolio performance is alpha, i.e., the portfolio's excess return over a theoretical or empirical benchmark return. Here, our benchmark is any static, passive combination of the nine assets we use to construct the RL portfolio, and we estimate alpha as the intercept from a regression of RL portfolio returns on all nine factors. We obtain an alpha of 6.5% per year, which means that we outperform every individual asset and every passive combination of these nine assets by an economically meaningful margin.
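The intercept regression described above can be sketched with ordinary least squares. The data below are synthetic and generated only to show the mechanics: the monthly frequency, the 192 observations, the function name `estimate_alpha`, and the simulated alpha and betas are all assumptions and are not results from the paper.

```python
import numpy as np


def estimate_alpha(portfolio_returns, factor_returns):
    """Regress portfolio returns on factor returns; return (alpha, betas)."""
    y = np.asarray(portfolio_returns, dtype=float)
    factors = np.asarray(factor_returns, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), factors])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients[0], coefficients[1:]


# Synthetic example: 192 periods (assumed monthly) and nine factors.
rng = np.random.default_rng(0)
n_periods, n_factors = 192, 9
factors = rng.normal(0.005, 0.03, size=(n_periods, n_factors))
true_alpha = 0.005                      # hypothetical per-period alpha
true_betas = rng.uniform(-0.5, 0.5, size=n_factors)
noise = rng.normal(0.0, 0.001, size=n_periods)
portfolio = true_alpha + factors @ true_betas + noise

alpha_hat, betas_hat = estimate_alpha(portfolio, factors)
print(f"estimated per-period alpha: {alpha_hat:.4f}")
print(f"annualized (x12, assuming monthly data): {12 * alpha_hat:.2%}")
```

A positive intercept means the portfolio earns a return that no fixed set of loadings on the nine factors can replicate, which is the sense in which the strategy beats static combinations of them.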
How do we achieve this outperformance? Our long-term RL portfolio has an annual turnover of 20%, which means that we re-adjust one-fifth of all holdings each year to maximize the ten-year objective. Thus, there is an element of active management, which is also standard industry practice. The average turnover costs are 86 basis points (bps) per year, which still leaves significant net (after-trading-cost) returns and alphas. Had we not trained the model to keep rebalancing to the minimum, our portfolio turnover would have increased by 34% per annum.
Finally, our portfolio outperforms benchmarks during both high and low market volatility regimes. It achieves its highest performance during low volatility regimes, when investors are least financially constrained and can most easily use leverage to augment their winning asset positions.
COVID Market Crash Case Study
As a special case, we analyze how the model allocates weights around the COVID-19 shock of March 2020, when markets plunged. 2020 is the last year of our test period, and the model was last retrained at the end of 2019. Moreover, the model has never been trained on anything like this pandemic episode, as nothing like it had happened before in our overall sample period, 01/1980 to 12/2020. This event therefore provides a unique laboratory experiment for examining how the model, after training on previous crisis episodes, makes decisions about something it has never experienced. Our agent decreases the weight of the market portfolio before March 2020, when market volatility started increasing, and then increases it at the end of March, i.e., it advises buying the market at its bottom. The model also advises taking a negative exposure to small-cap stocks by short-selling them at the end of January 2020 and covering most of the short position in March 2020, when this position was the most in-the-money. Institutional portfolio managers, who cannot short sell the market but can time market volatility and reduce their positions in small-cap stocks when volatility is expected to be high, are likely to find these ex-ante model decisions quite rational and ex-post effective.
Traditional portfolio theory has limited practical applications. Portfolio managers often rely on their professional experience and subjective assessments to make investment decisions, and they achieve better performance without relying on any model. Modern financial markets, empowered by technological innovation, big data, alternative data, and other types of soft information, evolve and become more complex every day. While previous experience and expertise count for a lot, interpreting and adapting to current market trends, investor preferences, and risk appetites is just as important. A human needs time and a long enough historical sample to digest and understand new market tastes and trends. Machines, meaning AI technologies with machine learning capabilities, can, once trained by a human, identify abnormal trends in the data in real time and react to expected risks faster than a human portfolio manager. The COVID-19 market plunge is only one of many examples in which we are surprised by how well the algorithms anticipate risk, and especially the source of risk. In our other work, we analyze the predictions of the deep learning model around the 2008-2009 financial crisis. A month before the biggest market plunge in September 2008, the model identified real estate holdings as the risk factor, and, as in the COVID-19 episode, the model had never seen a real estate bubble crash.
The main message here is that there is a reason why 75% of Refinitiv survey respondents report using deep learning in their practices: it works! Deep learning provides a viable alternative to the somewhat “dead-end” old portfolio theory, and, after being properly trained, machines can “learn” the experience of successful portfolio managers. With bigger data and more information, machines can also make real-time decisions faster. Faster means re-allocating assets away from expected risks and protecting investors from immediate losses.
Recently, pension funds in Canada and the US reported negative performance due to the bear market caused by rising interest rates and higher inflation during the first half of 2022. For example, the largest Quebec pension manager, Caisse, lost $33.6 billion in the first half of 2022. Norway’s Wealth Fund lost $174 billion over the same period. New York City Retirement Systems’ returns fell 8.65% for the fiscal year ended June 30, 2022.
This poor performance could have been prevented had the pension fund system adopted modern-day technologies. In sum, there are modern-day solutions for portfolio management that can handle current market complexities and forecast market downturns better than humans. This makes them an indispensable risk management instrument for any type of asset manager. Adopting these technologies should become a first-order priority.
Chengyu Zhang is a Ph.D. student in Finance at McGill University Desautels School of Management.
This article is adapted from their paper, “Long Horizon Multifactor Investing with Reinforcement Learning,” available on SSRN.