Text-based artificial intelligence (AI) tools such as GPT-4 enable tech-savvy laypeople to conduct tasks in complex domains with little prior experience. The financial sector, dealing with vast amounts of data daily, is among the industries most eagerly working on AI solutions. For example, Morgan Stanley’s asset management division was one of GPT-4’s first customers and worked with developer OpenAI to optimize generative models.
In our recent paper, we investigate whether AI tools such as GPT-4 are a suitable source of financial advice. We construct hypothetical investor profiles to assess whether GPT-4 tailors its recommendations to the individual investor. The profiles differ in risk tolerance, risk capacity, and sustainability preference. We then request specific portfolio recommendations from GPT-4 for each of the investor profiles. In response to the request, GPT-4 provides a portfolio recommendation consisting mostly of exchange-traded funds (including ticker) and corresponding portfolio shares. It also explains its reasoning, acknowledging the investor profile’s risk tolerance, investment horizon, and age in determining a suitable portfolio. To assess the quality of the proposed portfolios, we obtain portfolio suggestions for the investor profiles from the automated financial advisory solution of an established US financial advisory firm, which currently oversees more than $50 billion in assets under management.
To assess the level of diversification in the portfolios suggested by GPT-4, we compare each portfolio’s composition with respect to geography and asset classes to the benchmark portfolio obtained from the professional financial advisor. By geography, we distinguish between domestic (U.S.), developed-market, and emerging-market securities; by asset class, we distinguish between equity, fixed income, alternative assets such as real estate or commodities, and cash. A few things are worth noting. First, GPT-4 portfolios generally provide exposure to the same geographies and asset classes as the professionally advised portfolios. Second, GPT-4 portfolios exhibit considerable home bias relative to both the benchmark portfolios and the U.S. share of global stock market capitalization. Within international equity, emerging market stocks are particularly underweighted. Third, the portfolio suggestions are more sensitive to risk tolerance and less sensitive to investment horizon than the benchmark portfolios. Fourth, the portfolios suggested for investor profiles with sustainability preferences include ESG-focused versions of the portfolio components, such as the iShares ESG Aware MSCI USA ETF.
We compute monthly average return, volatility figures, and annual Sharpe ratios for the GPT-4 and benchmark portfolios from December 2016 to May 2023, the longest time period for which data is available for all investment products in the GPT-4 and benchmark portfolios.[1] We find that GPT-4 portfolios provided equal, if not superior, risk-return profiles compared to the benchmark portfolios.
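For readers who want to see how such summary statistics can be computed, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `performance_summary` is hypothetical, the Sharpe ratio is annualized by scaling the monthly ratio with the square root of 12 (one common convention; the post does not say which one was used), and the return figures are made up.

```python
import math
import statistics


def performance_summary(monthly_returns, monthly_risk_free):
    """Mean monthly return, monthly volatility, and annualized Sharpe ratio.

    Both inputs are sequences of decimal monthly returns (0.01 means 1%).
    """
    if len(monthly_returns) != len(monthly_risk_free):
        raise ValueError("return and risk-free series must have equal length")

    # Excess return over the risk-free rate, month by month.
    excess = [r - rf for r, rf in zip(monthly_returns, monthly_risk_free)]

    mean_return = statistics.mean(monthly_returns)
    volatility = statistics.stdev(monthly_returns)  # sample standard deviation

    # Monthly Sharpe ratio, then scaled to annual (assumption: sqrt(12) rule).
    sharpe_monthly = statistics.mean(excess) / statistics.stdev(excess)
    sharpe_annual = sharpe_monthly * math.sqrt(12)

    return {
        "mean": mean_return,
        "volatility": volatility,
        "sharpe_annual": sharpe_annual,
    }


# Made-up numbers purely for illustration, not data from the study.
example_returns = [0.02, -0.01, 0.03, 0.00]
example_risk_free = [0.001, 0.001, 0.001, 0.001]
summary = performance_summary(example_returns, example_risk_free)
```

Applying the same function to a GPT-4 portfolio and to its benchmark over the same months would give the kind of side-by-side comparison described above.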
We estimate the coefficients of a six-factor regression model to account for common risk factors driving portfolio performance and to investigate the exposure to those risk factors. Specifically, we benchmark the excess portfolio returns against six well-known asset pricing factors: the market excess return, the small-minus-big size factor, the high-minus-low value factor, the robust-minus-weak operating profitability factor, the conservative-minus-aggressive investment factor, and the winners-minus-losers momentum factor. Our results can be summarized as follows. First, the GPT-4 and benchmark portfolios earned negative risk-adjusted returns for most profiles. Monthly alphas range from -21 to -26 basis points and are highly statistically significant. Re-running the six-factor model on long-short portfolios (long GPT-4, short benchmark portfolios) confirms that the risk-adjusted performance of GPT-4 portfolios is no different from the benchmark portfolios. Second, the market betas confirm our previous finding that the GPT-4 portfolios are more responsive to risk tolerance and less responsive to investment horizons than the benchmark portfolios.
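The regression described above can be sketched as an ordinary least squares fit of excess portfolio returns on an intercept (the alpha) and the six factors. The Python code below is a simplified illustration, not the paper's estimation code: the factor abbreviations, the helper names, and the synthetic data are all assumptions, and the sketch omits the standard errors needed to judge statistical significance.

```python
import random

# Assumed abbreviations for the six factors named in the text.
FACTORS = ["MKT", "SMB", "HML", "RMW", "CMA", "WML"]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def factor_regression(excess_returns, factor_rows):
    """OLS via the normal equations; returns the alpha and the factor betas."""
    x = [[1.0] + list(row) for row in factor_rows]  # prepend the intercept
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, excess_returns)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    return {"alpha": coef[0], "betas": dict(zip(FACTORS, coef[1:]))}


# Synthetic, noise-free data for illustration only (78 months, matching the
# length of the December 2016 to May 2023 sample). Not data from the study.
rng = random.Random(0)
true_alpha = -0.0023
true_betas = [0.9, 0.1, -0.05, 0.02, 0.03, -0.04]
factor_rows = [[rng.gauss(0, 0.03) for _ in FACTORS] for _ in range(78)]
excess_returns = [
    true_alpha + sum(b * f for b, f in zip(true_betas, row))
    for row in factor_rows
]
result = factor_regression(excess_returns, factor_rows)
```

For the long-short comparison, the dependent variable would be the monthly return difference between a GPT-4 portfolio and its benchmark. An alpha close to zero in that regression would correspond to the finding that the two perform alike on a risk-adjusted basis.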
Taken together, the GPT-4 portfolios earned risk-adjusted returns on par with benchmark portfolios. This suggests that the superior risk-return profiles were achieved by exposure to commonly considered risk factors, particularly higher exposure to market risk.
Our analyses are essentially a backtest of today’s recommendations on past performance data. A more adequate performance analysis would have to be conducted in 30 years, when today’s portfolio recommendations near the end of their investment horizon. To complement our backward-looking performance evaluation, we exploit the fact that GPT-4 was trained on information up to October 2021, after which it was cut off from information from the internet. Thus, the performance of GPT-4 portfolios after October 2021 should not be subject to hindsight bias (although the benchmark portfolios’ performance might be). Computing performance measures for the GPT-4 and benchmark portfolios from October 2021 onward does not change our main results.
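The post-cutoff check amounts to restricting the return series to months the model could not have seen and recomputing the same performance measures on that subsample. A minimal Python sketch follows; the function name, the (month, return) data layout, and the choice to include October 2021 itself are assumptions made for illustration.

```python
from datetime import date

# Training cutoff month stated in the text. Whether October 2021 itself
# belongs in the subsample is an assumption here.
TRAINING_CUTOFF = date(2021, 10, 1)


def post_cutoff(dated_returns, cutoff=TRAINING_CUTOFF):
    """Keep only the (month, return) pairs dated on or after the cutoff."""
    return [(month, ret) for month, ret in dated_returns if month >= cutoff]


# Made-up returns for illustration only.
sample = [
    (date(2021, 9, 1), 0.01),
    (date(2021, 10, 1), -0.02),
    (date(2021, 11, 1), 0.03),
]
subsample = post_cutoff(sample)
```

The filtered series can then be passed to whatever routine computes the full-sample statistics, so that the two sets of results are directly comparable.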
While we have shown that GPT-4 is already effective at one part of the financial advisory process (namely, matching information on the client’s risk tolerance and investment horizon to a suitable portfolio of financial products), it is currently unable to perform adjacent steps in the advisory process such as risk profiling, implementation, and rebalancing. Thus, while GPT-4 does well in matching investor profiles to specific portfolios, it will likely not make the entire financial advisory process redundant in the near future. Instead, financial advisors may use it as a back-office solution: they could use GPT-4 to generate portfolio recommendations from the investor profiles they create, and then implement and rebalance those portfolios themselves. Nevertheless, our results raise interesting questions concerning the regulation of financial advice and liability issues. For instance, who should be liable for incorrect advice when GPT-4 is used as a backend solution? Our results show that GPT-4 would at least theoretically be able to provide such financial advice and take existing regulatory requirements into account.
Christian Fieberg is a Professor of Data Science at City University of Applied Sciences Bremen, Germany.
Lars Hornuf is a Professor of Business Administration, esp. Finance and Financial Technology at Dresden University of Technology, Germany.
David Streich is an Assistant Professor for Digital Finance at the WFI Ingolstadt School of Management, Germany.
This post was adapted from their paper, “Using GPT-4 for Financial Advice,” available on SSRN.
[1] We omit profiles with sustainability preferences since the suggested sustainable products have only been incepted recently, leaving us with insufficient historical performance data.