Is it wise to rely on a Large Language Model (LLM) like GPT-4 for financial guidance? Probably not! OpenAI itself states that its model outputs "should not be considered professional advice."
Even so, many people are imagining new uses for LLMs, many of which involve market analysis. I embarked on a simple yet revealing experiment to evaluate GPT-4's forecasting accuracy in the stock market. The results are at once interesting and entirely predictable.
The Experiment:
GPT-4's training data does not currently include any content published after April 2023. I prompted GPT-4 to construct a portfolio that would surpass the S&P 500's performance over eight months. To my surprise, by December 31, 2023, the LLM's portfolio outperformed the S&P 500 by an impressive 20%!
However, was this genuine forecasting skill?
To investigate, I asked GPT-4 to generate three distinct portfolios, one of which served as a negative control.
I tracked these portfolios from April to December 2023.
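Tracking a portfolio against a benchmark over a fixed window comes down to simple return arithmetic. The sketch below shows one way to do it with pandas; the tickers, prices, and benchmark figure are purely illustrative placeholders, not the experiment's actual data.

```python
import pandas as pd

# Hypothetical start and end prices for an equal-weight portfolio
# (tickers and values are made up for illustration).
prices = pd.DataFrame(
    {"start": [100.0, 250.0, 40.0], "end": [130.0, 300.0, 44.0]},
    index=["AAA", "BBB", "CCC"],
)

# Per-ticker simple return over the holding period.
returns = prices["end"] / prices["start"] - 1

# An equal-weight portfolio's return is the mean of the per-ticker returns.
portfolio_return = returns.mean()

# Benchmark return over the same window (illustrative value).
sp500_return = 0.10

# Outperformance relative to the benchmark, in percentage points.
outperformance = (portfolio_return - sp500_return) * 100
print(f"Portfolio beat the benchmark by {outperformance:.1f} percentage points")
```

The same calculation, pointed at real price data for each of the three portfolios, is all that is needed to produce the comparison described above.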
Results:
All three portfolios, including the negative control, outperformed the S&P 500 by approximately 20%! The additional information provided to the LLM seemed to have no effect on its forecasting ability. In fact, the negative control yielded the strongest portfolio, indicating that the LLM was not exhibiting any skill whatsoever.
Implications:
The results hint that the LLM did not meaningfully “understand” the prompts. While it could generate portfolios, there seemed to be no strategic reasoning behind its choices. Instead, it appears GPT-4 favored stocks frequently mentioned in its training data. This was further evidenced when I asked the LLM for stock tickers it felt most confident discussing; the response included fifteen tickers, all of which appeared in every portfolio generated.
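The overlap described above is easy to verify programmatically: treat each generated portfolio as a set of tickers and intersect them. The ticker lists below are invented stand-ins, not the experiment's actual output.

```python
# Each generated portfolio, represented as a set of ticker symbols
# (these lists are hypothetical examples, not GPT-4's actual picks).
portfolios = [
    {"AAPL", "MSFT", "GOOG", "NVDA", "AMZN"},
    {"AAPL", "MSFT", "GOOG", "NVDA", "TSLA"},
    {"AAPL", "MSFT", "GOOG", "NVDA", "META"},
]

# Tickers that appear in every portfolio.
common = set.intersection(*portfolios)
print(sorted(common))  # the shared core of the generated portfolios
```

A large shared core like this, matching the tickers the model says it is most confident about, is consistent with the model defaulting to frequently mentioned stocks rather than reasoning about the prompt.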
AI language models like GPT-4 can certainly offer valuable market insights, but their current iteration is probably not suited for independent financial decision-making.
Closing Thoughts:
There are anecdotes of individuals using tools that leverage LLMs to sift through financial data and "turbocharge" their portfolios. While these tools might be helpful, the inherent biases in an LLM's training data should not be ignored. Consider this: if an LLM's training data included thousands of articles with a positive view of a particular company, how likely is it to advise against investing in that company?
As we move further into an era where both physical and cognitive tasks are becoming subject to automation, it's wise to ponder these questions, especially since the internal mechanics of these AI systems, as of now, remain largely opaque.
Source Code:
This project was created with help from Joshua Cooper, a fellow master’s student at the Institute for Advanced Analytics at NCSU. The code was written in Python and used the OpenAI and Pandas packages. Please feel free to explore, modify, and adapt the code by following this link to a Google Colab document.
Thank you for reading!