Experiments in AI Finance: Assessing GPT-4's Stock Forecasting Ability (2024)

Is it wise to rely on a Large Language Model (LLM) like GPT-4 for financial guidance? Probably not! OpenAI itself states that its model output "should not be considered professional advice."

Even so, people keep imagining new uses for LLMs, and many of those uses involve market analysis. I ran a simple yet revealing experiment to evaluate GPT-4's forecasting accuracy in the stock market. The results are at once interesting and entirely predictable.

The Experiment:

GPT-4's training data does not currently include any content published after April 2023. I prompted GPT-4 to construct a portfolio that would surpass the S&P 500's performance over eight months. To my surprise, by December 31, 2023, the LLM's portfolio had outperformed the S&P 500 by an impressive 20%!

However, was this genuine forecasting skill?

To investigate, I asked GPT-4 to generate three distinct portfolios:

  1. Negative Control: a portfolio that would underperform the S&P 500.
  2. Control: a portfolio that would mirror the S&P 500's performance.
  3. Experimental: a portfolio created by chaining several prompts together, providing the LLM additional pre-April 2023 data.

I tracked these portfolios from April to December 2023.
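The tracking step comes down to comparing each portfolio's return over the period with the benchmark's return. Below is a minimal sketch of that arithmetic, not the project's actual code. It assumes an equal-weighted, buy-and-hold portfolio; the tickers (AAA, BBB, CCC), the prices, and the benchmark return are made-up placeholders, and the function names are hypothetical.

```python
def period_return(start_prices, end_prices):
    """Equal-weighted buy-and-hold return over a single period.

    Both arguments map ticker -> price. Equal weighting is an
    assumption of this sketch, not something the article specifies.
    """
    per_stock = [end_prices[t] / start_prices[t] - 1.0 for t in start_prices]
    return sum(per_stock) / len(per_stock)


def excess_return(portfolio_return, benchmark_return):
    """Difference between portfolio and benchmark returns."""
    return portfolio_return - benchmark_return


# Placeholder data for illustration only (not real tickers or prices).
start = {"AAA": 100.0, "BBB": 50.0, "CCC": 20.0}
end = {"AAA": 130.0, "BBB": 55.0, "CCC": 25.0}

portfolio = period_return(start, end)
benchmark = 0.10  # hypothetical benchmark return over the same period

print(f"portfolio return: {portfolio:.2%}")
print(f"excess over benchmark: {excess_return(portfolio, benchmark):.2%}")
```

Running the same comparison for all three portfolios puts them on a common scale, so a negative control that beats the benchmark stands out immediately.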

Results:

All three portfolios, including the negative control, outperformed the S&P 500 by approximately 20%! The additional information provided to the LLM seemed to have no effect on the LLM's forecasting ability. In fact, the negative control yielded the strongest portfolio, indicating that the LLM was not exhibiting any skill whatsoever.

Implications:

The results hint that the LLM did not meaningfully “understand” the prompts. While it could generate portfolios, there seemed to be no strategic reasoning behind its choices. Instead, it appears GPT-4 favored stocks frequently mentioned in its training data. This was further evidenced when I asked the LLM for stock tickers it felt most confident discussing; the response included fifteen tickers, all of which appeared in every portfolio generated.
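The overlap check described above can be expressed as a set intersection. This is a rough sketch under stated assumptions: the portfolio contents and the "confident" list are invented placeholders (the article does not list the actual tickers), and `shared_tickers` is a hypothetical helper name.

```python
def shared_tickers(portfolios):
    """Return the tickers that appear in every portfolio."""
    ticker_sets = [set(tickers) for tickers in portfolios.values()]
    return set.intersection(*ticker_sets)


# Placeholder portfolios for illustration only.
portfolios = {
    "negative_control": ["AAA", "BBB", "CCC", "DDD"],
    "control": ["AAA", "BBB", "CCC", "EEE"],
    "experimental": ["AAA", "BBB", "CCC", "FFF"],
}

# Placeholder for the tickers the model said it was most confident discussing.
confident = {"AAA", "BBB", "CCC"}

common = shared_tickers(portfolios)

# True when every "confident" ticker shows up in every portfolio.
all_confident_shared = confident <= common
print(sorted(common), all_confident_shared)
```

If the check comes back true regardless of what each prompt asked for, that is consistent with the model leaning on familiar tickers rather than reasoning about the prompt.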

AI language models like GPT-4 can certainly offer valuable market insights, but their current iteration is probably not suited for independent financial decision-making.

Closing Thoughts:

There are anecdotes of individuals using tools that leverage LLMs to sift through financial data and “turbocharge” their portfolios. While these tools might be helpful, the inherent biases in an LLM's training data should not be ignored. Consider a question like this: if an LLM processed thousands of articles with a positive view of a particular company, would it be likely to advise against investing in that company?

As we move further into an era where both physical and cognitive tasks are becoming subject to automation, it's wise to ponder these questions, especially since the internal mechanics of these AI systems remain largely opaque for now.

Source Code:

This project was created with help from Joshua Cooper, a fellow master’s student at the Institute for Advanced Analytics at NCSU. The code was written in Python and used the OpenAI and Pandas packages. Please feel free to explore, modify, and adapt the code by following this link to a Google Colab document.

Thank you for reading!

Author: Frankie Dare
