Foundation Models for Numerical Tasks

1. Language models

By now, we all know that large language models (LLMs) are very capable in qualitative and language-based tasks. The jury is still out, however, concerning their reasoning and numerical skills.

Researchers at the University of Chicago’s Booth School of Business (my alma mater) used Financial Statement Analysis (FSA) to test LLMs’ ability to analyze and synthesize purely financial numbers (paper here). The task was to predict whether earnings would grow or decline in the following period (various timeframes were tested). The LLM (GPT-4 Turbo) was not given any textual information, just numbers, as shown in Fig. 1.

Figure 1: One-shot prompting: the quantitative input data for the prompt (image from the paper).

After being told to assume the role of a financial analyst, the LLM was guided towards its answers with Chain-of-Thought (CoT) techniques. It was asked to:

  1. Identify notable changes in the financial statements.
  2. Compute financial ratios by first stating the formulae, then computing the values.
  3. Provide economic interpretations of the computed ratios.
  4. Predict the directional change of future earnings and provide the rationale for that prediction.

Figure 2: The LLM’s answer (image from the paper).
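The four CoT steps above can be sketched as a prompt template. A minimal illustration in Python; the wording and the `balance_sheet`/`income_statement` placeholders are my assumptions, not the paper's verbatim prompt:

```python
# Sketch of a Chain-of-Thought prompt for financial statement analysis.
# The exact wording is illustrative; the paper's actual prompt may differ.

COT_TEMPLATE = """You are a financial analyst.

Below are a standardized, anonymized balance sheet and income statement.

Balance sheet:
{balance_sheet}

Income statement:
{income_statement}

Step 1: Identify notable changes in the financial statements.
Step 2: Compute key financial ratios. State each formula first,
        then compute its value.
Step 3: Provide an economic interpretation of each computed ratio.
Step 4: Predict whether earnings will increase or decrease in the
        next period, and give the rationale for your prediction."""


def build_prompt(balance_sheet: str, income_statement: str) -> str:
    """Fill the CoT template with the (numbers-only) statements."""
    return COT_TEMPLATE.format(
        balance_sheet=balance_sheet,
        income_statement=income_statement,
    )
```

The resulting string would be sent as a single user message; note that no textual context about the company is included, mirroring the paper's numbers-only setup.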

The authors found that the LLM, with CoT, easily outperformed the median financial analyst. Even though the LLM was only given quantitative material, it benefited from its general ‘understanding’ of the world, including business and investment know-how, combined with an emerging form of intuitive reasoning and a capacity to formulate hypotheses. Moreover, human financial analysts are prone to statistical biases, in all likelihood more so than LLMs in this specific, quantitative use case.

The authors also trained a three-layer artificial neural network (ANN) on a vast body of data. This task-specific ANN only just matched the general-purpose LLM’s accuracy. A remarkable result, considering that the LLM was an off-the-shelf, general-purpose model used without any further fine-tuning.
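For intuition, a three-layer ANN baseline of the kind the authors describe can be sketched as below. The layer sizes, feature count, and synthetic data are my assumptions for illustration, not the paper's specification:

```python
# Hypothetical sketch of a three-layer ANN that predicts the direction of
# next-period earnings from financial ratios. Layer sizes and the toy data
# are assumed, not taken from the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy stand-in for the training data: rows of financial ratios,
# label = 1 if earnings grew in the following period, else 0.
X = rng.normal(size=(1000, 10))  # e.g. 10 ratio features per firm-year
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32, 16),  # three hidden layers
                  max_iter=500, random_state=0),
)
model.fit(X[:800], y[:800])
accuracy = model.score(X[800:], y[800:])  # out-of-sample directional accuracy
```

Unlike the LLM, such a model has to be trained from scratch on the task and encodes no world knowledge beyond what the ratio features carry.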

Overall, FSA is an interesting use case demonstrating the numerical skills and emerging reasoning capabilities of general-purpose LLMs. I’d like to see the results of this study if the LLM were fine-tuned on the same data fed into the ANN…

2. Specialized foundation models

Above, I showed research demonstrating how a language model, basically pre-trained to perform next-word prediction, was capable of accomplishing numerical tasks and some related reasoning.

Recently, a new breed of specialized foundation models has emerged. TimeGPT is one such model, specialized in time series: it is pre-trained on over 100 billion rows of financial, weather, Internet of Things (IoT), energy, and web data.

In their latest paper, my LIRIS colleagues tested TimeGPT for predicting soil water potential in orchards. As data gathering in agriculture is expensive, the relative shortage of data often precludes data-hungry deep learning methods such as LSTMs.

Figure 3: TimeGPT architecture (image from the paper). Notice how a CNN replaces the feed-forward layers of the original GPT architecture.

They find that, with minor fine-tuning on the target variable’s (soil water potential) history alone, TimeGPT delivers respectable results, losing out only to the state-of-the-art Temporal Fusion Transformer (TFT) model. Note that the TFT also had access to exogenous variables such as weather data. Given its superior ease of use in terms of effort and data, TimeGPT is therefore a serious alternative for use cases plagued by data scarcity: specialized foundation models like it can transfer their learned forecasting skills to new problems where training data are too scarce for deep learning methods that must be trained from scratch.
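As a sketch, the setup the authors describe maps onto Nixtla's TimeGPT client roughly as follows. The sensor values are synthetic and the series name is made up; the API call itself needs a Nixtla API key, so it is shown commented out for illustration only:

```python
# Sketch: preparing soil-water-potential history for TimeGPT fine-tuning.
# Synthetic data; a real run requires a Nixtla API key.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2023-04-01", periods=200, freq="D")

# Long format expected by the TimeGPT client: one row per (series, timestamp).
df = pd.DataFrame({
    "unique_id": "orchard_sensor_1",  # hypothetical soil sensor id
    "ds": dates,                      # timestamp column
    "y": -40 + 10 * np.sin(np.arange(200) / 14) + rng.normal(0, 2, 200),  # kPa
})

# Illustrative call (not executed here): light fine-tuning on the target
# variable's own history only, as in the study.
# from nixtla import NixtlaClient
# client = NixtlaClient(api_key="...")
# forecast = client.forecast(df=df, h=14, finetune_steps=20,
#                            time_col="ds", target_col="y")
```

The appeal is that only the target series itself is needed; exogenous drivers such as weather, which the TFT baseline relied on, are optional.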

Figure 4: Conventional versus foundation models (image from the paper).

