Machine Learning and FOMC Statements: What’s the Sentiment?
That’s where machine learning (ML) and natural language processing (NLP) come in. We applied Loughran-McDonald sentiment word lists and BERT and XLNet ML techniques for NLP to FOMC statements to see if they anticipated changes in the federal funds rate and then examined whether our results had any correlation with stock market performance.
Loughran-McDonald Sentiment Word Lists
Before calculating sentiment scores, we first constructed word clouds to visualize the frequency/importance of particular words in FOMC statements.
Word Cloud: March 2017 FOMC Statement
Word Cloud: July 2019 FOMC Statement
Although the Fed increased the federal funds rate in March 2017 and decreased it in July 2019, the word clouds of the two corresponding statements look similar. That’s because FOMC statements generally contain many sentiment-free words with little bearing on the FOMC’s outlook. Thus, the word clouds failed to distinguish the signal from the noise. But quantitative analyses can offer some clarity.
The Loughran-McDonald sentiment word lists are used to analyze 10-K documents, earnings call transcripts, and other texts by classifying words into the following categories: negative, positive, uncertainty, litigious, strong modal, weak modal, and constraining. We applied this technique to FOMC statements, designating words as positive/hawkish or negative/dovish, while filtering out less important text such as dates, page numbers, voting members, and explanations of monetary policy implementation. We then calculated sentiment scores using the following formula:
Sentiment Score = (Positive Words – Negative Words) / (Positive Words + Negative Words)
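To make the formula concrete, here is a minimal Python sketch of a word-count score. The two word sets are tiny hypothetical placeholders rather than the actual Loughran-McDonald lists, and the function name is our own for illustration; this is not the code used in the study.

```python
import re

# Hypothetical placeholder word sets for illustration only. The real
# Loughran-McDonald lists are far larger and are not reproduced here.
POSITIVE_WORDS = {"strong", "gains", "improved"}
NEGATIVE_WORDS = {"weak", "declined", "losses"}


def word_sentiment_score(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (positive - negative) / (positive + negative) over word counts.

    Returns None when the text contains no listed word, because the
    ratio is undefined in that case.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos + neg == 0:
        return None
    return (pos - neg) / (pos + neg)


# Two listed positive words and one listed negative word: (2 - 1) / (2 + 1)
example = word_sentiment_score("Job gains have been strong, but spending was weak.")
```

The score is bounded between -1 (only negative words) and +1 (only positive words), which is what makes statements of different lengths comparable.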
FOMC Statements: Loughran-McDonald Sentiment Scores
As the preceding chart demonstrates, the FOMC’s statements grew more positive/hawkish in March 2021 and topped out in July 2021. After softening for the subsequent 12 months, sentiment jumped again in July 2022. Though these movements may be driven in part by the recovery from the COVID-19 pandemic, they also reflect the FOMC’s growing hawkishness in the face of rising inflation over the last year or so.
But the large fluctuations are also indicative of an inherent shortcoming in Loughran-McDonald analysis: The sentiment scores assess only words, not sentences. For example, in the sentence “Unemployment declined,” both words would register as negative/dovish even though, as a sentence, the statement indicates an improving labor market, which most would interpret as positive/hawkish.
To address this issue, we trained the BERT and the XLNet models to analyze statements on a sentence-by-sentence basis.
BERT and XLNet
Bidirectional Encoder Representations from Transformers, or BERT, is a language representation model that uses a bidirectional rather than a unidirectional encoder for better fine-tuning. Indeed, with its bidirectional encoder, we find BERT outperforms OpenAI GPT, which uses a unidirectional encoder.
XLNet, meanwhile, is a generalized autoregressive pretraining method that also learns from bidirectional context but does not rely on masked-language modeling (MLM). In MLM, a few tokens of the input sentence are masked before the sentence is fed to BERT, and the weights inside BERT are optimized so that the model outputs the original sentence on the other side. XLNet avoids this masking step, which makes it something of an improved version of BERT.
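The masking step can be illustrated with a short Python sketch. It only shows how a masked training example might be built from a tokenized sentence; the function name, the whitespace tokenization, and the fixed seed are assumptions for illustration, and BERT's actual preprocessing (subword tokenization and its specific replacement rules) is more involved.

```python
import random

MASK = "[MASK]"


def make_mlm_example(tokens, mask_fraction=0.15, seed=0):
    """Hide a fraction of tokens and record what the model must recover.

    Returns the masked token list and a dict mapping each masked
    position to the original token at that position.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        masked[i] = MASK
    return masked, targets


tokens = "the labor market has continued to strengthen".split()
masked, targets = make_mlm_example(tokens)
```

During pretraining, the model sees `masked` and is scored on how well it predicts the entries of `targets`.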
To train these two models, we divided the FOMC statements into training datasets, test datasets, and out-of-sample datasets. We extracted training and test datasets from February 2017 to December 2020 and out-of-sample datasets from June 2021 to July 2022. We then applied two different labeling techniques: manual and automatic. Using automatic labeling, we gave sentences a value of 1, 0, or none based on whether they indicated an increase, decrease, or no change in the federal funds rate, respectively. Using manual labeling, we categorized sentences as 1, 0, or none depending on whether they were hawkish, dovish, or neutral, respectively.
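The date split and the automatic labeling rule could be sketched as follows. This is one reading of the description above, in which every sentence inherits its label from the rate decision of the statement it belongs to. The function names are hypothetical, and the division between training and test data inside the first window is not specified here, so it is left as a single bucket.

```python
from datetime import date

# Date windows taken from the text; exact day boundaries are an assumption.
TRAIN_TEST_WINDOW = (date(2017, 2, 1), date(2020, 12, 31))
OUT_OF_SAMPLE_WINDOW = (date(2021, 6, 1), date(2022, 7, 31))


def assign_split(statement_date):
    """Place a statement in a dataset bucket by its release date."""
    if TRAIN_TEST_WINDOW[0] <= statement_date <= TRAIN_TEST_WINDOW[1]:
        return "train_test"
    if OUT_OF_SAMPLE_WINDOW[0] <= statement_date <= OUT_OF_SAMPLE_WINDOW[1]:
        return "out_of_sample"
    return None  # statement falls outside both windows


def auto_label(rate_change_bp):
    """Automatic label: 1 for a rate increase, 0 for a decrease, None for no change."""
    if rate_change_bp > 0:
        return 1
    if rate_change_bp < 0:
        return 0
    return None


# The March 2017 hike and the July 2019 cut, expressed in basis points.
march_2017 = (assign_split(date(2017, 3, 15)), auto_label(25))
july_2019 = (assign_split(date(2019, 7, 31)), auto_label(-25))
```

Manual labeling uses the same 1, 0, or none coding, but the value is assigned by a human reader sentence by sentence rather than derived from the rate decision.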
We then ran the following formula to generate a sentiment score:
Sentiment Score = (Positive Sentences – Negative Sentences) / (Positive Sentences + Negative Sentences)
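As a minimal sketch, the sentence-level score can be computed directly from the predicted labels of one statement, using the 1, 0, or none coding described above. The function name is hypothetical and the label list is made up for illustration.

```python
def sentence_sentiment_score(labels):
    """Return (positive - negative) / (positive + negative) over sentence labels.

    Labels follow the coding in the text: 1 is positive/hawkish,
    0 is negative/dovish, and None is neutral and is ignored.
    """
    pos = sum(1 for label in labels if label == 1)
    neg = sum(1 for label in labels if label == 0)
    if pos + neg == 0:
        return None  # no hawkish or dovish sentence, score undefined
    return (pos - neg) / (pos + neg)


# Illustrative statement: two hawkish, one dovish, one neutral sentence.
score = sentence_sentiment_score([1, 1, 0, None])
```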
Performance of AI Models
Predicted Sentiment Score (Automatic Labeling)
Predicted Sentiment Score (Manual Labeling)
The two charts above demonstrate that manual labeling better captured the recent shift in the FOMC’s stance. Each statement includes hawkish (or dovish) sentences even though the FOMC ended up decreasing (or increasing) the federal funds rate. In that sense, labeling sentence by sentence trains these ML models well.
Since ML and AI models tend to be black boxes, how we interpret their results is extremely important. One approach is to apply Local Interpretable Model-Agnostic Explanations (LIME), which uses a simple model to explain the predictions of a much more complex one. The two figures below show how XLNet (with manual labeling) interprets sentences from FOMC statements. It reads the first sentence as positive/hawkish based on the strengthening labor market and moderately expanding economic activity, and the second sentence as negative/dovish since consumer prices declined and inflation ran below 2%. The model’s judgment on both economic activity and inflationary pressure appears appropriate.
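The idea behind this kind of explanation can be sketched in plain Python. The example below is a simplified, LIME-style word attribution and not the LIME library itself: it switches words on and off, queries a black-box scorer, and measures how much each word moves the score. The scorer and its weights are hypothetical stand-ins for a trained model such as XLNet. Because every on/off combination is enumerated, the per-word difference in mean score coincides with the coefficient of an unweighted linear surrogate; LIME proper samples perturbations, weights them by proximity to the original sentence, and fits a sparse linear model.

```python
from itertools import product

# Hypothetical stand-in for a trained classifier; the weights are invented.
TOY_WEIGHTS = {"strengthen": 0.6, "expanding": 0.3, "declined": -0.5}


def black_box_score(words):
    """Toy hawkishness score for a list of words."""
    return sum(TOY_WEIGHTS.get(w, 0.0) for w in words)


def explain(words, predict):
    """Attribute the score to each word by switching words on and off.

    Enumerates all 2**n keep/drop masks, so use short sentences only.
    Returns a list of (word, attribution) pairs in sentence order.
    """
    n = len(words)
    present_sum = [0.0] * n
    absent_sum = [0.0] * n
    for mask in product([0, 1], repeat=n):
        kept = [w for w, keep in zip(words, mask) if keep]
        score = predict(kept)
        for i, keep in enumerate(mask):
            if keep:
                present_sum[i] += score
            else:
                absent_sum[i] += score
    half = 2 ** (n - 1)  # each word is present in exactly half of the masks
    return [(w, present_sum[i] / half - absent_sum[i] / half)
            for i, w in enumerate(words)]


attributions = explain("labor market continued to strengthen".split(), black_box_score)
```

Words with a positive attribution push the sentence toward hawkish and words with a negative attribution push it toward dovish, which is the kind of reading the LIME figures present.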
LIME Results: FOMC Strong Economy Sentence
LIME Results: FOMC Weak Inflationary Pressure Sentence
By extracting sentences from the statements and then evaluating their sentiment, these techniques gave us a better grasp of the FOMC’s policy perspective and have the potential to make central bank communications easier to interpret and understand in the future.
But was there a connection between changes in the sentiment of FOMC statements and US stock market returns? The chart below plots the cumulative returns of the Dow Jones Industrial Average (DJIA) and NASDAQ Composite (IXIC) together with FOMC sentiment scores. To detect regime changes in equity returns, which are measured on the vertical axis, we investigated correlation, tracking error, excess return, and excess volatility.
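For readers unfamiliar with these four measures, the sketch below computes them for a pair of return series using common textbook definitions. Which series were compared, over which windows, and with what return frequency is not stated here, so the pairing, the function names, and the sample numbers are all assumptions for illustration.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def relative_stats(returns_a, returns_b):
    """Compare series A against series B over the same periods."""
    diffs = [a - b for a, b in zip(returns_a, returns_b)]
    return {
        "correlation": pearson(returns_a, returns_b),
        "excess_return": mean(diffs),        # average return gap, A minus B
        "tracking_error": stdev(diffs),      # volatility of the return gap
        "excess_volatility": stdev(returns_a) - stdev(returns_b),
    }


# Made-up periodic returns, not market data.
series_a = [0.02, -0.01, 0.03, 0.00]
series_b = [0.01, -0.005, 0.02, 0.005]
stats = relative_stats(series_a, series_b)
```

Computing these statistics over rolling windows and watching for abrupt shifts is one way such measures could flag a change of regime.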
Equity Returns and FOMC Statement Sentiment Scores
The results show that, as expected, our sentiment scores do detect regime changes, with equity market regime changes and sudden shifts in the FOMC sentiment score occurring at roughly the same times. According to our analysis, the NASDAQ may be even more responsive to the FOMC sentiment score.
Taken as a whole, this examination hints at the vast potential machine learning techniques have for the future of investment management. Of course, in the final analysis, how these techniques are paired with human judgment will determine their ultimate value.
We would like to thank Yoshimasa Satoh, CFA, James Sullivan, CFA, and Paul McCaffrey. Satoh organized and coordinated AI study groups as a moderator and reviewed and revised our report with thoughtful insights. Sullivan wrote the Python code that converts FOMC statements from PDF format to text and extracts related information. McCaffrey gave us great support in finalizing this research report.
All posts are the opinion of the author. As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author’s employer.