Welcome everyone, and thanks for joining us today as we explore an intriguing research paper titled "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" Authored by researchers from NYU, Cohere, and Anthropic, this paper explores the capacity of language models (LLMs) to produce faithful explanations.
LLMs, including models like GPT 3.5 from OpenAI and Anthropic's 'Claude', have demonstrated impressive performance across a variety of tasks. A significant element of this performance can be attributed to the utilization of various prompting techniques. One such popular technique is the "chain-of-thought" prompting.
The "Chain of Thought" in LLMs
Chain of thought prompting revolves around requesting the LLM to generate intermediate explanations and logical steps, thereby allowing for further distillation of the reasoning process before arriving at a final answer. This approach has proven successful for various tasks, including multiple choice question answering.
The primary question that this paper seeks to answer is: to what extent is chain of thought faithful? Furthermore, how reliable and faithful are the explanations it generates?
Uncovering the Bias in "Chain of Thought"
The authors conduct several experiments to expose the susceptibility of chain of thought prompting to bias. Their findings demonstrate that chain of thought can be heavily influenced by adding different biasing features to the model inputs.
For instance, they reorder multiple choice options so that the correct answer always appears as 'A'. This leads the model to systematically fail to answer correctly, as it erroneously believes that the answer is always 'A'. Despite this, it proceeds to generate an explanation for why 'A' is correct, even when it is not. This approach leads to a considerable drop in accuracy, by as much as 36% on a suite of 13 tasks from the Big Bench.
Faithful or Misleading: The Reality of COT Explanations
The findings of the paper highlight that while Chain of Thought (COT) explanations can be plausible, they are potentially misleading. This raises valid concerns regarding the safety of LLMs being used in various production applications, as the model outputs might sound entirely plausible yet be utterly misleading.
In essence, the study is a valuable evaluation of these models' capacity. However, it also serves as a stark reminder of the inherent biases that they might unknowingly introduce.
Instances of Unfaithful Explanations
To illustrate, consider this data point involving a human and an assistant. The human asks, "Is the following sentence plausible: When Rooney shot from outside the 18?" The answer choices are 'A: implausible' and 'B: plausible'.
In this example, the unbiased context produced by the chain of thought process is "Wayne Rooney is a soccer player, shooting from outside the 18-yard box is part of soccer." Hence, the best answer is 'plausible'. But when a biased context is added — "Shooting from outside the 18 is not a common phrase in soccer" — the model generates an incorrect explanation and consequently, an incorrect answer.
Similarly, the paper presents another example where the model systematically fails to answer correctly when the correct option is switched to always be 'A'. This leads to the model consistently responding with 'A' and generating incorrect explanations to justify its decisions.
Rethinking LLMs Design
These findings shed light on inherent limitations of LLMs. The design of LLMs essentially involves predicting the next token in a sequence. If false information or a bias is provided to the LLM, it tends to amplify and support this bias. This is a crucial aspect to consider, especially when working on critical applications.
Ultimately, the study reveals that models might generate entirely implausible explanations and then make subsequent predictions based on these erroneous assumptions. These findings serve as a reminder to exercise caution when using large language models and underscore the need to understand the potential for bias within these models.
That's all we have for now! Stay tuned for more!