In recent years, large language models (LLMs) like GPT-3 and Codex have seen tremendous success across a wide range of natural language processing tasks. However, they cannot reliably recall all the knowledge stored in their training corpus. To address this issue, researchers have proposed retrieval-augmented language models (RALMs), which retrieve external knowledge to make better predictions.
Today, we are taking a look at an interesting paper that proposes a novel RALM method that is very simple and applicable to all LLMs.
Also check out the video we made on REPLUG.
Previous RALM approaches, depicted on the top of the figure above, rely on a frozen retriever model, which is used to tune a trainable language model. This approach has several limitations; most notably, tuning a large language model is prohibitively expensive.
In this work, the authors introduce REPLUG, a new RALM framework that treats the language model as a black box and augments it with a tuneable retrieval model. REPLUG retrieves a small set of relevant documents from an external corpus and prepends them to the input context of the black-box LM. The design of REPLUG is extremely flexible and can be applied to any existing black-box LM and retrieval model.
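At inference time this amounts to simple prompt construction: prepend each retrieved document to the query and call the black-box LM on the resulting prompts. A minimal sketch (the function name and the document texts are illustrative, not the paper's actual API):

```python
def build_replug_prompts(query, retrieved_docs):
    """Prepend each retrieved document to the input context.

    The black-box LM is then called once per prompt; no access to its
    weights or gradients is required.
    """
    return [f"{doc}\n\n{query}" for doc in retrieved_docs]

# Toy example with two retrieved documents.
docs = ["Giant pandas feed almost entirely on bamboo.",
        "Koalas eat mainly eucalyptus leaves."]
prompts = build_replug_prompts("What do giant pandas eat?", docs)
```

Each prompt is then scored or completed by the LM independently, which is what makes the next step, the ensemble over documents, possible.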
The retriever is implemented as a nearest neighbour search index that fetches the top-k document embeddings for a given query. To incorporate more retrieved context than fits in a single prompt, REPLUG also introduces an ensemble method that computes output probabilities in parallel for multiple retrieved documents and combines them when generating predictions.
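Concretely, the combination is a weighted average of the per-document next-token distributions, with weights derived from the retrieval similarity scores. A minimal NumPy sketch (shapes and toy values are illustrative):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def ensemble_next_token_probs(doc_scores, per_doc_probs):
    """Combine per-document next-token distributions.

    doc_scores:    retrieval similarity score per document, shape (k,)
    per_doc_probs: LM next-token distribution for each doc-prefixed
                   prompt, shape (k, vocab_size)
    """
    weights = softmax(np.asarray(doc_scores))      # (k,)
    return weights @ np.asarray(per_doc_probs)     # (vocab_size,)

# Toy example: 2 retrieved documents, vocabulary of 3 tokens.
scores = [2.0, 0.0]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
combined = ensemble_next_token_probs(scores, probs)
```

Because the weights sum to one and each row is a valid distribution, the ensembled output is itself a valid probability distribution over the vocabulary.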
Furthermore, the authors introduce a training scheme that improves the retrieval model with supervision signals from the black-box language model. The training objective prefers retrieving documents that improve language model perplexity, while treating the LM as a frozen, black-box scoring function.
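Roughly, this objective can be written as a KL divergence between the retrieval likelihood over the top-k documents and a preference distribution computed from the frozen LM's likelihood of the ground-truth continuation: documents that make the LM more confident should receive higher retrieval probability. A minimal NumPy sketch of one plausible formulation (function names and toy values are illustrative; the paper's exact loss includes details such as temperature scaling that are omitted here):

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def lsr_kl_loss(retrieval_scores, lm_log_likelihoods):
    """KL(P_R || Q_LM) over the top-k retrieved documents.

    retrieval_scores:   trainable retriever similarities, shape (k,)
    lm_log_likelihoods: frozen LM log-likelihood of the ground-truth
                        continuation given each document, shape (k,)
    Only the retrieval side would receive gradients in training.
    """
    p_r = softmax(retrieval_scores)      # retrieval likelihood
    q_lm = softmax(lm_log_likelihoods)   # LM preference (no gradient)
    return float(np.sum(p_r * (np.log(p_r) - np.log(q_lm))))

# Toy example: 3 documents; the LM strongly prefers the first one.
loss = lsr_kl_loss([1.0, 0.0, -1.0], [-2.0, -5.0, -9.0])
```

Minimizing this loss pulls the retrieval distribution toward the documents the black-box LM found most helpful, without ever backpropagating through the LM itself.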
The authors test their method using a number of different language models, including GPT-2, GPT-3, Bloom, and Codex.
So let's look at the results. Experiments show that REPLUG can significantly improve the performance of diverse black-box LMs on both language modeling and downstream tasks, such as open-domain question answering. For instance, REPLUG improves Codex's performance on the Massive Multitask Language Understanding (MMLU) benchmark by 4.5%, achieving results comparable to a large, instruction-finetuned model, Flan-PaLM.
Tuning the retriever leads to additional improvements, including up to a 6.3% gain in language modeling performance with the 175-billion-parameter GPT-3 model.
So, to sum it up: REPLUG is a simple and effective approach to integrating retrieval into a language model. The improvements of REPLUG are consistent across diverse models of various sizes, including models only accessible via an API. This makes REPLUG a promising plug-in solution for many use cases that could benefit from retrieval.
Thanks for reading! If you are looking for state-of-the-art expertise in Natural Language Processing, you should check out our services at The Global NLP Lab.