Large language models have shown great potential in recent years, from generating creative text to predicting protein structures. But access to these models has been limited by their massive size and the resources required to run them.
Today, we are taking a look at a new model, LLaMA, which was recently released by Meta AI. LLaMA is a state-of-the-art foundational language model that is much more parameter-efficient than comparable models such as GPT-3 or PaLM, while achieving very strong performance.
LLaMA is currently available to download and use for research purposes.
Introduction
Over the last year, large language models have shown new capabilities to generate creative text, prove mathematical theorems, predict protein structures, answer reading comprehension questions, and more. However, full research access to large language models has remained limited due to the resources required to train and run such large models. Smaller models trained on more tokens are also easier to retrain and fine-tune for specific product use cases.
Today, we are looking at LLaMA (also see the official blog post), a foundational language model available in several sizes (7, 13, 33, and 65 billion parameters). The 65 billion model is one of the largest models of its kind currently available for research purposes.
LLaMA has been trained on a larger corpus than some other foundation models such as GPT-3: the 65 billion and 33 billion models were trained on 1.4 trillion tokens, about 3 times more than GPT-3. Unlike previous work, the model was trained exclusively on publicly available data, without resorting to any proprietary datasets. The training corpus covers 20 languages in total.
The architecture of LLaMA is pretty standard and similar to other transformer-based language models, apart from a few small modifications and optimisations: pre-normalization with RMSNorm, the SwiGLU activation function in the feed-forward layers, and rotary positional embeddings.
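To make those tweaks concrete, here is a minimal PyTorch sketch of a pre-norm decoder block in that style. It is not the official implementation: rotary positional embeddings are omitted for brevity, the dimensions are toy values, and class names such as RMSNorm, SwiGLUFeedForward, and DecoderBlock are just illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        normed = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return normed * self.weight

class SwiGLUFeedForward(nn.Module):
    """Gated feed-forward block with a SiLU ("swish") gate."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class DecoderBlock(nn.Module):
    """Pre-norm decoder block: norm -> attention -> residual, norm -> FFN -> residual."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, bias=False, batch_first=True)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLUFeedForward(dim, hidden_dim=4 * dim)

    def forward(self, x):
        seq_len = x.size(1)
        # Causal mask so each position only attends to earlier positions.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.ffn(self.ffn_norm(x))
        return x

# Tiny smoke test with toy dimensions.
block = DecoderBlock(dim=64, n_heads=4)
tokens = torch.randn(2, 16, 64)   # (batch, sequence, embedding)
print(block(tokens).shape)        # torch.Size([2, 16, 64])
```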
Results
The authors perform extensive experiments on both zero-shot and few-shot learning benchmarks covering Common Sense Reasoning, Closed-book Question Answering, Mathematical Reasoning, Reading Comprehension, Code Generation, and others.
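As a quick illustration of the difference between the two evaluation settings: in the zero-shot case the model only sees a task description or question, while in the few-shot case a handful of solved examples is prepended to the prompt and the model continues the pattern, without any gradient updates. The snippet below is a rough, made-up example of such prompts; the actual templates differ per benchmark.

```python
# Zero-shot: the model must answer directly from its pretraining knowledge.
zero_shot_prompt = (
    "Question: Which gas do plants absorb from the atmosphere during photosynthesis?\n"
    "Answer:"
)

# Few-shot: a few solved examples are prepended so the model can infer the
# task format before answering the final question.
few_shot_prompt = (
    "Question: What is the capital of France?\n"
    "Answer: Paris\n\n"
    "Question: How many legs does a spider have?\n"
    "Answer: 8\n\n"
    "Question: Which gas do plants absorb from the atmosphere during photosynthesis?\n"
    "Answer:"
)

print(few_shot_prompt)
```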
The main result is that the 13 billion LLaMA model outperforms GPT-3 on most benchmarks, while being 10 times smaller. It's important to note that the comparison here is not with the latest GPT-3 model that has been trained on instructions, but with the original 175-billion-parameter version released in 2020.
The 65 billion LLaMA model is also competitive with the current best models, Chinchilla (70 billion parameters) and PaLM (540 billion). Additional gains can be achieved through instruction tuning.
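Instruction tuning here means fine-tuning the pretrained model on pairs of natural-language instructions and desired responses, so that it follows prompts more directly. The record below is a made-up example of what such a training sample might look like; the field names and formatting are illustrative and not taken from the paper.

```python
# A single (hypothetical) instruction-tuning record: the model is fine-tuned to
# produce the "response" text when given the "instruction" (and optional "input").
instruction_example = {
    "instruction": "Summarise the following paragraph in one sentence.",
    "input": "Large language models are trained on vast text corpora and can "
             "perform many tasks from a short natural-language prompt.",
    "response": "Large language models learn from huge text collections and can "
                "handle many tasks given only a brief prompt.",
}

# During fine-tuning, instruction and input are concatenated into the prompt,
# and the loss is computed on the response tokens.
prompt = f"{instruction_example['instruction']}\n\n{instruction_example['input']}\n\n"
target = instruction_example["response"]
print(prompt + target)
```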
The authors also perform an interesting evaluation of how the models' performance evolves during training. On most benchmarks, performance improves steadily and correlates with the model's training perplexity. There still seems to be potential to push these models further, either by enlarging the training dataset or by increasing model size: both help.
Conclusion
We took a look at LLaMA, a new foundation language model by Meta AI. LLaMA achieves performance that is comparable to or better than that of much larger language models, making it an attractive starting point for fine-tuning on specific tasks. One downside is the license: LLaMA is currently available for research use only, and commercial use is not allowed.
Thanks for reading! If you are looking for state-of-the-art expertise in Natural Language Processing, you should check out our services at The Global NLP Lab.