Hello everyone! Today, we delve into the world of open-source large language models. Our focus? The Falcon foundational language model.
Introduction to Falcon
The Falcon language model was developed by the Technology Innovation Institute. It's a large language model that comes in two sizes: 7 and 40 billion parameters. What's impressive is that it significantly outperforms GPT-3 while using only 75% of the training compute budget.
Not only does Falcon surpass GPT-3, it also holds its own against Chinchilla by DeepMind and PaLM-62B by Google. Falcon has been released for commercial use under an open-source license, which is significant in the world of open-source large language models.
The Unique Features of Falcon
What sets Falcon apart? Largely the quality of the data it was trained on: the authors spent considerable effort filtering and deduplicating web content, making data quality the crux of the model. On top of this data-centric approach, Falcon was trained with a custom training framework built for fast, large-scale training.
The architecture of Falcon is similar to GPT-3. The primary differences are the use of rotary positional embeddings and the integration of multi-query attention and FlashAttention. FlashAttention is a more recent, memory-efficient way of computing exact attention that avoids materializing the full O(n^2) attention matrix, while multi-query attention shares a single key/value head across all query heads, shrinking the memory footprint of the key/value cache at inference time.
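To make multi-query attention concrete, here is a minimal PyTorch sketch (an illustration only, not Falcon's actual implementation): every query head attends using the same single key/value head, so the key/value tensors that must be cached during generation are many times smaller.

```python
# Toy multi-query attention: all query heads share one key/value head.
# Illustrative sketch only; omits causal masking, dropout, and rotary embeddings.
import torch
import torch.nn.functional as F
from torch import nn

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)              # one projection per query head
        self.kv_proj = nn.Linear(d_model, 2 * self.head_dim)   # a single key and value head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape
        # Queries: (batch, n_heads, seq_len, head_dim)
        q = self.q_proj(x).view(batch, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
        # Shared key/value: (batch, 1, seq_len, head_dim), broadcast across all query heads
        k, v = self.kv_proj(x).split(self.head_dim, dim=-1)
        k, v = k.unsqueeze(1), v.unsqueeze(1)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        out = F.softmax(scores, dim=-1) @ v                    # (batch, n_heads, seq_len, head_dim)
        return self.out_proj(out.transpose(1, 2).reshape(batch, seq_len, d_model))

x = torch.randn(2, 16, 512)
print(MultiQueryAttention(512, 8)(x).shape)  # torch.Size([2, 16, 512])
```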
The team also releases checkpoints that have been instruction fine-tuned.
Falcon's Performance
The 40-billion-parameter instruct model was trained on 64 A100 GPUs on AWS SageMaker. The result? The model tops the Open LLM Leaderboard by HuggingFace, outperforming other open large language models with an impressive average score of 63.2, roughly 5 points ahead of LLaMA-65B.
The Impact of Instruction Fine-Tuning
Instruction fine-tuning plays a notable role in Falcon's performance on TruthfulQA, a zero-shot benchmark that measures the truthfulness of the answers a language model generates. Without instruction fine-tuning, Falcon's score drops from 52 to 41, underscoring the value of the fine-tuned checkpoints. On other benchmarks, instruction fine-tuning appears to have a smaller impact.
Getting Started with Falcon
At the time of writing, Falcon is an excellent choice for anyone looking to use an open-source large language model. Example code is readily available on HuggingFace. Running the 40-billion-parameter instruct model requires substantial GPU resources, although it is possible to run it with 4-bit precision on a single A100.
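As a starting point, here is a minimal sketch of loading the instruct checkpoint from the HuggingFace Hub with 4-bit quantization. It assumes the transformers, accelerate, and bitsandbytes packages are installed and that enough GPU memory is available (e.g. a single A100); check the model card for the authors' recommended usage.

```python
# Minimal sketch: load Falcon-40B-Instruct in 4-bit precision and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # quantize weights to 4 bits
    device_map="auto",        # place layers across the available GPU(s) automatically
    trust_remote_code=True,   # Falcon shipped custom modeling code at release time
)

prompt = "Write a haiku about open-source language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Swapping the model id for tiiuae/falcon-7b-instruct gives a much lighter variant that fits comfortably on a single consumer GPU.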
In conclusion, the Falcon team has done a phenomenal job in creating a powerful and efficient language model that narrows the gap to closed-source LLMs such as GPT-4 and PaLM. Easier access to open-source LLMs, together with the shrinking hardware requirements to run them, bodes well for the future of AI. We're thrilled to see what comes next and the creative applications developers will build with the Falcon model.
Stay tuned for more exciting developments in the world of AI and language models!