Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs

Click for: original source

Large language models (LLMs) have demonstrated remarkable capabilities in reasoning, language understanding, and even creative tasks. Yet a key challenge persists: how to efficiently integrate external knowledge.

By Taketomo Isazawa.

KBLaM introduces “rectangular attention,” an extension of standard transformer attention. The mechanism integrates structured knowledge from external triples into the LLM as learnable key-value pairs (“knowledge tokens”), significantly improving efficiency and scalability over traditional RAG or in-context learning when the knowledge base is large.
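A minimal sketch of how such key-value knowledge tokens might be produced from triples; the encoder setup, dimensions, and class names below are illustrative assumptions, not the official KBLaM implementation:

```python
# Sketch only (not the official KBLaM code): turn (entity, property, value)
# triples into key/value "knowledge tokens". A frozen sentence encoder is
# assumed to have already embedded each triple; adapter sizes are illustrative.
import torch
import torch.nn as nn

class KnowledgeTokenEncoder(nn.Module):
    def __init__(self, sent_dim: int = 384, head_dim: int = 128):
        super().__init__()
        # Small learned adapters map sentence embeddings into the LLM's
        # attention key and value spaces.
        self.to_key = nn.Linear(sent_dim, head_dim)
        self.to_value = nn.Linear(sent_dim, head_dim)

    def forward(self, triple_embeddings: torch.Tensor):
        # triple_embeddings: (num_triples, sent_dim), e.g. embeddings of
        # "<entity>; <property>; <value>" strings.
        keys = self.to_key(triple_embeddings)      # (num_triples, head_dim)
        values = self.to_value(triple_embeddings)  # (num_triples, head_dim)
        return keys, values

# Usage: 1,000 pre-embedded triples become 1,000 key/value pairs.
encoder = KnowledgeTokenEncoder()
k, v = encoder(torch.randn(1000, 384))
print(k.shape, v.shape)  # torch.Size([1000, 128]) torch.Size([1000, 128])
```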

Unlike fine-tuning (costly retraining) or basic RAG (a separate retrieval pipeline that adds complexity), KBLaM encodes the knowledge base offline, turning each fact into a key-value vector pair. These knowledge tokens are then inserted into the LLM’s attention layers via rectangular attention: prompt tokens attend to the knowledge tokens, but the knowledge tokens do not attend to one another.
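The attention pattern itself can be pictured as a boolean mask. The sketch below assumes M knowledge tokens prepended to an N-token prompt, with True meaning “may attend”; the shapes and conventions are illustrative rather than taken from the KBLaM source:

```python
# Sketch of the rectangular attention pattern: prompt tokens attend causally
# to each other and to every knowledge token; each knowledge token sees only
# itself, never the other knowledge tokens or the prompt.
import torch

def rectangular_attention_mask(num_kb: int, num_prompt: int) -> torch.Tensor:
    total = num_kb + num_prompt
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Knowledge tokens: self-attention only, no cross-talk within the KB.
    mask[:num_kb, :num_kb] = torch.eye(num_kb, dtype=torch.bool)
    # Prompt tokens see every knowledge token...
    mask[num_kb:, :num_kb] = True
    # ...and attend causally over the prompt itself.
    mask[num_kb:, num_kb:] = torch.tril(torch.ones(num_prompt, num_prompt)).bool()
    return mask

print(rectangular_attention_mask(num_kb=3, num_prompt=4).int())
```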

This allows knowledge to be looked up dynamically at inference time, through attention, without retraining. Critically, memory and inference compute scale linearly with the size of the knowledge base, whereas stuffing the same facts into the context incurs quadratic attention costs. That efficiency makes it practical to integrate thousands of facts on a single GPU, far beyond what in-context approaches handle well, and KBLaM can also improve reliability by learning to refuse questions when the knowledge base lacks the necessary information. Nice one!
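A back-of-the-envelope comparison of why that matters (illustrative arithmetic, not benchmark numbers): counting the attention scores computed per layer when M knowledge tokens accompany an N-token prompt.

```python
# In-context learning: every token attends to every token -> quadratic in M.
# Rectangular attention: only the prompt looks at the KB -> linear in M.
def in_context_scores(num_kb: int, num_prompt: int) -> int:
    total = num_kb + num_prompt
    return total * total                                    # O((M + N)^2)

def rectangular_scores(num_kb: int, num_prompt: int) -> int:
    return num_kb + num_prompt * num_kb + num_prompt ** 2   # O(M) in KB size

for m in (1_000, 10_000, 100_000):
    print(f"{m:>7} facts: {in_context_scores(m, 256):>15,} vs {rectangular_scores(m, 256):>12,}")
```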

[Read More]

Tags azure cloud ai cio big-data