Retrieval Augmented Generation (RAG)

Summary: Retrieval augmented generation (RAG) enhances the knowledge updatability of agents for goal achievement and maintains data privacy for on-premise implementations of foundation model-based agents/systems.

Context: Agents based on large foundation models are not equipped with knowledge of specific domains, especially knowledge contained in highly confidential and privacy-sensitive local data, unless the models are fine-tuned or pre-trained on that domain data.

Problem: Given a task, how can agents reason with data/knowledge that the foundation models did not learn during training?

Forces:

  • Lack of knowledge. The reasoning process may be unreliable when the agent is required to accomplish domain-specific tasks for which it has no knowledge reserve.
  • Overhead. Fine-tuning a large foundation model on local data, or training one locally, incurs high computation and resource costs.
  • Data privacy. Local data are too confidential to be used to train or fine-tune the models.

Solution: Fig. 1 illustrates a high-level graphical representation of retrieval augmented generation. RAG is a technique for enhancing the accuracy and reliability of agents with facts retrieved from other sources (internal or online data). Knowledge gaps in the agent's memory are filled using the parameterized knowledge stored in vector databases. For instance, during plan generation, specific steps may require information that is not within the original agent memory. The agent can retrieve this information from the parameterized knowledge and use it for planning, and the augmented response (i.e., the plan) is returned to the user. The retrieval process requires no pre-training or fine-tuning of the model served by the agent, which preserves the privacy of local data, reduces training and computation costs, and provides up-to-date and more precise information. The retrieved local data can be sent back to the agent via prompts (the context window size needs to be considered), and the agent processes the information and generates plans via in-context learning. There is currently a cluster of RAG techniques focusing on various enhancement aspects, data sources, and applications [1], for instance, federated RAG [2] and graph RAG [3].

Figure 1. Retrieval augmented generation.
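
The retrieve-augment-generate loop can be sketched as follows. This is a minimal illustrative sketch, not a reference implementation: embed, generate, and the in-memory store are placeholder assumptions standing in for an embedding model, the agent's foundation model, and a vector database, and the context window is approximated by a simple character budget.

    # Minimal RAG sketch (illustrative only). `embed`, `generate`, and the
    # in-memory `store` are placeholders for an embedding model, the agent's
    # foundation model, and a vector database respectively.
    from typing import Callable, List, Tuple
    import numpy as np

    def retrieve(query: str,
                 store: List[Tuple[str, np.ndarray]],   # (text chunk, embedding)
                 embed: Callable[[str], np.ndarray],
                 top_k: int = 3) -> List[str]:
        """Return the top-k chunks most similar to the query (cosine similarity)."""
        q = embed(query)
        scored = [(float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), text)
                  for text, v in store]
        return [text for _, text in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]

    def rag_plan(task: str,
                 store: List[Tuple[str, np.ndarray]],
                 embed: Callable[[str], np.ndarray],
                 generate: Callable[[str], str],
                 max_context_chars: int = 4000) -> str:
        """Retrieve local knowledge, inject it into the prompt, and generate a plan."""
        # Concatenate retrieved chunks and truncate to respect the context window.
        context = "\n".join(retrieve(task, store, embed))[:max_context_chars]
        prompt = ("Use the following local knowledge to produce a step-by-step plan.\n"
                  f"Knowledge:\n{context}\n\nTask: {task}\nPlan:")
        # In-context learning: the foundation model is used as-is, with no fine-tuning.
        return generate(prompt)

Because only the prompt is augmented, the local datastore can be updated or replaced at any time without touching the model weights.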

Benefits:

  • Knowledge retrieval. Agents can search and retrieve knowledge related to the given tasks, which improves the reliability of reasoning steps.
  • Updatability. The prompts/responses the agent generates with RAG over internal or online data can be kept up to date through the complementary parameterized knowledge.
  • Data privacy. The agent can retrieve additional knowledge from local datastores, which preserves data privacy and security.
  • Cost-efficiency. Under the data privacy constraint, RAG can provide essential knowledge to the agent without training an entirely new foundation model. This reduces training costs.

Drawbacks:

  • Maintenance overhead. Maintaining and updating the parameterized knowledge in the vector store incurs additional computation and storage costs.
  • Data limitation. The agent still relies mainly on the data its model was trained on to generate prompts/responses. This can impact the quality and accuracy of the generated content in specific domains.

Known uses:

  • LinkedIn. LinkedIn applies RAG to construct its pipeline of foundation model-based agents, which can search for appropriate case studies to respond to users.
  • Yan et al. [4] devise a retrieval evaluator that outputs a confidence degree after assessing the quality of retrieved data, improving the robustness and overall performance of RAG for agents (a simplified sketch of this confidence gating follows this list).
  • Levonian et al. [5] apply RAG with GPT-3.5 to develop an agent that retrieves content from a high-quality open-source math textbook to generate responses to students.
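
The confidence-gating idea in [4] can be sketched as below; the evaluator, threshold, and web-search fallback are illustrative placeholders rather than the authors' trained components.

    # Simplified confidence gate inspired by corrective RAG [4].
    # `evaluate` and `web_search` are placeholder assumptions; the threshold is illustrative.
    from typing import Callable, List

    def gated_context(query: str,
                      retrieved: List[str],
                      evaluate: Callable[[str, str], float],   # confidence in [0, 1]
                      web_search: Callable[[str], List[str]],
                      threshold: float = 0.5) -> List[str]:
        """Keep retrieved chunks the evaluator deems reliable; otherwise fall back to web search."""
        trusted = [doc for doc in retrieved if evaluate(query, doc) >= threshold]
        return trusted if trusted else web_search(query)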

Related patterns: Retrieval augmented generation can complement all other patterns by providing extra context information from the local datastore.

References:

[1] Y. Hu and Y. Lu, “RAG and RAU: A survey on retrieval-augmented language model in natural language processing,” arXiv preprint arXiv:2404.19543, 2024.
[2] S. Wang, E. Khramtsova, S. Zhuang, and G. Zuccon, “FeB4RAG: Evaluating federated search in the context of retrieval augmented generation,” arXiv preprint arXiv:2402.11891, 2024.
[3] J. Larson and S. Truitt, “GraphRAG: Unlocking LLM discovery on narrative private data,” February 2024. [Online]. Available: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
[4] S.-Q. Yan, J.-C. Gu, Y. Zhu, and Z.-H. Ling, “Corrective retrieval augmented generation,” arXiv preprint arXiv:2401.15884, 2024.
[5] Z. Levonian, C. Li, W. Zhu, A. Gade, O. Henkel, M.-E. Postle, and W. Xing, “Retrieval-augmented generation to improve math question-answering: Trade-offs between groundedness and human preference,” arXiv preprint arXiv:2310.03184, 2023.