Self-Reflection

 

Summary: Self-reflection enables the agent to generate feedback on its own plan and reasoning process and to provide refinement guidance to itself.

Context: Given a user’s goals and requirements, the agent generates a plan that decomposes the goals into a set of tasks to achieve them.

Problem: A generated plan may be affected by hallucinations of the underlying foundation model. How can the agent review the plan and reasoning steps and incorporate feedback efficiently?

Forces:

  • Reasoning uncertainty. There may be inconsistencies or uncertainties embedded in the agent’s reasoning process, affecting the task success rate and response accuracy.
  • Lack of explainability. The trustworthiness of the agent can be undermined by a lack of transparency and explainability in how the plan is generated.
  • Efficiency. Certain goals require the plan to be finalised within a specific time period.

Solution: Fig. 1 depicts a high-level graphical representation of self-reflection. Reflection is formalised as an optimisation process that iteratively reviews and refines the agent-generated response. The user prompts the agent with specific goals, and the agent generates a plan to accomplish the user’s requirements. Subsequently, the user can instruct the agent to reflect on the plan and the corresponding reasoning process. The agent reviews the response to pinpoint errors, then generates a refined plan and adjusts its reasoning process accordingly. The finalised plan is carried out step by step. Self-consistency [1] exemplifies this pattern; a minimal sketch of the reflection loop is given after Fig. 1.

Figure 1. Plan reflection pattern.
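
The solution can be read as a simple loop: generate a plan, critique it, refine it, and stop once the critique reports no remaining issues. Below is a minimal sketch in Python, assuming a generic llm(prompt) callable that wraps the agent’s foundation model; the prompt wording, the 'OK' stopping convention, and the round limit are illustrative assumptions rather than part of the pattern.

  from typing import Callable

  def self_reflect(goal: str,
                   llm: Callable[[str], str],
                   max_rounds: int = 3) -> str:
      # 1. Generate an initial plan that decomposes the goal into tasks.
      plan = llm(f"Decompose the following goal into a step-by-step plan:\n{goal}")

      for _ in range(max_rounds):
          # 2. Ask the model to review its own plan and reasoning for errors.
          critique = llm(
              "Review the plan below for errors, inconsistencies or missing steps. "
              "Reply 'OK' if none are found, otherwise list the problems.\n"
              f"Goal: {goal}\nPlan:\n{plan}"
          )
          if critique.strip().upper().startswith("OK"):
              break  # the plan passed its own review

          # 3. Refine the plan using the self-generated feedback.
          plan = llm(
              "Revise the plan to address the feedback below.\n"
              f"Goal: {goal}\nPlan:\n{plan}\nFeedback:\n{critique}"
          )

      # 4. The finalised plan is then carried out step by step by the agent.
      return plan

Each reflection round issues additional model queries, which is why the pattern is related to incremental model query and why the round limit matters when a plan must be finalised within a given time budget.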

Benefits:

  • Reasoning certainty. Agents can evaluate their own responses and reasoning process to check whether there are any errors or inappropriate outputs, and make refinements accordingly.
  • Explainability. Self-reflection allows the agent to review and explain its reasoning process to users, facilitating better comprehension of the agent’s decision-making process.
  • Continuous improvement. The agent can continuously update its memory or knowledge base and the way it formulates prompts and knowledge, so that it can provide more reliable and coherent output to users with fewer or no reflection steps.
  • Efficiency. On one hand, self-evaluation is time-saving for the agent, as it incurs no additional communication overhead compared to other reflection patterns. On the other hand, given such continuous improvement, the agent can provide more accurate responses in the future, reducing the overall reasoning time.

Drawbacks:

  • Reasoning uncertainty. The evaluation result is dependent on the complexity of self-reflection and the agent’s competence in assessing its generated responses.
  • Overhead. i) Self-reflection can increase the complexity of an agent, which may affect the overall performance. ii) Refining and maintaining agents with self-reflection capabilities requires specialised expertise and development processes.

Known uses:

  • Reflexion [2]. Reflexion employs a self-reflection model that generates nuanced and concrete feedback based on the success status, current trajectory, and persistent memory (a rough sketch follows this list).
  • Bidder agent [3]. A replanning module in this agent utilises self-reflection to create new textual plans based on the auction’s status and new context information.
  • Generative agents [4]. Agents perform reflection two or three times a day: they first determine the objective of reflection based on recent activities, then generate a reflection that is stored in the memory stream.
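
To make the Reflexion entry more concrete, the following is a rough sketch of a Reflexion-style trial loop in which verbal self-reflections are accumulated in a persistent memory and condition the next attempt. The function names, the act() interface, and the prompt wording are illustrative assumptions, not the authors’ implementation.

  from typing import Callable

  def reflexion_episode(task: str,
                        act: Callable[[str, list], tuple],
                        llm: Callable[[str], str],
                        max_trials: int = 3) -> str:
      memory: list[str] = []        # persistent self-reflections across trials
      trajectory, success = "", False
      for _ in range(max_trials):
          # Attempt the task, conditioned on the reflections accumulated so far.
          trajectory, success = act(task, memory)
          if success:
              break
          # Verbalise what went wrong in the failed trajectory and store the lesson.
          reflection = llm(
              "The following attempt failed. Explain what went wrong and what to "
              f"do differently next time.\nTask: {task}\nTrajectory:\n{trajectory}"
          )
          memory.append(reflection)
      return trajectory

Keeping the reflections in memory rather than discarding them is what enables the continuous-improvement benefit noted above: later trials start from the accumulated lessons instead of repeating the same reasoning errors.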

Related patterns:

  • Prompt/response optimiser. Self-reflection can be applied to assess and refine the output of prompt/response optimiser.
  • Incremental model query. Self-reflection requires agents to query their incorporated foundation model multiple times for response review and evaluation.
  • Single-path plan generator. Single-path plan generator and self-reflection both contribute to self-consistency with Chain of Thought.

References:

[1] J. Huang, S. S. Gu, L. Hou, Y. Wu, X. Wang, H. Yu, and J. Han, “Large language models can self-improve,” arXiv preprint arXiv:2210.11610, 2022.

[2] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: language agents with verbal reinforcement learning,” in Advances in Neural Information Processing Systems, A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., 2023, pp. 8634–8652. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2023/file/1b44b878bb782e6954cd888628510e90-Paper-Conference.pdf

[3] J. Chen, S. Yuan, R. Ye, B. P. Majumder, and K. Richardson, “Put your money where your mouth is: Evaluating strategic planning and execution of LLM agents in an auction arena,” arXiv preprint arXiv:2310.05746, 2023.

[4] J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, ser. UIST ’23. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3586183.3606763