Proactive Goal Creator

Summary: The proactive goal creator anticipates users’ goals by interpreting human interactions and capturing context through relevant tools.

Context: In the prompt, users explain the goals that the agent is expected to achieve.

Problem: Context information collected solely through a dialogue interface may be limited, resulting in inaccurate responses to users’ goals.

Forces:

  • Underspecification. i) Users may not be able to provide thorough context information and specify precise goals to agents. ii) Agents can only retrieve limited information from the memory.
  • Accessibility. Users with specific disabilities may not be able to interact directly with the agent through a passive goal creator.

Solution: Fig. 1 gives a simple graphical representation of the proactive goal creator. In addition to the prompts received from the dialogue interface and the relevant context retrieved from memory, the proactive goal creator can anticipate users’ goals by sending requirements to detectors. The detectors capture the user’s surroundings and return them for further analysis and comprehension, from which the goals are generated. Examples include identifying the user’s gestures through cameras and recognising an application’s UI layout from screenshots. The proactive goal creator should notify users about context capturing and other relevant issues with a low false positive rate, to avoid unnecessary interruptions. In addition, the captured environment information can be stored in the agent’s memory (or knowledge base) to establish “world models” [1, 2] that improve its ability to comprehend the real world.

Figure 1. Proactive goal creator.
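
The flow described in the solution can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pattern's reference implementation: the names `Observation`, `Goal`, `ProactiveGoalCreator`, `notify_threshold`, and the stub detectors are all hypothetical. The confidence threshold is one possible way to keep notifications at a low false positive rate, and the string concatenation in `create_goal` stands in for the foundation model that would actually generate the goal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    source: str        # which detector produced it, e.g. "camera"
    description: str   # textual summary of the captured surroundings
    confidence: float  # detector's own confidence in [0, 1] (assumed field)


@dataclass
class Goal:
    text: str
    context: List[str]


# A detector is any callable that receives a requirement and returns an Observation.
Detector = Callable[[str], Observation]


@dataclass
class ProactiveGoalCreator:
    detectors: Dict[str, Detector]
    memory: List[str] = field(default_factory=list)
    notify_threshold: float = 0.8  # assumed value; tune to limit interruptions
    notifications: List[str] = field(default_factory=list)

    def create_goal(self, prompt: str, requirement: str) -> Goal:
        # Start from the context already retrieved from memory.
        context = list(self.memory)
        for name, detect in self.detectors.items():
            obs = detect(requirement)
            # Assumption: low-confidence captures are dropped silently so the
            # user is neither interrupted nor misled by them.
            if obs.confidence < self.notify_threshold:
                continue
            entry = f"{obs.source}: {obs.description}"
            context.append(entry)
            self.memory.append(entry)  # stored for later "world model" use
            self.notifications.append(f"Captured context from {name}")
        # Placeholder for model-based goal generation.
        text = f"{prompt} [context: {'; '.join(context)}]"
        return Goal(text=text, context=context)


# Stub detectors standing in for a camera and a screenshot tool.
def stub_camera(requirement: str) -> Observation:
    return Observation("camera", "user points at the lamp", 0.9)


def stub_screenshot(requirement: str) -> Observation:
    return Observation("screenshot", "UI layout unclear", 0.3)


creator = ProactiveGoalCreator(
    detectors={"camera": stub_camera, "screenshot": stub_screenshot},
    memory=["user prefers dim light"],
)
goal = creator.create_goal("turn it on", "capture gestures and UI layout")
print(goal.text)
print(creator.notifications)
```

In this sketch the ambiguous prompt "turn it on" is enriched with the remembered preference and the camera observation, while the low-confidence screenshot capture is ignored. A real system would likely separate the decision to use an observation from the decision to notify the user about it.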

Benefits:

  • Interactivity. An agent can interact with users or other agents by anticipating their decisions proactively with captured multimodal context information.
  • Goal-seeking. The multimodal input can provide more detailed information for the agent to understand users’ goals, and increase the accuracy and completeness of goal achievement.
  • Accessibility. Additional tools can help capture the sentiments and other context information from disabled users, ensuring accessibility and broadening the human values of foundation model-based agents.

Drawbacks:

  • Overhead. i) Proactive goal creator is enabled by the multimodal context information captured by relevant tools, which may increase the cost of the agent. ii) Limited context information may increase the communication overhead between users and agents.

Known uses:

  • GestureGPT [3]. GestureGPT can decipher users’ hand gesture descriptions and hence comprehend users’ intents.
  • Zhao et al. [4] proposed a programming screencast analysis tool that can extract the coding steps and code snippets.
  • ProAgent [5]. ProAgent can observe the behaviours of other teammate agents, deduce their intentions, and adjust the planning accordingly.

Related patterns:

  • Passive goal creator. The proactive goal creator can be regarded as an alternative to the passive goal creator that enables multimodal context injection.
  • Prompt/response optimiser. The proactive goal creator can first handle users’ inputs and then pass the goals and relevant context information to the prompt/response optimiser for prompt refinement.
  • Multimodal guardrails. Multimodal guardrails can help process the multimodal data captured by proactive goal creator.

References:

[1] D. Ha and J. Schmidhuber, “World models,” arXiv preprint arXiv:1803.10122, 2018.

[2] Y. LeCun, “A path towards autonomous machine intelligence version 0.9.2, 2022-06-27,” Open Review, vol. 62, no. 1, 2022.

[3] X. Zeng, X. Wang, T. Zhang, C. Yu, S. Zhao, and Y. Chen, “GestureGPT: Zero-shot interactive gesture understanding and grounding with large language model agents,” arXiv preprint arXiv:2310.12821, 2023.

[4] D. Zhao, Z. Xing, X. Xia, D. Ye, X. Xu, and L. Zhu, “SeeHow: Workflow extraction from programming screencasts through action-aware video analytics,” arXiv preprint arXiv:2304.14042, 2023.

[5] C. Zhang, K. Yang, S. Hu, Z. Wang, G. Li, Y. Sun, C. Zhang, Z. Zhang, A. Liu, S.-C. Zhu et al., “ProAgent: Building proactive cooperative AI with large language models,” arXiv preprint arXiv:2308.11339, 2023.