LLM Agent

Source

Takeaway

  • LLM -> prompt engineering -> agent: using an LLM to interact with an environment is the hard step (very hard!)

What is an LLM Agent?

  • An agent interacts with its environment through actions.
  • LLM agent: uses an LLM to interact with the environment.
  • Agents are typically associated with robots.

Question: What is the relationship between an LLM agent and an RL agent?

Application

Key

  1. Formulation
  2. Evaluation

Formulation

Use a CPU analogy for the LLM

image-20231113161327612

RL Agent

image-20231113161441886

image-20231113161456904

  • Memory: an LLM is “stateless”, so the agent needs explicit memory; this differs from an RL agent

    • short-term working memory
    • long-term memory: Episodic (experience); semantic (knowledge); procedural (LLM, code)
  • Action space: as in RL, the agent is defined by its action space.

    • External actions interact with external environments (grounding): same as an RL agent
    • Internal actions interact with internal memories: different from an RL agent
      • Reasoning: read & write working memory (via LLMs)
      • Retrieval: read long-term memory
      • Learning: write long-term memory (or weights)

    image-20231113162758881

  • Decision making

    • A language agent chooses actions via decision procedures
      • The actions taken are grouped into decision cycles
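
The memory, action, and decision-making pieces above can be put together in a small sketch. This is only an illustration under loud assumptions: `stub_llm` is a hypothetical stand-in for a real LLM call, `ToyEnv` is a made-up two-step environment, and the ordering inside `decision_cycle` (retrieve, reason, act, learn) is one plausible arrangement, not a framework definition taken from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    working: List[str] = field(default_factory=list)        # short-term working memory
    episodic: List[str] = field(default_factory=list)       # long-term: experience
    semantic: Dict[str, str] = field(default_factory=dict)  # long-term: knowledge


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: maps a text prompt to an action string."""
    return "open door" if "key" in prompt else "pick up key"


class ToyEnv:
    """Hypothetical external environment (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.has_key = False
        self.done = False

    def step(self, action: str) -> str:
        if action == "pick up key":
            self.has_key = True
            return "you hold the key"
        if action == "open door" and self.has_key:
            self.done = True
            return "door opened"
        return "nothing happens"


def decision_cycle(mem: Memory, env: ToyEnv, llm: Callable[[str], str]) -> str:
    # Retrieval (internal action): read long-term memory into working memory.
    hint = mem.semantic.get("door")
    if hint and hint not in mem.working:
        mem.working.append(hint)
    # Reasoning (internal action): the LLM reads working memory and writes a plan back.
    action = llm(" | ".join(mem.working))
    mem.working.append(f"plan: {action}")
    # Grounding (external action): act on the environment and record the observation.
    obs = env.step(action)
    mem.working.append(f"obs: {obs}")
    # Learning (internal action): write the experience to long-term memory.
    mem.episodic.append(f"{action} -> {obs}")
    return obs


def run_agent(max_cycles: int = 5) -> Memory:
    mem = Memory(semantic={"door": "the door is locked"})
    env = ToyEnv()
    for _ in range(max_cycles):
        decision_cycle(mem, env, stub_llm)
        if env.done:
            break
    return mem
```

Calling `run_agent()` runs decision cycles until the toy door opens. The LLM itself keeps no state between calls; everything it "remembers" is whatever the agent places into the prompt from working memory.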

image-20231113163025656

Framework Application

image-20231113163539003

Evaluation

  1. Interact with physical world / humans: hard to scale
  2. Interact with digital simulation (Atari, MuJoCo): hard to be practical
  3. Interact with digital applications: ?
    1. Interact with web (smartphone)
    2. Interact with code (PC, ..): productivity

WebShop: query, modification, and web purchase
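
To make the query / modification / purchase flow concrete, here is a minimal toy environment. It is not the WebShop benchmark or its API: `ToyShop`, `CATALOG`, the method names, and the half-and-half scoring rule are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: str
    options: List[str]


# Hypothetical two-item catalog (assumption, for illustration only).
CATALOG = [
    Product("red running shoes", ["size 9", "size 10"]),
    Product("blue rain jacket", ["medium", "large"]),
]


class ToyShop:
    """Toy shopping task: search, pick a product, set an option, buy."""

    def __init__(self, goal_name: str, goal_option: str) -> None:
        self.goal_name = goal_name
        self.goal_option = goal_option
        self.results: List[Product] = []
        self.selected: Optional[Product] = None
        self.option: Optional[str] = None

    def search(self, query: str) -> List[str]:
        # Query: keep products whose name contains any query word.
        words = query.lower().split()
        self.results = [p for p in CATALOG if any(w in p.name for w in words)]
        return [p.name for p in self.results]

    def choose(self, name: str) -> List[str]:
        # Select one search result and show its options.
        match = next((p for p in self.results if p.name == name), None)
        if match is None:
            raise ValueError(f"{name!r} is not in the current search results")
        self.selected = match
        self.option = None
        return match.options

    def set_option(self, option: str) -> None:
        # Modification: customize the selected product.
        if self.selected is None or option not in self.selected.options:
            raise ValueError(f"option {option!r} is not available")
        self.option = option

    def buy(self) -> float:
        # Purchase: score in [0, 1]; half for the product, half for the option.
        if self.selected is None:
            return 0.0
        name_ok = self.selected.name == self.goal_name
        option_ok = self.option == self.goal_option
        return 0.5 * name_ok + 0.5 * option_ok
```

An agent would call `search`, `choose`, `set_option`, and `buy` in turn. Because the score is computed automatically from the goal, this kind of task can be evaluated at scale without a human in the loop.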

image-20231113165023107

image-20231113165501828

Code Interaction (interact with computer)

LLM: writes code autoregressively, in a single pass

H: writes code interactively, running it and revising
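
The contrast can be sketched as a loop in which generated code is executed and the result is fed back for another attempt. This is a minimal sketch with assumptions: `stub_model` is a hypothetical stand-in for an LLM (it simply returns a corrected snippet once it sees any feedback), and the convention that a snippet stores its answer in a variable named `result` is invented for this example.

```python
from typing import Callable, List, Optional, Tuple


def run_code(source: str) -> Tuple[bool, str]:
    """Execute source in a fresh namespace and return (ok, feedback)."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        return True, str(namespace.get("result"))
    except Exception as exc:
        # The error message becomes the observation for the next turn.
        return False, f"{type(exc).__name__}: {exc}"


def stub_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM that revises after seeing feedback."""
    if feedback is None:
        return "result = total([1, 2, 3])"  # first attempt uses an undefined name
    return "result = sum([1, 2, 3])"        # revised attempt


def interactive_solve(
    task: str,
    model: Callable[[str, Optional[str]], str],
    max_turns: int = 3,
) -> Tuple[bool, List[str]]:
    ok = False
    feedback: Optional[str] = None
    history: List[str] = []
    for _ in range(max_turns):
        code = model(task, feedback)     # action: write code
        ok, feedback = run_code(code)    # observation: execution result or error
        history.append(feedback)
        if ok:
            break
    return ok, history
```

With `max_turns=1` this reduces to the single-pass case, where the first attempt fails and is never corrected. Note that `exec` runs arbitrary code, so a real setup would execute model output in a sandbox.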

LangChain is one example?

image-20231113165824797

image-20231113170921650

Appendix