mc-craft

Installation

CLI
npx skills add https://github.com/bdambrosio/Cognitive_workbench --skill mc-craft

Install this skill using the CLI and start using the SKILL.md workflow in your workspace.

Last updated: 5/1/2026

Cognitive Workbench

A research framework for autonomous agents with incremental planning, persistent memory, and tool use.

Status: Research Laboratory
Python 3.10+
License: MIT

Project Direction (April 2026)

The chat-mode subproject (src/chat/) is becoming the primary interface. It now provides:

  • ReAct tool use — process_text, web search, fetch_text (full-page extraction), respond. Each emission is a single JSON object that MUST include a thought field (one terse sentence supporting the action choice) alongside the tool-specific fields; the thought is preserved verbatim in the awareness feed (below). A per-iteration trace is written before any post-turn LLM work, plus a live CLI status line ("thinking…" → "using search…") that overwrites in place so the user sees progress during long LLM calls. Prompt construction is store-and-append: the system prompt and user-message prefix are built once at loop entry and reused verbatim across iterations; only the working log grows by literal string append, so the prefix is byte-stable and the backend's KV cache hits on iteration 2+. A ## Now block injects the current date/time into the system prompt at loop start. Section headings in the prompt carry brief mechanism-tags in parentheses (e.g. ## Active concerns (from YAML seeds + post-turn reflection + semantic recall; …)) so the model can read source provenance directly from the prefix and not confabulate origins.
  • Reasoning history (awareness feed) — after each ReAct loop (user-driven or autonomous), the per-iteration thoughts, actions, and observations persist as a Note in a rolling reasoning_history collection. The most recent few traces surface in the user-message prefix on subsequent turns, between conversation history and current input — making Jill's own prior thinking a structural input to current reasoning. Recent traces render in full; older ones in the rolling window render as a compressed action-sequence digest. Ring-bounded on disk.
  • Long-term memory — per-character memories collection with categorized recall (fact / preference / commitment), auto-RAG injection at turn start, and post-turn reflection that suppresses writes from hypothetical / roleplay / counterfactual frames. Discourse update + reflection run in a background single-worker executor so the response publishes without waiting on slow LLM-bound side effects.
  • Concerns — actionable directives Jill keeps ready to advance, stored in a per-character concerns collection separate from memories. Three categories (one_shot / durable / derived) with independent per-concern firing parameters generated by reflection: cadence_hours from a discrete allowlist {1, 2, 4, 8, 12, 24, 168} (firing rhythm), lifetime_days (decay tau), instruction (the action to take). Two timestamps anchor the lifecycle: last_engaged_at is updated only by user engagement (recall hit / recurrence promotion) and drives decay; last_acted_at is the cadence anchor, updated when the concern is acted on (autonomously or by user-driven engagement). Surfaced in the system prompt as an "Active concerns" block with a per-concern status badge — [durable, due], [durable, idle Xh/Yh], or no badge for non-fireable categories — so the model sees firing state without doing timestamp arithmetic. Recurrence-detection at write time promotes one_shot → durable on re-emission and revives satisfied concerns.
  • Sensors framework + Phase C autonomy — first chat-mode sensor is tick, a stateless 30 min heartbeat that publishes to sense_data. The chat loop dispatches by source name: tick events go to a Phase C autonomy path that runs _check_and_fire_concerns (cadence elapsed since last_acted_at) and autonomously executes due concerns' instructions through the standard ReAct loop (cap 2 per tick; the rest stay due for the next tick). A CLI preamble announces each fire. Autonomous turns reuse the full prompt construction (voice and trace format identical to user turns) but skip post-turn reflection and discourse update and don't refresh user-engagement timestamps. Phase B-display CLI impulses on user turns continue alongside; the cadence anchor is shared so the two paths don't double-fire.
  • Companion Model + Discourse tracking — single-user fair-witness texture and outstanding-discourse state, persisted across sessions.
  • Unified cloud-LLM config — an api_key field naming an env var triggers a Bearer-auth POST to any OpenAI-compatible endpoint (MIMO, OpenRouter, OpenAI, hosted vLLM, …); legacy server shortcuts still work. A dedicated server: anthropic route hits Anthropic's native Messages API (/v1/messages, x-api-key + anthropic-version headers, system as a top-level field).
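The per-iteration emission contract from the first bullet can be illustrated with a minimal sketch — the thought field is required as described above, while the other field names here (tool, query) are purely illustrative, not the project's actual schema:

```python
import json

# Hypothetical ReAct emission: one JSON object per iteration.
# "thought" is the required field from the contract above; "tool" and
# "query" are illustrative stand-ins for the tool-specific fields.
emission = {
    "thought": "Recent papers are requested, so a web search comes first.",
    "tool": "web_search",
    "query": "multi-agent coordination papers",
}

def validate(line: str) -> dict:
    """Parse an emission and enforce the mandatory thought field."""
    obj = json.loads(line)
    if "thought" not in obj:
        raise ValueError("emission missing required 'thought' field")
    return obj

parsed = validate(json.dumps(emission))
```

The thought then persists verbatim into the awareness feed, so rejecting emissions without one keeps the reasoning history complete.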

The full executive-node architecture described below — continuous OODA planner, incremental planner, cognitive graph, sensors — remains operational, but future development is shifting to chat mode. The sensor framework has now landed in chat (first sensor: tick), unlocking Phase C autonomy. Cognitive graph integration of the chat trace and entity modeling are likely later moves; world model and tool model are less relevant given chat's small fixed tool set. This is a stated intention, not a present-day rewrite; the executive path is fully usable today and is what runs by default for jill-infospace*.yaml scenarios. Chat mode runs from jill-chat*.yaml scenarios via the same launcher.

What This Is

Cognitive Workbench is experimental research software for studying LLM-based cognitive architectures. It prioritizes inspectable agent behavior and fast iteration over stability.

Two coupled loops sit at the core. A continuous OODA planner maintains strategic context across cycles — choosing at each turn whether to submit a goal, update a concern, ask, say, reflect, or sleep — rather than resetting reasoning every tick. When it launches a goal, an incremental planner takes over and interleaves LLM reasoning with tool execution, generating one step at a time and adapting to real results. Every OODA event, decision, action, and outcome is recorded as a typed node in a persistent cognitive graph that serves as both long-term memory and a reflective computational trace.

User: "goal: Find recent papers on multi-agent coordination"
                    │
         ┌──────────▼──────────┐
         │  OODA Planner       │  Continuous strategic loop — context persists
         │  (ooda_planner.py)  │  across cycles. Actions: submit-goal,
         │                     │  update-concern, say, ask, reflect, sleep, ...
         │  ┌───────────────┐  │  Event-action history with progressive rollup
         │  │ Observe/Orient│  │
         │  │ Decide → Act  │  │  Writes every stage into the cognitive graph
         │  └───────────────┘  │
         └──────────┬──────────┘
                    │ submit-goal
         ┌──────────▼──────────┐
         │ Incremental Planner │  Stage 0: Retrieve context (FAISS + graph)
         │                     │  Stage 1: Analyze + select tools
         │  ┌───────────────┐  │  Stage 2: Generate code → Execute → Evaluate
         │  │ Reason → Act  │──│──────► repeat until done
         │  │ ← Observe     │  │
         │  └───────────────┘  │  Reflect: learn from execution trace
         └──────────┬──────────┘
                    │
         ┌──────────▼──────────┐
         │ Infospace Executor  │  Primitives + Tools
         │                     │  Notes + Collections + Relations
         │  search-web, say,   │  FAISS semantic search
         │  create-note, ...   │  Persistent memory
         └─────────────────────┘

         ┌─────────────────────┐
         │  Cognitive Graph    │  Typed nodes (event, assessment, decision,
         │ (cognitive_graph.py)│  goal_launch/outcome, concern_change, ...)
         │                     │  FAISS-backed semantic search + BFS subgraph
         │                     │  expansion; idle-time consolidation
         └─────────────────────┘

Key Features

  • Continuous OODA Planner — single strategic loop that persists context across cycles; chooses each turn from nine actions (submit-goal, update-concern, say, ask, configure-sensor, reflect, wait, sleep, update-user-model). Event-action history is kept with progressive rollup so the LLM sees recent cycles in full and older strategic intent in summary.
  • Incremental Planning — once a goal is submitted, the inner planner interleaves LLM reasoning with tool execution, adapting its approach based on real results
  • Cognitive Graph & Reflective Trace — a persistent, FAISS-backed graph of typed nodes (events, assessments, decisions, action results, goal launches/outcomes, concern changes, triage nominations, conversation turns) and directed edges recording the full OODA provenance. Supports semantic search, BFS subgraph expansion, and idle-time consolidation that summarizes old windows into consolidation nodes (explorer guide)
  • Concerns & Derived Concerns — user concerns and agent-derived concerns with a recurring lifecycle: active → satisfied (with per-concern revisit timer) → back to active when the timer expires; abandoned is the only terminal state. Homeostatic time-pressure keeps seeded concerns alive; the OODA planner's update-concern action adjusts weight/status/notes directly, and an LLM-activation path lets derived-concern reasoning nominate without waiting for an activation threshold
  • Goal Scheduling — submit goals with goal: prefix; schedule them for manual, automatic, recurring, or daily-at-time execution
  • Envisioning & Quality Control — lightweight LLM framing for coherent dialog; post-execution reflection for failure recovery and learning
  • Infospace Memory — Notes, Collections, and Relations as structured working memory with FAISS semantic search + entity-augmented retrieval
  • Theory of Mind + Companion Model — two complementary lenses on peers. ToM (all peers) tracks trust, competence, reliability, transparency, goals, and emotional state. The Companion Model (user only) adds a friendship lens: current chapter, state of mind, what matters, thinking style, what's on their mind, and how to be useful right now. Fast-moving sections update per conversation; slow-moving ones only on new evidence
  • World Model — Bayesian cross-goal knowledge with recency-weighted evidence decay and staleness detection
  • Extensible Tools — 24 built-in tools (web search, email, Bluesky, academic papers, shell scripts) plus world-specific integrations
  • Sensors — autonomous data collectors (browser visit tracking, RSS feeds) that feed real-world context to the agent
  • Web UI — real-time activation field visualization, chat, goal management, resource browser, and task/concern manager
  • World Integrations — optional worlds (Minecraft, file system, desktop automation, ScienceWorld) with specialized tools

Quick Start

1. Install

git clone https://github.com/bdambrosio/Cognitive_workbench.git
cd Cognitive_workbench
python3 -m venv zenoh_venv
source zenoh_venv/bin/activate
pip install -r requirements.txt

2. Configure an LLM backend

Option A — Local GPU (SGLang):

  • Edit scenarios/jill-infospace.yaml and set sgl_model_path to your preferred model. SGLang can be finicky, but its @function interface makes the reasoning loop much faster.
  • Or edit scenarios/jill-infospace-vllm.yaml and set vllm_model_path to your preferred model.

Option B — Cloud API (no GPU needed):

export OPENROUTER_API_KEY="sk-or-v1-..."   # from openrouter.ai

Alt model for semantic processing:
Some tools — refine, extract-struct, filter-semantic, assess — perform complex semantic processing of text (e.g. extracting a field from JSON). If your base LLM isn't up to the task, you can provide a heavier-weight model for them to use:

alt_llm_config:
  openrouter_model_path: "qwen/qwen3-235b-a22b-2507"
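The unified cloud-LLM form applies to the primary model as well. A hypothetical fragment, following the alt_llm_config pattern above — the top-level key name and exact schema are assumptions; check the shipped jill-chat-*.yaml and jill-infospace-*.yaml scenarios for the real keys:

```yaml
# Hypothetical sketch — key names are illustrative, not the real schema.
llm_config:
  api_key: OPENROUTER_API_KEY              # env var name, not the key itself
  openrouter_model_path: "qwen/qwen3-235b-a22b-2507"
```

Naming the env var rather than embedding the key keeps secrets out of the scenario file.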

3. Run

source zenoh_venv/bin/activate
cd src

python3 launcher.py ../scenarios/jill-infospace.yaml --cli --resource-browser
# Or with the web UI:
python3 launcher.py ../scenarios/jill-infospace.yaml --ui --resource-browser
# Or for OpenRouter:
python3 launcher.py ../scenarios/jill-infospace-openrouter.yaml --ui --resource-browser

4. Optional: Browser automation

The browse tool requires the agent-browser CLI (Rust binary, not a Python package):

cargo install agent-browser        # if you have Rust/cargo
# or download a prebuilt binary from https://github.com/vercel-labs/agent-browser/releases

Skip this if you don't need browser automation — all other tools work without it.

Open http://localhost:3000 and submit a goal via the + Goal button:

Find and summarize recent papers on transformer architectures

See Getting Started for full setup details, environment variables, and troubleshooting.

Web UI

The system provides three web-facing components plus an optional browser extension. See the UI Guide for full details.

Activation Field (port 3000)

The default view is an interactive D3 force-directed graph centered on the agent. Nodes represent the agent, its goals, concerns, notes, and variable bindings — sized and colored by activation level. Click any node to inspect it in the side panel.

The bottom dock bar provides controls for chat, goal entry, execution control (stop, continuous, LLM toggle), and links to the other UI components.

An OODA pulse overlay shows the agent's cognitive cycle in real time — expanding colored rings indicate Observe (blue), Orient (yellow), Decide (orange), and Act (green) phases.

Classic UI (port 3000/classic)

A text-oriented alternative with a scrollable action log, character sidebar with tabs (Plan, Bindings, Goals, Plans, State, Schedule, Tasks), and direct text input for goals and chat.

Resource Browser (port 3001)

Browse, view, edit, and delete Notes, Collections, and Concerns — the agent's working memory. Two-panel layout with a resource list and content viewer. The Concerns tab shows user and derived concerns with activation, weight, revisit interval, and status/delete actions (replaces the previous standalone Task Manager).

Browser Extension (optional)

A Chrome extension that captures page visits and feeds them to the agent via the browser-visits sensor. Install by loading the browser_extension/ directory as an unpacked extension.

How It Works (In Brief)

  1. You type a message: the unified chat handler decides whether to respond conversationally, escalate to a goal (tool use needed), or dispatch a system command — all in a single LLM call
  2. The OODA planner runs continuously: on strategic events (timer ticks, sensor input, concern activation, inform) it assembles live context — concerns, goals, sensors, cognitive-graph slices, character/capabilities — and emits one JSON action per cycle. Chat and alerts take a fast path but receive a summary of ongoing OODA activity for awareness
  3. Goal execution: when the planner picks submit-goal, the Executive Node hands off to the Incremental Planner, which retrieves context (FAISS + entity-augmented + cognitive-graph subgraph), then loops:
    • LLM writes a code block calling tools (search-web, stock-price, create-note, etc.)
    • Executor runs it, returns structured results
    • LLM evaluates: done? next step? error recovery?
  4. Reflection analyzes the full execution trace — updates world model (recency-weighted Bayesian facts), tool insights, and cross-goal learnings. Every OODA stage and goal lifecycle transition is also recorded as typed nodes and edges in the cognitive graph, forming a reflective computational trace the agent can query later
  5. Named entities are extracted from user input, goals, and persistent notes — feeding the cognitive graph with entity nodes and mentions edges that improve retrieval over time
  6. Concerns evolve with a recurring lifecycle: active → satisfied (for the concern's revisit interval) → active again. Seeded concerns accrue homeostatic time-pressure so they re-surface on their own; the planner can directly update-concern rather than waiting for activation triage
  7. Theory of Mind and Companion Model are updated when conversations are archived (/done, /next, /bye). ToM covers every peer (trust, competence, goals, emotional state); the Companion Model runs only for the user and captures the "how are they right now" picture that shapes engagement style
  8. Scheduled goals can repeat daily at a set time, or auto-proceed through multi-step workflows
  9. Sensors (browser visits, RSS feeds) run on timers and feed real-world context back into the agent's concern model
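The reason → act → observe cycle in step 3 can be sketched in a few lines; llm and run_tools are stand-ins for the real LLM call and executor, and every name here is illustrative rather than the project's actual API:

```python
# Hypothetical sketch of the inner incremental-planning loop (step 3).
# llm(context) returns a dict with a tool-calling code block and a done
# flag; run_tools(code) executes the block and returns structured results.
def incremental_plan(goal, llm, run_tools, max_steps=8):
    trace = []
    context = f"Goal: {goal}"
    for _ in range(max_steps):
        step = llm(context)                # LLM writes a code block calling tools
        result = run_tools(step["code"])   # executor runs it, returns results
        trace.append((step, result))
        if step.get("done"):               # LLM judged the goal complete
            break
        context += f"\nObservation: {result}"  # feed real results back in
    return trace
```

One step is generated at a time, with each real observation appended before the next LLM call — which is what lets the planner adapt mid-goal instead of committing to a fixed plan up front.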

Available Scenarios

Scenario                          Mode                 World                Backend
jill-chat.yaml                    Chat (primary)       Chat-only world      OpenAI-compatible local server
jill-chat-vllm.yaml               Chat                 Chat-only world      vLLM (local GPU)
jill-chat-mimo.yaml               Chat                 Chat-only world      MIMO cloud (unified api_key form)
jill-chat-sonnet.yaml             Chat                 Chat-only world      Anthropic Claude Sonnet 4.6 (native Messages API)
jill-infospace.yaml               Executive (legacy)   Core infospace       SGLang (local GPU)
jill-infospace-openrouter.yaml    Executive (legacy)   Core infospace       OpenRouter (cloud)
jill-infospace-anthropic.yaml     Executive (legacy)   Core infospace       Anthropic Claude
jill-infospace-openai.yaml        Executive (legacy)   Core infospace       OpenAI
jill-infospace-vllm.yaml          Executive (legacy)   Core infospace       vLLM (local GPU)
jill-fs.yaml                      Executive            File system          SGLang
jill-fs-openrouter.yaml           Executive            File system          OpenRouter (cloud)
jill-minecraft.yaml               Executive            Minecraft 3D world   SGLang
jill-osworld.yaml                 Executive            Desktop automation   SGLang
jill-scienceworld.yaml            Executive            Science simulation   SGLang
jack-and-jill.yaml                Executive            Multi-agent          SGLang

See Configuration for details on each.

Repository Structure

Cognitive_workbench/
├── README.md                          # This file
├── BACKGROUND.md                      # Research philosophy
├── requirements.txt                   # Python dependencies
├── docs/                              # Detailed documentation
├── scenarios/                         # Scenario YAML files + runtime data
├── browser_extension/                 # Chrome extension for page visit tracking
└── src/
    ├── launcher.py                    # Entry point — dispatches by scenario `mode`
    ├── chat/                          # Chat-mode subproject (becoming primary)
    │   └── chat_loop.py               #   ReAct loop + status line, memories + concerns collections,
    │                                  #   reflection (frame-aware), recurrence promotion, concern firing,
    │                                  #   fetch_text, unified cloud LLM, background post-turn executor
    ├── executive_node.py              # Main tick coordinator, fast-path chat, goal lifecycle (legacy)
    ├── ooda_planner.py                # Continuous OODA planner (legacy)
    ├── incremental_planner.py         # Inner goal planner (legacy)
    ├── infospace_executor.py          # Primitives + tool execution
    ├── infospace_resource_manager.py  # Notes/Collections/Relations + FAISS (shared by chat and executive)
    ├── entity_index.py                # NER extraction, entity index, graph integration
    ├── cognitive_graph.py             # Typed event graph — reflective computational trace
    ├── conversation_store.py          # Dialog lifecycle, archival, session backfill
    ├── discourse.py                   # Theory of Mind + Companion Model templates
    ├── world_model.py                 # Bayesian recency-weighted knowledge
    ├── fastapi_action_display.py      # Web UI (Activation Field + Classic)
    ├── resource_browser.py            # Resource Browser UI
    ├── goal_scheduler.py              # Autonomous goal scheduling
    ├── concern_triage.py              # Concern nomination paths (activation, orient, LLM)
    ├── user_concern_model.py          # User concerns with recurring lifecycle
    ├── derived_concern_model.py       # Agent-derived concerns + revisit timers
    ├── sensor_runner.py               # Sensor scheduling and execution
    ├── sensors/                       # Sensor implementations
    │   ├── browser-visits/            # Browser page visit sensor
    │   └── rss-watcher/               # RSS feed monitor
    ├── tools/                         # Core tools (search-web, run-script, etc.)
    ├── world-tools/                   # World-specific tools (minecraft, fs, etc.)
    ├── static/ui/                     # Activation Field frontend (HTML/JS/CSS)
    ├── scripts/                       # Shell scripts for run-script tool
    └── utils/                         # Shared utilities

Documentation

Document                      Description
Getting Started               Installation, credentials, LLM backend setup, first run
Architecture                  Core cognitive architecture — incremental planner, OODA loop, infospace memory
OODA as Incremental Planner   Continuous strategic planner: action schemas, context assembly, event-action history (decisions log)
Cognitive Graph Spec          Typed nodes/edges, FAISS semantic index, consolidation, reflective trace (explorer)
Concerns Architecture         User + derived concerns, revisit lifecycle, homeostatic pressure, triage paths (user concern model)
UI Guide                      Activation Field, Classic UI, Resource Browser (with Concerns tab), sensors
Goals & Scheduling            Goal submission (goal: prefix), scheduled goals, daily-at-time, autonomous execution
Envisioning & QC              Conversational envisioning, reflection, failure recovery, missing affordance monitoring
Tools & Primitives            Infospace primitives, tool catalog, run-script, plan tools
Configuration                 Scenario YAML reference, available scenarios, directory structure
Tool Development              Creating new tools (Skill.md + tool.py)
Background                    Research motivation and philosophy
Contributor Guidelines        Code style, testing, commit conventions

Contributing

See src/AGENTS.md for repository guidelines, code style, and commit conventions.

License

MIT License — see LICENSE.