npx skills add https://github.com/bdambrosio/Cognitive_workbench --skill mc-craft

Install this skill with the CLI command above and start using the SKILL.md workflow in your workspace.
A research framework for autonomous agents with incremental planning, persistent memory, and tool use.
The chat-mode subproject (src/chat/) is becoming the primary interface. It now provides:
**ReAct loop:** process_text, web search, fetch_text (full-page extraction), respond. Each emission is a single JSON object that MUST include a thought field (one terse sentence supporting the action choice) alongside the tool-specific fields; the thought is preserved verbatim into the awareness feed (below).

**Trace and status line:** a per-iteration trace is written before any post-turn LLM work, plus a live CLI status line ("thinking…" → "using search…") that overwrites in place so the user sees progress during long LLM calls.

**Store-and-append prompt construction:** the system prompt and user-message prefix are built once at loop entry and reused verbatim across iterations; only the working log grows by literal string append, so the prefix is byte-stable and the backend's KV cache hits on iteration 2+. A ## Now block injects the current date/time into the system prompt at loop start. Section headings in the prompt carry brief mechanism-tags in parentheses (e.g. ## Active concerns (from YAML seeds + post-turn reflection + semantic recall; …)) so the model can read source provenance directly from the prefix and not confabulate origins.

**reasoning_history collection:** the most recent few traces surface in the user-message prefix on subsequent turns, between conversation history and current input, making Jill's own prior thinking a structural input to current reasoning. Recent traces render in full; older ones in the rolling window render as a compressed action-sequence digest. Ring-bounded on disk.

**memories collection:** categorized recall (fact / preference / commitment), auto-RAG injection at turn start, and post-turn reflection that suppresses writes from hypothetical / roleplay / counterfactual frames. Discourse update and reflection run in a background single-worker executor so the response publishes without waiting on slow LLM-bound side effects.

**concerns collection:** separate from memories. Three categories (one_shot / durable / derived) with independent per-concern firing parameters generated by reflection: cadence_hours from a discrete allowlist {1, 2, 4, 8, 12, 24, 168} (firing rhythm), lifetime_days (decay tau), and instruction (the action to take). Two timestamps anchor the lifecycle: last_engaged_at is updated only by user engagement (recall hit / recurrence promotion) and drives decay; last_acted_at is the cadence anchor, updated when the concern is acted on (autonomously or by user-driven engagement). Concerns surface in the system prompt as an "Active concerns" block with a per-concern status badge ([durable, due], [durable, idle Xh/Yh], or no badge for non-fireable categories) so the model sees firing state without doing timestamp arithmetic. Recurrence-detection at write time promotes one_shot → durable on re-emission and revives satisfied concerns. A toy sketch of the due-check and badge appears just below.
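A minimal, self-contained sketch of the cadence check and badge rendering just described; the field and function names are hypothetical, not chat_loop.py's actual API.

```python
from datetime import datetime, timedelta

CADENCE_ALLOWLIST = {1, 2, 4, 8, 12, 24, 168}  # hours, per the reflection step

def is_due(concern: dict, now: datetime) -> bool:
    """A concern fires when its cadence has elapsed since last_acted_at
    (the cadence anchor), not since last user engagement."""
    if concern["category"] not in ("one_shot", "durable"):
        return False  # non-fireable categories never fire autonomously
    return now - concern["last_acted_at"] >= timedelta(hours=concern["cadence_hours"])

def status_badge(concern: dict, now: datetime) -> str:
    """Render the badge shown in the prompt's 'Active concerns' block."""
    if concern["category"] not in ("one_shot", "durable"):
        return ""  # no badge for non-fireable categories
    idle_h = (now - concern["last_acted_at"]).total_seconds() / 3600
    if idle_h >= concern["cadence_hours"]:
        return f"[{concern['category']}, due]"
    return f"[{concern['category']}, idle {idle_h:.0f}h/{concern['cadence_hours']}h]"

now = datetime.now()
c = {"category": "durable", "cadence_hours": 8,
     "last_acted_at": now - timedelta(hours=3)}
print(is_due(c, now), status_badge(c, now))  # False [durable, idle 3h/8h]
```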
**tick sensor and Phase C autonomy:** tick is a stateless 30-minute heartbeat that publishes to sense_data. The chat loop dispatches by source name: tick events go to a Phase C autonomy path that runs _check_and_fire_concerns (cadence elapsed since last_acted_at) and autonomously executes due concerns' instructions through the standard ReAct loop (cap 2 per tick; the rest stay due for the next tick). A CLI preamble announces each fire. Autonomous turns reuse the full prompt construction (voice and trace format identical to user turns) but skip post-turn reflection and discourse update and don't refresh user-engagement timestamps. Phase B display of CLI impulses on user turns continues alongside; the cadence anchor is shared so the two paths don't double-fire.

**Unified cloud LLM config:** an api_key field naming an env var triggers a Bearer-auth POST to any OpenAI-compatible endpoint (MIMO, OpenRouter, OpenAI, hosted vLLM, …); legacy server shortcuts still work. A dedicated server: anthropic route hits Anthropic's native Messages API (/v1/messages, x-api-key + anthropic-version headers, system as a top-level field). A sketch of this dispatch follows below.

**Roadmap:** the full executive-node architecture described below (continuous OODA planner, incremental planner, cognitive graph, sensors) remains operational, but future development is shifting to chat mode. The sensor framework has now landed in chat (first sensor: tick), unlocking Phase C autonomy. Cognitive-graph integration of the chat trace and entity modeling are likely later moves; the world model and tool model are less relevant given chat's small fixed tool set. This is a stated intention, not a present-day rewrite; the executive path is fully usable today and is what runs by default for jill-infospace*.yaml scenarios. Chat mode runs from jill-chat*.yaml scenarios via the same launcher.
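To make the two endpoint shapes concrete, here is an illustrative dispatch sketch. The request and header shapes are the public OpenAI-compatible and Anthropic Messages APIs; the config keys and helper name are hypothetical, not chat_loop.py's actual code.

```python
import os
import requests

def complete(cfg: dict, system: str, messages: list[dict]) -> str:
    key = os.environ[cfg["api_key"]]  # the config names an env var, not the secret itself
    if cfg.get("server") == "anthropic":
        # Native Messages API: x-api-key auth, version header, system as a top-level field.
        r = requests.post(
            "https://api.anthropic.com/v1/messages",
            headers={"x-api-key": key, "anthropic-version": "2023-06-01"},
            json={"model": cfg["model"], "max_tokens": 1024,
                  "system": system, "messages": messages},
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["content"][0]["text"]
    # Any OpenAI-compatible endpoint (MIMO, OpenRouter, OpenAI, hosted vLLM, ...):
    # Bearer auth, system prompt carried as an ordinary message.
    r = requests.post(
        cfg["base_url"].rstrip("/") + "/chat/completions",
        headers={"Authorization": f"Bearer {key}"},
        json={"model": cfg["model"],
              "messages": [{"role": "system", "content": system}, *messages]},
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```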
Cognitive Workbench is experimental research software for studying LLM-based cognitive architectures. It prioritizes inspectable agent behavior and fast iteration over stability.
Two coupled loops sit at the core. A continuous OODA planner maintains strategic context across cycles — choosing at each turn whether to submit a goal, update a concern, ask, say, reflect, or sleep — rather than resetting reasoning every tick. When it launches a goal, an incremental planner takes over and interleaves LLM reasoning with tool execution, generating one step at a time and adapting to real results. Every OODA event, decision, action, and outcome is recorded as a typed node in a persistent cognitive graph that serves as both long-term memory and a reflective computational trace.
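The shape of the outer loop, as a minimal self-contained sketch. The action repertoire comes from the paragraph above; everything else (names, stand-in functions) is hypothetical rather than ooda_planner.py's actual API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str       # submit-goal, update-concern, say, ask, reflect, sleep, ...
    payload: str = ""

def decide(context: list[str]) -> Action:
    # Stand-in for the LLM decision: one action per cycle, chosen against the
    # full persistent context instead of resetting reasoning every tick.
    return Action("submit-goal", "Find recent papers on multi-agent coordination")

def incremental_plan(goal: str) -> str:
    return f"done: {goal}"  # stand-in for the inner planner (sketched later)

def ooda_cycle(context: list[str]) -> list[str]:
    action = decide(context)                       # Observe/Orient elided; Decide
    if action.name == "submit-goal":               # Act: hand off to the inner loop
        outcome = incremental_plan(action.payload)
    else:
        outcome = f"performed {action.name}"
    # Strategic context persists across cycles; in the real system every stage
    # is also recorded as a typed node in the cognitive graph.
    return context + [f"{action.name} -> {outcome}"]

context: list[str] = []
for _ in range(2):
    context = ooda_cycle(context)
print(context)
```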
User: "goal: Find recent papers on multi-agent coordination"
│
┌──────────▼──────────┐
│ OODA Planner │ Continuous strategic loop — context persists
│ (ooda_planner.py) │ across cycles. Actions: submit-goal,
│ │ update-concern, say, ask, reflect, sleep, ...
│ ┌───────────────┐ │ Event-action history with progressive rollup
│ │ Observe/Orient│ │
│ │ Decide → Act │ │ Writes every stage into the cognitive graph
│ └───────────────┘ │
└──────────┬──────────┘
│ submit-goal
┌──────────▼──────────┐
│ Incremental Planner │ Stage 0: Retrieve context (FAISS + graph)
│ │ Stage 1: Analyze + select tools
│ ┌───────────────┐ │ Stage 2: Generate code → Execute → Evaluate
│ │ Reason → Act │──│──────► repeat until done
│ │ ← Observe │ │
│ └───────────────┘ │ Reflect: learn from execution trace
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Infospace Executor │ Primitives + Tools
│ │ Notes + Collections + Relations
│ search-web, say, │ FAISS semantic search
│ create-note, ... │ Persistent memory
└─────────────────────┘
┌─────────────────────┐
│ Cognitive Graph │ Typed nodes (event, assessment, decision,
│ (cognitive_graph.py)│ goal_launch/outcome, concern_change, ...)
│ │ FAISS-backed semantic search + BFS subgraph
│ │ expansion; idle-time consolidation
└─────────────────────┘
Idle-time consolidation writes consolidation nodes back into the graph (see the explorer guide).

Concerns follow a lifecycle: active → satisfied (with a per-concern revisit timer) → back to active when the timer expires; abandoned is the only terminal state. Homeostatic time-pressure keeps seeded concerns alive; the OODA planner's update-concern action adjusts weight/status/notes directly, and an LLM-activation path lets derived-concern reasoning nominate a concern without waiting for an activation threshold.

Goals are submitted with the goal: prefix and can be scheduled for manual, automatic, recurring, or daily-at-time execution.

To install:

git clone https://github.com/bdambrosio/Cognitive_workbench.git
cd Cognitive_workbench
python3 -m venv zenoh_venv
source zenoh_venv/bin/activate
pip install -r requirements.txt
Option A — Local GPU (SGLang): edit scenarios/jill-infospace.yaml and set sgl_model_path to your preferred model. SGLang can be finicky, sorry, but its @function decorator makes the reasoning loop much faster. For local vLLM instead, edit scenarios/jill-infospace-vllm.yaml and set vllm_model_path to your preferred model.

Option B — Cloud API (no GPU needed):
export OPENROUTER_API_KEY="sk-or-v1-..." # from openrouter.ai
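For Option A, the relevant scenario settings look roughly like the excerpt below. The key names come from the text above; the model path is an assumed placeholder, so substitute whatever your GPU can serve.

```yaml
# scenarios/jill-infospace.yaml (excerpt; model path illustrative)
sgl_model_path: "Qwen/Qwen2.5-14B-Instruct"

# scenarios/jill-infospace-vllm.yaml (excerpt)
vllm_model_path: "Qwen/Qwen2.5-14B-Instruct"
```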
Alternate model for semantic processing: some tools (refine, extract-struct, filter-semantic, assess) perform complex semantic processing of text, e.g. extracting a field from JSON. If your base LLM isn't up to the task, you can provide a heavier-weight model for them to use:
alt_llm_config:
  openrouter_model_path: "qwen/qwen3-235b-a22b-2507"
To run:

source zenoh_venv/bin/activate
cd src
python3 launcher.py ../scenarios/jill-infospace.yaml --cli --resource-browser
# Or with the web UI:
python3 launcher.py ../scenarios/jill-infospace.yaml --ui --resource-browser
# Or for OpenRouter:
python3 launcher.py ../scenarios/jill-infospace-openrouter.yaml --ui --resource-browser
The browse tool requires the agent-browser CLI (Rust binary, not a Python package):
cargo install agent-browser # if you have Rust/cargo
# or download a prebuilt binary from https://github.com/vercel-labs/agent-browser/releases
Skip this if you don't need browser automation — all other tools work without it.
Open http://localhost:3000 and submit a goal via the + Goal button:
Find and summarize recent papers on transformer architectures
See Getting Started for full setup details, environment variables, and troubleshooting.
The system provides three web-facing components plus an optional browser extension. See the UI Guide for full details.
**Activation Field** (default view): an interactive D3 force-directed graph centered on the agent. Nodes represent the agent, its goals, concerns, notes, and variable bindings, sized and colored by activation level. Click any node to inspect it in the side panel.
The bottom dock bar provides controls for chat, goal entry, execution control (stop, continuous, LLM toggle), and links to the other UI components.
An OODA pulse overlay shows the agent's cognitive cycle in real time — expanding colored rings indicate Observe (blue), Orient (yellow), Decide (orange), and Act (green) phases.
**Classic UI:** a text-oriented alternative with a scrollable action log, a character sidebar with tabs (Plan, Bindings, Goals, Plans, State, Schedule, Tasks), and direct text input for goals and chat.
**Resource Browser:** browse, view, edit, and delete Notes, Collections, and Concerns (the agent's working memory). Two-panel layout with a resource list and content viewer. The Concerns tab shows user and derived concerns with activation, weight, revisit interval, and status/delete actions (replacing the previous standalone Task Manager).
**Browser extension:** a Chrome extension that captures page visits and feeds them to the agent via the browser-visits sensor. Install it by loading the browser_extension/ directory as an unpacked extension.
In the executive path, on each input event (tick, chat, alert, inform) the Executive Node assembles live context — concerns, goals, sensors, cognitive-graph slices, character/capabilities — and emits one JSON action per cycle. Chat and alerts take a fast path but still receive a summary of ongoing OODA activity for awareness. On submit-goal, the Executive Node hands off to the Incremental Planner, which retrieves context (FAISS + entity-augmented + cognitive-graph subgraph), then loops:
- Analyze and select tools (search-web, stock-price, create-note, etc.), generate code, execute, and evaluate, repeating until done (a self-contained sketch follows below).

Other mechanisms along this path:

- The entity index adds mentions edges that improve retrieval over time.
- The OODA planner can raise update-concern directly rather than waiting for activation triage.
- Dialog lifecycle commands (/done, /next, /bye) manage conversations.
- ToM covers every peer (trust, competence, goals, emotional state); the Companion Model runs only for the user and captures the "how are they right now" picture that shapes engagement style.
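A minimal, self-contained sketch of the inner loop's shape, with stage names from the diagram above; every function here is a stand-in, not incremental_planner.py's actual API.

```python
def retrieve_context(goal: str) -> list[str]:                    # Stage 0: FAISS + graph
    return [f"notes related to: {goal}"]

def analyze_and_select_tool(goal: str, ctx: list[str]) -> str:   # Stage 1
    return "search-web"                                          # e.g. search-web, create-note

def generate_and_execute(tool: str) -> str:                      # Stage 2: code -> execute
    return f"{tool} returned 3 results"

def evaluate(goal: str, result: str) -> bool:                    # Stage 2: evaluate
    return "results" in result                                   # toy done-check

def run_goal(goal: str, max_steps: int = 5) -> list[str]:
    ctx = retrieve_context(goal)
    trace: list[str] = []
    for _ in range(max_steps):                                   # repeat until done
        tool = analyze_and_select_tool(goal, ctx)
        result = generate_and_execute(tool)
        trace.append(f"{tool}: {result}")
        ctx.append(result)                                       # adapt to real results
        if evaluate(goal, result):
            break
    return trace                                                 # reflection reads this trace

print(run_goal("Find recent papers on multi-agent coordination"))
```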
Available scenarios:

| Scenario | Mode | World | Backend |
|---|---|---|---|
| jill-chat.yaml | Chat (primary) | Chat-only world | OpenAI-compatible local server |
| jill-chat-vllm.yaml | Chat | Chat-only world | vLLM (local GPU) |
| jill-chat-mimo.yaml | Chat | Chat-only world | MIMO cloud (unified api_key form) |
| jill-chat-sonnet.yaml | Chat | Chat-only world | Anthropic Claude Sonnet 4.6 (native Messages API) |
| jill-infospace.yaml | Executive (legacy) | Core infospace | SGLang (local GPU) |
| jill-infospace-openrouter.yaml | Executive (legacy) | Core infospace | OpenRouter (cloud) |
| jill-infospace-anthropic.yaml | Executive (legacy) | Core infospace | Anthropic Claude |
| jill-infospace-openai.yaml | Executive (legacy) | Core infospace | OpenAI |
| jill-infospace-vllm.yaml | Executive (legacy) | Core infospace | vLLM (local GPU) |
| jill-fs.yaml | Executive | File system | SGLang |
| jill-fs-openrouter.yaml | Executive | File system | OpenRouter (cloud) |
| jill-minecraft.yaml | Executive | Minecraft 3D world | SGLang |
| jill-osworld.yaml | Executive | Desktop automation | SGLang |
| jill-scienceworld.yaml | Executive | Science simulation | SGLang |
| jack-and-jill.yaml | Executive | Multi-agent | SGLang |
See Configuration for details on each.
Cognitive_workbench/
├── README.md # This file
├── BACKGROUND.md # Research philosophy
├── requirements.txt # Python dependencies
├── docs/ # Detailed documentation
├── scenarios/ # Scenario YAML files + runtime data
├── browser_extension/ # Chrome extension for page visit tracking
└── src/
├── launcher.py # Entry point — dispatches by scenario `mode`
├── chat/ # Chat-mode subproject (becoming primary)
│ └── chat_loop.py # ReAct loop + status line, memories + concerns collections,
│ # reflection (frame-aware), recurrence promotion, concern firing,
│ # fetch_text, unified cloud LLM, background post-turn executor
├── executive_node.py # Main tick coordinator, fast-path chat, goal lifecycle (legacy)
├── ooda_planner.py # Continuous OODA planner (legacy)
├── incremental_planner.py # Inner goal planner (legacy)
├── infospace_executor.py # Primitives + tool execution
├── infospace_resource_manager.py # Notes/Collections/Relations + FAISS (shared by chat and executive)
├── entity_index.py # NER extraction, entity index, graph integration
├── cognitive_graph.py # Typed event graph — reflective computational trace
├── conversation_store.py # Dialog lifecycle, archival, session backfill
├── discourse.py # Theory of Mind + Companion Model templates
├── world_model.py # Bayesian recency-weighted knowledge
├── fastapi_action_display.py # Web UI (Activation Field + Classic)
├── resource_browser.py # Resource Browser UI
├── goal_scheduler.py # Autonomous goal scheduling
├── concern_triage.py # Concern nomination paths (activation, orient, LLM)
├── user_concern_model.py # User concerns with recurring lifecycle
├── derived_concern_model.py # Agent-derived concerns + revisit timers
├── sensor_runner.py # Sensor scheduling and execution
├── sensors/ # Sensor implementations
│ ├── browser-visits/ # Browser page visit sensor
│ └── rss-watcher/ # RSS feed monitor
├── tools/ # Core tools (search-web, run-script, etc.)
├── world-tools/ # World-specific tools (minecraft, fs, etc.)
├── static/ui/ # Activation Field frontend (HTML/JS/CSS)
├── scripts/ # Shell scripts for run-script tool
└── utils/ # Shared utilities
| Document | Description |
|---|---|
| Getting Started | Installation, credentials, LLM backend setup, first run |
| Architecture | Core cognitive architecture — incremental planner, OODA loop, infospace memory |
| OODA as Incremental Planner | Continuous strategic planner: action schemas, context assembly, event-action history (decisions log) |
| Cognitive Graph Spec | Typed nodes/edges, FAISS semantic index, consolidation, reflective trace (explorer) |
| Concerns Architecture | User + derived concerns, revisit lifecycle, homeostatic pressure, triage paths (user concern model) |
| UI Guide | Activation Field, Classic UI, Resource Browser (with Concerns tab), sensors |
| Goals & Scheduling | Goal submission (goal: prefix), scheduled goals, daily-at-time, autonomous execution |
| Envisioning & QC | Conversational envisioning, reflection, failure recovery, missing affordance monitoring |
| Tools & Primitives | Infospace primitives, tool catalog, run-script, plan tools |
| Configuration | Scenario YAML reference, available scenarios, directory structure |
| Tool Development | Creating new tools (Skill.md + tool.py) |
| Background | Research motivation and philosophy |
| Contributor Guidelines | Code style, testing, commit conventions |
See src/AGENTS.md for repository guidelines, code style, and commit conventions.
MIT License — see LICENSE.