E2E Behavior Validation for Frontend Modifications
Core Principle: Test Product Behavior, Not UI States
CRITICAL: Tests must verify that product features WORK correctly, not just that UI elements render.
What NOT to test (UI States):
- ❌ "Dropdown opens when clicked"
- ❌ "Modal appears after button click"
- ❌ "Loading spinner shows during request"
- ❌ "Form fields are visible"
- ❌ "Sidebar collapses"
What TO test (Product Behavior):
- ✅ "Selecting an LLM provider configures the agent to use that provider"
- ✅ "Creating a new agent persists it and shows in the agents list"
- ✅ "Running a tool with parameters returns the expected output"
- ✅ "Chat messages stream correctly and maintain conversation context"
- ✅ "Workflow execution triggers tools in the correct order"
Prerequisites
Requires Playwright MCP server. If the browser_navigate tool is unavailable, instruct the user to add it:
claude mcp add playwright -- npx @playwright/mcp@latest
Step 1: Understand the Feature Intent
Before writing ANY test, answer these questions:
- What user problem does this feature solve?
- What is the expected outcome when the feature works correctly?
- What data flows through the system? (user input → API → state → UI)
- What should persist after page reload?
- What downstream effects should this action have?
Document these answers as comments in your test file.
Step 2: Build and Start
pnpm build:cli cd packages/playground/e2e/kitchen-sink && pnpm dev
Verify server at http://localhost:4111
Step 3: Map Feature to Behavior Tests
Feature-to-Test Mapping Guide
| Feature Category | What to Test | Example Assertion |
|---|---|---|
| Agent Configuration | Config changes affect agent behavior | Send message → verify response uses selected model |
| LLM Provider Selection | Selected provider is used in requests | Intercept API call → verify provider in request payload |
| Tool Execution | Tool runs with correct params & returns result | Execute tool → verify output matches expected transformation |
| Workflow Execution | Steps execute in order, data flows between steps | Run workflow → verify each step's output feeds next step |
| Chat/Streaming | Messages persist, context maintained across turns | Multi-turn conversation → verify context awareness |
| MCP Server Tools | Server tools are callable and return data | Call MCP tool → verify response structure and content |
| Memory/Persistence | Data survives page reload | Create item → reload → verify item exists |
| Error Handling | Errors surface correctly to user | Trigger error condition → verify error message + recovery |
Step 4: Write Behavior-Focused Tests
Test Structure Template
import { test, expect, Page } from '@playwright/test'; import { resetStorage } from '../__utils__/reset-storage'; import { selectFixture } from '../__utils__/select-fixture'; import { nanoid } from 'nanoid'; /** * FEATURE: [Name of feature] * USER STORY: As a user, I want to [action] so that [outcome] * BEHAVIOR UNDER TEST: [Specific behavior being validated] */ test.describe('[Feature Name] - Behavior Tests', () => { let page: Page; test.beforeEach(async ({ browser }) => { const context = await browser.newContext(); page = await context.newPage(); }); test.afterEach(async () => { await resetStorage(page); }); test('should [verb describing behavior] when [trigger condition]', async () => { // ARRANGE: Set up preconditions // - Navigate to the feature // - Configure any required state // ACT: Perform the user action that triggers the behavior // ASSERT: Verify the OUTCOME, not the UI state // - Check data persistence // - Verify downstream effects // - Confirm API calls made correctly }); });
Behavior Test Patterns
Pattern 1: Configuration Affects Behavior
test('selecting LLM provider should use that provider for agent responses', async () => { // ARRANGE await page.goto('/agents/my-agent/chat'); // Intercept API to verify provider let capturedProvider: string | null = null; await page.route('**/api/chat', route => { const body = JSON.parse(route.request().postData() || '{}'); capturedProvider = body.provider; route.continue(); }); // ACT: Select a different provider await page.getByTestId('provider-selector').click(); await page.getByRole('option', { name: 'OpenAI' }).click(); // Send a message to trigger the agent await page.getByTestId('chat-input').fill('Hello'); await page.getByTestId('send-button').click(); // ASSERT: Verify the selected provider was used await expect.poll(() => capturedProvider).toBe('openai'); });
Pattern 2: Data Persistence
test('created agent should persist after page reload', async () => { // ARRANGE await page.goto('/agents'); const agentName = `Test Agent ${nanoid()}`; // ACT: Create new agent await page.getByTestId('create-agent-button').click(); await page.getByTestId('agent-name-input').fill(agentName); await page.getByTestId('save-agent-button').click(); // Wait for creation to complete await expect(page.getByText(agentName)).toBeVisible(); // ASSERT: Verify persistence await page.reload(); await expect(page.getByText(agentName)).toBeVisible({ timeout: 10000 }); });
Pattern 3: Tool Execution Produces Correct Output
test('weather tool should return formatted weather data', async () => { // ARRANGE await selectFixture(page, 'weather-success'); await page.goto('/tools/weather-tool'); // ACT: Execute tool with parameters await page.getByTestId('param-city').fill('San Francisco'); await page.getByTestId('execute-tool-button').click(); // ASSERT: Verify OUTPUT content, not just that output appears const output = page.getByTestId('tool-output'); await expect(output).toContainText('temperature'); await expect(output).toContainText('San Francisco'); // Verify structured data if applicable const outputText = await output.textContent(); const outputData = JSON.parse(outputText || '{}'); expect(outputData).toHaveProperty('temperature'); expect(outputData).toHaveProperty('conditions'); });
Pattern 4: Workflow Step Chaining
test('workflow should pass data between steps correctly', async () => { // ARRANGE await selectFixture(page, 'workflow-multi-step'); const sessionId = nanoid(); await page.goto(`/workflows/data-pipeline?session=${sessionId}`); // ACT: Trigger workflow execution await page.getByTestId('workflow-input').fill('test input data'); await page.getByTestId('run-workflow-button').click(); // ASSERT: Verify each step received correct input from previous step // Wait for completion await expect(page.getByTestId('workflow-status')).toHaveText('completed', { timeout: 30000 }); // Check step outputs show data transformation chain const step1Output = await page.getByTestId('step-1-output').textContent(); const step2Output = await page.getByTestId('step-2-output').textContent(); // Verify step 2 received step 1's output as input expect(step2Output).toContain(step1Output); });
Pattern 5: Streaming Chat with Context
test('chat should maintain conversation context across messages', async () => { // ARRANGE await selectFixture(page, 'contextual-chat'); const chatId = nanoid(); await page.goto(`/agents/assistant/chat/${chatId}`); // ACT: Multi-turn conversation await page.getByTestId('chat-input').fill('My name is Alice'); await page.getByTestId('send-button').click(); await expect(page.getByTestId('assistant-message').last()).toBeVisible({ timeout: 20000 }); await page.getByTestId('chat-input').fill('What is my name?'); await page.getByTestId('send-button').click(); // ASSERT: Verify context was maintained const response = page.getByTestId('assistant-message').last(); await expect(response).toContainText('Alice', { timeout: 20000 }); });
Pattern 6: Error Recovery
test('should show actionable error and allow retry when API fails', async () => { // ARRANGE: Set up failure fixture await selectFixture(page, 'api-failure'); await page.goto('/tools/flaky-tool'); // ACT: Trigger the error await page.getByTestId('execute-tool-button').click(); // ASSERT: Error is shown with recovery option await expect(page.getByTestId('error-message')).toContainText('failed'); await expect(page.getByTestId('retry-button')).toBeVisible(); // Switch to success fixture and retry await selectFixture(page, 'api-success'); await page.getByTestId('retry-button').click(); // Verify recovery worked await expect(page.getByTestId('tool-output')).toBeVisible({ timeout: 10000 }); await expect(page.getByTestId('error-message')).not.toBeVisible(); });
Step 5: Update Existing Tests
When a test file already exists:
- Read the existing tests to understand current coverage
- Identify if tests are UI-focused or behavior-focused
- Refactor UI-focused tests to verify behavior instead:
Refactoring Example
BEFORE (UI-focused):
test('dropdown opens when clicked', async () => { await page.getByTestId('model-dropdown').click(); await expect(page.getByRole('listbox')).toBeVisible(); });
AFTER (Behavior-focused):
test('selecting model from dropdown updates agent configuration', async () => { // Open dropdown and select model await page.getByTestId('model-dropdown').click(); await page.getByRole('option', { name: 'GPT-4' }).click(); // Verify the selection persists and affects behavior await page.reload(); await expect(page.getByTestId('model-dropdown')).toHaveText('GPT-4'); // Optionally: verify the model is used in actual requests // (via request interception or checking response metadata) });
Step 6: Kitchen-Sink Fixtures for Behavior Testing
Fixtures should represent realistic scenarios, not just mock data:
Fixture Naming Convention
<feature>-<scenario>.fixture.ts
Examples:
- agent-with-tools.fixture.ts
- chat-multi-turn-context.fixture.ts
- workflow-parallel-execution.fixture.ts
- tool-validation-error.fixture.ts
- mcp-server-timeout.fixture.ts
Fixture Content Requirements
Each fixture must define:
- Scenario description (what behavior it enables testing)
- Expected outcomes (what assertions should pass)
- Edge cases covered (error states, empty states, etc.)
// fixtures/agent-provider-switch.fixture.ts export const agentProviderSwitch = { name: 'agent-provider-switch', description: 'Tests that switching LLM providers changes agent behavior', // Mock responses for different providers responses: { openai: { content: 'Response from OpenAI', model: 'gpt-4' }, anthropic: { content: 'Response from Anthropic', model: 'claude-3' }, }, expectedBehavior: { // When provider is switched, subsequent messages use new provider providerSwitchAffectsNextMessage: true, // Provider selection persists across page reload providerPersistsOnReload: true, }, };
Step 7: Run and Validate
cd packages/playground && pnpm test:e2e
Test Quality Checklist
Before considering tests complete, verify:
- Each test has a clear user story comment
- Tests verify OUTCOMES, not intermediate UI states
- Tests would FAIL if the feature broke (not just if UI changed)
- Persistence is verified via
page.reload()where applicable - Error scenarios are covered
- Tests use appropriate timeouts for async operations
- Fixtures represent realistic usage scenarios
Quick Reference
| Step | Command/Action |
|---|---|
| Build | pnpm build:cli |
| Start | cd packages/playground/e2e/kitchen-sink && pnpm dev |
| App URL | http://localhost:4111 |
| Routes | @packages/playground/src/App.tsx |
| Run tests | cd packages/playground && pnpm test:e2e |
| Test dir | packages/playground/e2e/tests/ |
| Fixtures | packages/playground/e2e/kitchen-sink/fixtures/ |
Anti-Patterns to Avoid
| ❌ Don't | ✅ Do Instead |
|---|---|
| Test that modal opens | Test that modal action completes and persists |
| Test that button is clickable | Test that clicking button produces expected result |
| Test loading spinner appears | Test that loaded data is correct |
| Test form validation message shows | Test that invalid form cannot submit AND valid form succeeds |
| Test dropdown has options | Test that selecting option changes system behavior |
| Test sidebar navigation works | Test that navigated page has correct data/functionality |
| Assert element is visible | Assert element contains expected data/state |