
From Acceptance Criteria to Playwright Tests with MCP

January 23, 2026 by yer.ac

Modern UI test tooling has quietly raised the bar for who can participate. Playwright is powerful, but it assumes comfort with TypeScript, selectors, repo structure, and terminal use. That gap often collapses testing back onto developers, creating pressure for them to validate their own work. This proof of concept explores a low-code approach using Playwright MCP to enable a different split of responsibility: acceptance criteria stay in plain English, owned by the test team, and Playwright MCP is used as an execution layer to explore the UI and construct real Playwright tests from those criteria. The outcome is not “AI-written tests”, but executable checks that preserve independent validation, letting testers focus on acceptance criteria without having to learn Playwright or its mechanics.

The problem with UI testing

The core problem is not test capability, it is familiarity with coded tests. Testers are hired to reason about product behaviour, risk, and intent, yet modern UI testing assumes knowledge of tooling like Cypress/Playwright, and code-first test structures in JavaScript/TypeScript. While non-coded testing tools exist, they tend to be brittle, opaque, or tied to expensive platforms. The result is a gradual drift of test ownership back to developers, reintroducing the “marking your own homework” pressure that independent testing is meant to avoid.

Yes, developers could write the tests, but should they? Yes, the testers could learn JavaScript and the Playwright framework, but all of this takes time, and what if a new hire comes from a different set of tools and frameworks?

When we write Features, Stories, or Tasks into our ticketing system at work, they are usually accompanied by requirements and acceptance criteria, which is to say that, in practice, we already have plain-English specifications. With minor adjustments, those acceptance criteria can become explicit UX interaction specs, simply by being a little more deliberate and verbose in the detail.

Aim:

  • Leverage modern, free test frameworks without forcing the test team to become framework specialists.
  • Use automation to translate intent into executable tests, rather than asking humans to translate intent into code.
  • Preserve separation between development and validation, avoiding the slow drift toward developers testing their own assumptions.
  • Reduce onboarding friction when testers come from different tooling backgrounds, without lowering the quality or rigour of automated tests.
  • But mostly: Allow acceptance criteria to remain the primary artefact, written in plain English and owned by test.

Getting Started

MCP overview: the prompt file goes to the LLM, which calls MCP tools – specifically the Playwright MCP server, which provides tools for creating, debugging, and running tests.

The idea is simple. We can use VS Code Copilot to take a plain-text test definition, have the LLM parse it and do the reasoning, allow it to call Playwright tooling via MCP, and then output a shiny, validated, runnable TypeScript test at the end.

Getting set up

We need:

  • Visual Studio Code
  • GitHub Copilot, ideally with a stronger model such as the latest GPT or Claude Sonnet. The better and more appropriate the model, the better the outcome will typically be.
  • Optional but recommended: the VS Code extension Playwright Test for VS Code, which adds native support for Playwright tests.

What will not be covered:

  • Any Playwright specifics such as best practices or common Playwright pitfalls; this post focuses purely on the automatic test generation.

Create a new folder and open it in VS Code. Run the following command to scaffold a new Playwright project:

npm init playwright@latest

This will install Playwright if it is not already present.

During setup, the tooling will ask for permission to install the create-playwright package and prompt for a few configuration choices, such as language (JavaScript vs TypeScript), test folder name (“tests” is fine), and whether to install browser dependencies. The defaults are generally sensible for this proof of concept.

VS Code terminal showing the setup steps described above
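
For reference, the default scaffold produces a small, predictable layout; something like the following (the exact contents depend on the options you choose):

playwright.config.ts        // test runner configuration (browsers, retries, reporters)
package.json
tests/
  example.spec.ts           // a tiny example test to confirm the setup works
tests-examples/
  demo-todo-app.spec.ts     // a larger demo suite you can safely delete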

MCP setup

At this point we have a standard Playwright project. Next, we need to install the Playwright MCP tooling.

This can be done by visiting the Playwright MCP repository on GitHub and clicking Install for VS Code:
https://github.com/microsoft/playwright-mcp

Once installed, ensure the MCP server is running. Open the VS Code command palette and search for “MCP: List Servers”. You should see Playwright listed.

Selecting it will show options to start, stop, and configure the server. You can start it from here if it is not already running, or choose Show Configuration to open the mcp.json file and inspect or adjust the setup directly.

Listing and interacting with MCP in VS Code as described above.
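
For reference, the resulting mcp.json is only a few lines. A minimal sketch, assuming the standard npx-based install (exact key names can vary between VS Code versions), looks roughly like this:

{
  "servers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}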

Custom agent instructions and test prep

Like all LLM interaction, this works best when it is provided with appropriate context. We will do two things here:

  • Create a custom context that explains how the agent should utilise MCP and Playwright, and how it should behave as a tester.
  • Create a custom instruction set that references this context where appropriate, along with some quality-of-life instructions to help keep inputs and outputs aligned.

First, create two folders at the repository root:

  • prompts
    This is where our plain-language tests will live.
  • contexts
    This is where we place our custom context files. We can, and later will, utilise the .github/agents/instructions directory, but it will become clear shortly why this is kept separate.

Inside the contexts folder, create a new markdown file named playwright-tester-agent.md

# Agent Context
You are a Playwright test generator agent.
You're given a natural language scenario describing what to test in a web application.
Your task is to generate a valid Playwright test using @playwright/test in TypeScript.

# Important:
- DO NOT generate the full test code immediately based on the scenario alone.
- DO gather context by executing steps one at a time using the Playwright MCP tools. These steps may include, but are not limited to:
    - Inspecting DOM structure.
    - Fetching selectors.
    - Validating element visibility

# Process
1. Parse the scenario and break it down into actionable steps.
2. For each step:
    - Use MCP to fetch the page context
    - Validate element presence, interaction type (click, type, wait, etc.)
3. Once all steps are validated and the context is collected:
    - Emit a final Playwright test using @playwright/test syntax in TypeScript.
    - Include appropriate waits, locators, and assertions based on message history.
4. Save the generated `.spec.ts` file into the `/tests` directory.
5. Execute the tests using the Playwright test runner.
6. If the test fails, re-evaluate using MCP context and regenerate until it passes.

# Notes/Guidance
- Use plain readable locators
- Avoid hardcoding unless required.
- Follow Playwright best practices for stability and retries.
- You may be testing Single Page Apps with initial load/waits. 

# GOAL
To generate reliable, maintainable, and context-aware Playwright tests using AI and MCP.

At this point, the context file exists, but unless it is explicitly assigned in the chat window, it will not be used.

Earlier I mentioned that this file could be placed in the native instructions folder. While that does work, it causes the context to be injected into every GitHub Copilot chat session for the project. That is not what we want. The goal is to apply this behaviour only when we are explicitly asking the agent to create or evolve tests.

To achieve this, create a new global instruction set and name it something like agent-pw.instructions.md. We can do this by:

  • Open the Copilot panel in Visual Studio Code
  • Go to Settings → Chat Instructions
  • Choose New Instruction File
Chat instruction setup in VS Code

This instruction file can selectively reference the playwright-tester-agent.md context only when test generation is requested, keeping normal Copilot usage unaffected while still giving us a highly opinionated, test-focused agent when we need it.

---
applyTo: '**'
---
If you are being asked to generate tests for a piece of code, and acting in agent mode, follow these instructions:
- Read the instructions in the contexts/playwright-tester-agent.md file carefully.
- Follow the step-by-step process outlined in that file to gather context and generate reliable Playwright tests.

On completion: if you used a `.md` file under the `/prompts` folder to generate the tests, update the prompt file to add a comment with the corresponding test file path at the top, i.e. `# File: tests/**/login.spec.ts`. This will help track which prompts generated which tests for future updates. NO OTHER UPDATES can be made to the prompt file.

When generating tests the following rules should be followed:
- When creating new data, always use unique values to avoid conflicts with existing data.
- All tests that create data must also clean up that data at the end of the test to avoid polluting the test environment.
- You will NEVER delete data that you did not create within the test itself.

At this point we are essentially saying:

  • Read the agent context for being a tester if you are being asked to generate tests.
  • Update the input .md with the generated test file path so we can link the two, which is useful when asking it to update a test.
  • Apply hygiene guardrails for creating and deleting data – which seemed to work.

Running our first test

Manually

First, just to get a handle on how this works, go to the chat window, ensure it is in agent mode, and ask it something like the following. (Note that because we set up the custom instructions above, we do not need to add the context manually.)

Generate a Test for this scenario:

1. Navigate to www.google.com
2. Validate the search input is visible
3. Validate the "Google Search" button is visible.
4. Type "Sausages" into the search box, and press the search button.
5. Confirm we have been redirected to a search page, and the page contains information on Sausages.

Note: Using sites like Google is a poor real-world example, as headless browsers are often blocked by CAPTCHA. This is purely illustrative.

The agent will explore the page, call the Playwright MCP tools, and construct a Playwright test that satisfies the scenario.

Agent constructing a test in VS Code CoPilot as described above.
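
To give a sense of what comes out the other end, the generated test for this scenario would look roughly like the following. This is an illustrative sketch rather than the exact output; the locators depend on what the agent discovers on the page:

import { test, expect } from '@playwright/test';

test('Google search for "Sausages"', async ({ page }) => {
  // Steps 1-3: load the page and confirm the core controls are present
  await page.goto('https://www.google.com');
  const searchBox = page.getByRole('combobox', { name: 'Search' });
  await expect(searchBox).toBeVisible();
  await expect(page.getByRole('button', { name: 'Google Search' }).first()).toBeVisible();

  // Step 4: perform the search
  await searchBox.fill('Sausages');
  await searchBox.press('Enter');

  // Step 5: confirm we landed on a results page mentioning the query
  await expect(page).toHaveURL(/\/search/);
  await expect(page.locator('body')).toContainText(/sausages/i);
});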

If you run the generated test (placed under the /tests folder), either via the Tests tab in VS Code if you installed the Playwright extension, or via the CLI, you should see it pass successfully.

Test runner window in VS Code
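
From the CLI, running the suite is just the standard Playwright command:

npx playwright test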

Persistent prompt files

The issue with doing this manually in chat is that none of it is durable. Once the chat window is closed, the intent behind a test is lost, updating it becomes harder than it should be, and you no longer have a clean record of what the test was actually trying to prove. You end up with code that “works”, but without the plain-English specification that explains why it exists.

Under our prompts directory, we can create a new markdown file (or a logical folder structure of .md files) for our tests, where the content is the same as the manual scenario. We can now go to the chat window with this file in context and ask it to “Make tests” (or “Update tests” later).

This will create a new test file under /tests, just as in the manual scenario above.

Because our instruction set includes a rule to update the prompt file with the corresponding generated test file path, these files are automatically kept in sync, creating a 1:1 mapping between the prompt and the test. This means that if we later add another requirement and ask the agent to regenerate, it updates the existing test rather than creating a new one.
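
As an illustration (the file names here are hypothetical), a prompt file ends up looking something like this after generation:

# File: tests/google-search.spec.ts

# Google Search
1. Navigate to www.google.com
2. Validate the search input is visible.
(...rest of the scenario unchanged)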

At this point we have a framework in place for:

  • Placing markdown files into a structured prompts folder
  • Generating and updating tests without needing to understand the underlying test tooling

⚠️Test your tests!

The auto-healing loop is powerful, but it needs clear boundaries. An agent that is allowed to regenerate tests until they pass can easily optimise for success rather than correctness, for example by weakening assertions or validating incidental behaviour.

Make sure you debug each test execution (as in, watch it run, or record it using Playwright) every time the test changes, to ensure it is doing what you think it should be doing.
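
Playwright’s standard CLI makes this easy; for example, running headed and recording a trace for later inspection (stock Playwright flags, nothing specific to this setup):

npx playwright test --headed --trace on
npx playwright show-report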

Full Autonomy as an option

Agents and why we deliberately did not use all of them

Playwright MCP ships with multiple agent profiles, typically oriented around planning, execution, and validation. On paper, this looks appealing: a fully autonomous loop that can explore an application, decide what to test, and generate the tests itself. These can optionally be added into the solution easily by running:

npx playwright init-agents --loop=vscode

…and honestly, it’s impressive. In practice, however, this is exactly the boundary we chose not to cross for this proof of concept.

The issue is not capability, it is authority. When the same agent is allowed to discover behaviour, define expectations, and then validate those expectations, you collapse intent and verification into a single feedback loop. That produces coverage, but it does not produce confidence. To contrast with the opening problem, in this case you are not just marking your own homework, you are writing the subject and the test as well. This could be even worse if AI also wrote the code!

But you should check them out anyway…

Once the tooling is installed, provided you are running VS Code v1.105 (released October 9, 2025) or later, you will see three new agents in the chat window.

Agents list in Co-Pilot showing the 3 Playwright agents

The agents are:

  • Planner
    • This agent explores a URL and generates verbose test plans based on what it discovers, without requiring explicit direction, although guidance can be provided. I did find value in asking it to explore pages I had already written tests for, as it occasionally surfaced additional edge cases. The output is a Markdown test plan that can be reviewed, edited, or passed on to another agent.
  • Generator
    • This agent takes a test plan, optionally produced by the Planner, and generates the required Playwright tests.
  • Healer
    • This agent works against failing tests and focuses on repairing them. It is particularly useful for stabilising brittle UI tests where selectors or timing assumptions have drifted.

I won’t go into detail on these here, as the official docs have plenty of more verbose notes.

I did actually add these into my project, but only used them for focussed activities rather than in a fully autonomous mode.

What isn’t covered here / Assumed knowledge and scope

Testers still need to think like someone writing automated tests.

This post does not attempt to cover every aspect of building robust automated UI tests. Topics such as environment setup and teardown, test data seeding, authentication flows, and isolation between tests are all still critical to long-term stability and are deliberately out of scope here.

While this approach removes the need for testers to work directly in Playwright or TypeScript (as in, understand the frameworks), it does not remove the need for good testing judgement. Understanding how well-structured automated tests should behave, how to avoid hidden coupling between tests, and how to reason about state, data lifecycle, and failure modes remains essential.

In other words, this lowers the barrier to expressing tests in code, but it does not eliminate the need to think like someone who writes good automated tests.

Why markdown structure matters, a.k.a. my examples are bad…

I chose to use a basic numbered list for my test example. Whilst this is valid and may be suitable for simple tests, real prompt files should be structured like lightweight test specs in markdown, for example:

  • A Before Each Test section for navigation and preconditions
  • Optional After Each Test for clean-up
  • A numbered set of scenarios, each with a clear name and deterministic assertions

This keeps prompts durable, reviewable, and easy to evolve over time, and it helps the generated Playwright code stay aligned with the intent rather than becoming a pile of incidental checks.

# Google : Basic Search

# Before Each Test
- Navigate to https://www.google.com
- If a consent dialog is shown, accept it (only if present) so the page is usable.

# 1. Search

## 1.1 Can load the homepage and see the core controls
- Navigate to https://www.google.com
- Search input is visible.
- "Google Search" button is visible.

## 1.2 Can perform a search and reach results
- Navigate to https://www.google.com
- Search input is visible.
- Type "Sausages" into the search input.
- Click "Google Search" (or submit the search form).
- URL indicates we are on a results page (typically contains `/search`).
- Results page contains content related to "Sausages" (at minimum, the query appears on the page).

Final considerations

This is not about replacing testers with AI. While Playwright MCP can be configured to operate in a fully autonomous loop, the more interesting value is in using it as a bridge between human intent and executable tests.

By keeping acceptance criteria explicit and human-owned, and constraining the agent to translate rather than invent intent, you preserve independent validation while removing the need for testers to work directly in code. The result is not fewer testers, but better leverage of their time on behaviour, risk, and edge cases rather than tooling mechanics.

Full autonomy remains a useful option in specific scenarios, such as discovery, baseline coverage, or large UI changes. For validating that a system behaves as intended, however, a constrained, intent-driven approach tends to produce clearer tests and greater confidence.

