
Playwright E2E Tests: I Deployed at 3 AM and Signup Was Broken
Users found the broken signup before I did after a late-night deploy. How I set up Playwright E2E tests and integrated them into CI to prevent this.


I pushed a patch at 2:40 AM. An authentication logic fix. Worked fine locally. All unit tests passed. I went to sleep confident.
At 7 AM I woke up to a Slack full of messages.
"Signup doesn't work? I enter my email, click Next, and nothing happens."
"Same issue here. Can't create an account."
"Is the service down for maintenance?"
Users found the bug before I did. In the most critical flow possible. A service where signup is broken is essentially a dead service. The entry point for every user journey was blocked for four hours.
It was the early morning hours, so traffic was low. Cold comfort. The embarrassment was worse.
The cause was easy to find. My patch had left the session cookie with a broken sameSite option, so after the email verification step the session disappeared when moving to the next step.
Every unit test passed. The function that saved the cookie worked correctly in isolation. The session was being saved. But whether the browser would actually send that cookie on the next request — unit tests can't verify that.
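To see how a cookie can be "saved correctly" and still go missing, here is a deliberately simplified model of the browser-side SameSite decision. The exact misconfiguration in the patch isn't shown, so the cases are illustrative assumptions, and the names (SameSiteMode, SessionCookie, OutgoingRequest, browserWouldSendCookie) are made up for this sketch. Real browsers apply more conditions than this.

```typescript
// Simplified model of SameSite handling. It ignores Domain/Path matching,
// expiry, HTTPS-only delivery of Secure cookies, and other real-world rules.
type SameSiteMode = 'Strict' | 'Lax' | 'None';

interface SessionCookie {
  sameSite: SameSiteMode;
  secure: boolean;
}

interface OutgoingRequest {
  sameSiteAsPage: boolean;     // target is same-site with the page that triggered the request
  topLevelNavigation: boolean; // e.g. following a link, as opposed to fetch/XHR or an iframe
  method: string;
}

function browserWouldSendCookie(cookie: SessionCookie, req: OutgoingRequest): boolean {
  // Chrome rejects SameSite=None cookies that are not also marked Secure,
  // so in this model such a cookie is never sent at all.
  if (cookie.sameSite === 'None') return cookie.secure;
  // Strict and Lax cookies both travel on same-site requests.
  if (req.sameSiteAsPage) return true;
  // Lax additionally allows cross-site top-level navigations using GET.
  if (cookie.sameSite === 'Lax') {
    return req.topLevelNavigation && req.method.toUpperCase() === 'GET';
  }
  // Strict never travels on cross-site requests.
  return false;
}

// A server-side unit test only sees that the cookie was set.
// Whether it comes back depends on the request the browser makes next.
const crossSitePost: OutgoingRequest = {
  sameSiteAsPage: false,
  topLevelNavigation: false,
  method: 'POST',
};
const sentAsLax = browserWouldSendCookie({ sameSite: 'Lax', secure: true }, crossSitePost);
const sentAsNoneInsecure = browserWouldSendCookie({ sameSite: 'None', secure: false }, crossSitePost);
```

In both sample cases the cookie is set on the server and never arrives on the next request, which is the kind of gap only a real browser run exposes.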
Think of it like a car. Unit tests check individual parts on the workbench. The engine turns over. Oil flows. Pistons move. Everything checks out. But when you assemble the car and drive it down the road, it won't shift out of first gear. The parts were fine. The assembled system doing a real test drive was what you needed.
End-to-End tests are that test drive. A real browser opens, a real user clicks and types, and the entire flow runs from start to finish.
A few E2E tools exist: Cypress, Selenium, Puppeteer. I chose Playwright for one core reason: it waits for you.
Playwright does auto-waiting by default. Tell it to click a button and it waits until that button is visible, enabled, and stable enough to click. Assert with expect that some text is visible and it retries until the text appears or the timeout expires. No hardcoded sleep(1000) calls.
Teams that write Selenium tests have a common complaint: "The tests keep breaking randomly." The cause is almost always timing. When a page is slow, a server responds late, or an animation hasn't finished, the test charges ahead and fails. Playwright's auto-waiting handles most of these cases by default, removing the most common cause of flaky tests before you write a single explicit wait.
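The idea behind auto-waiting can be sketched as a small polling loop. This is not Playwright's implementation, only a rough model of "retry until the condition holds or a deadline passes". The names waitUntil and demo, and the timing values, are assumptions for illustration.

```typescript
// Poll a condition until it becomes true, or fail once the timeout elapses.
async function waitUntil(
  condition: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Yield to the event loop before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical stand-in for a button that becomes clickable a little later.
async function demo(): Promise<string> {
  let clickable = false;
  setTimeout(() => {
    clickable = true;
  }, 100);
  await waitUntil(() => clickable, 2000, 10);
  return 'clicked';
}
```

A fixed sleep(1000) is either too long (slow tests) or too short (flaky tests). Polling against a deadline proceeds as soon as the state is ready and fails loudly only when it never gets there.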
The Locator API is also intuitive:
await page.getByRole('button', { name: 'Next' }).click();
await page.getByLabel('Email').fill('user@example.com');
await expect(page.getByText('Signup complete')).toBeVisible();
getByRole, getByLabel, getByText — you find elements the way a user sees the screen. No div.auth-form > .step-2 > button:nth-child(2) selectors. Your tests survive UI restructuring far better.
The flow that caused the incident, written as a test:
// tests/auth/signup.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Signup flow', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/signup');
  });

  test('successful email signup', async ({ page }) => {
    // Step 1: Email entry
    await page.getByLabel('Email').fill('newuser@example.com');
    await page.getByRole('button', { name: 'Next' }).click();
    await expect(page.getByText('Verification code sent')).toBeVisible();

    // Step 2: Verification code (fixed value in test env)
    await page.getByLabel('Verification code').fill('123456');
    await page.getByRole('button', { name: 'Confirm' }).click();

    // Step 3: Profile
    await page.getByLabel('Nickname').fill('testuser');
    await page.getByLabel('Password').fill('SecurePass123!');
    await page.getByRole('button', { name: 'Complete signup' }).click();

    // The critical check: is the session still alive?
    await expect(page).toHaveURL('/dashboard');
    await expect(page.getByText('Welcome, testuser')).toBeVisible();
  });

  test('already registered email', async ({ page }) => {
    await page.getByLabel('Email').fill('existing@example.com');
    await page.getByRole('button', { name: 'Next' }).click();
    await expect(
      page.getByText('This email is already registered')
    ).toBeVisible();

    // Page should stay on step 1
    await expect(page.getByLabel('Email')).toBeVisible();
  });
});
If this test had existed, the pre-deploy run would have failed at await expect(page).toHaveURL('/dashboard'). The broken session would have surfaced immediately.
Three tests is fine. Twenty tests becomes a problem. The same selectors scattered across dozens of files. One UI change breaks them all.
The Page Object Pattern solves this by encapsulating page interactions in a class:
// tests/pages/SignupPage.ts
import { expect, type Page } from '@playwright/test';

export class SignupPage {
  constructor(private readonly page: Page) {}

  async goto() {
    await this.page.goto('/signup');
  }

  // Step 1: enter the email and move on
  async fillEmail(email: string) {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByRole('button', { name: 'Next' }).click();
  }

  // Step 2: enter the verification code and confirm
  async fillVerificationCode(code: string) {
    await this.page.getByLabel('Verification code').fill(code);
    await this.page.getByRole('button', { name: 'Confirm' }).click();
  }

  // Step 3: fill in the profile and submit
  async fillProfile(nickname: string, password: string) {
    await this.page.getByLabel('Nickname').fill(nickname);
    await this.page.getByLabel('Password').fill(password);
    await this.page.getByRole('button', { name: 'Complete signup' }).click();
  }

  async expectSuccess() {
    await expect(this.page).toHaveURL('/dashboard');
  }
}

// Using it in tests (tests/auth/signup.spec.ts)
import { test } from '@playwright/test';
import { SignupPage } from '../pages/SignupPage';

test('successful email signup', async ({ page }) => {
  const signupPage = new SignupPage(page);
  await signupPage.goto();
  await signupPage.fillEmail('newuser@example.com');
  await signupPage.fillVerificationCode('123456');
  await signupPage.fillProfile('testuser', 'SecurePass123!');
  await signupPage.expectSuccess();
});
The test reads like a scenario description. The implementation details are hidden inside SignupPage. When the label text changes, you update SignupPage.ts in one place. Twenty tests keep passing.
A page object is like a library reference librarian. You say "I need a book on economics." You don't care how the librarian finds it or where the shelves are arranged. If the library reorganizes (UI changes), only the librarian needs to know the new layout. Your requests (test scenarios) stay the same.
Tests that only run locally don't prevent production incidents. They have to run automatically before every deploy.
# .github/workflows/e2e.yml
name: E2E Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run E2E tests
        run: npx playwright test
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          TEST_EMAIL: ${{ secrets.TEST_EMAIL }}
          TEST_VERIFICATION_CODE: ${{ secrets.TEST_CODE }}
      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 7
The last step is crucial. With the screenshot, video, and trace settings in the config that follows, Playwright records screenshots and video for failing tests and attaches them to the HTML report. Uploading that report as an artifact means you don't need to reproduce the failure locally. Download the artifact, open the HTML report, and you can watch exactly what the browser saw at the moment of failure: which step, which state, what was on screen.
The playwright.config.ts for this setup:
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,
  forbidOnly: !!process.env.CI, // fail the CI run if a test.only slipped in
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 1 : undefined, // single worker in CI, default locally
  reporter: 'html',
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'on-first-retry',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});
retries: 2 in CI is worth explaining. CI environments are less stable than local — network latency, resource contention, timing variance. Two retries allow genuinely flaky infrastructure to recover. Fail three times in a row and it's a real problem. This one setting eliminated most of my "randomly fails in CI" complaints.
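To make the retry arithmetic concrete: retries: 2 gives each test up to three attempts. Playwright's report labels a test that fails and then passes on a retry as flaky, not as a plain pass, so those cases stay visible. The helper below is a hypothetical model of that bookkeeping, not Playwright code; Outcome and classifyAttempts are names invented for this sketch.

```typescript
type Outcome = 'passed' | 'flaky' | 'failed';

// attempts[i] is true when run number i passed.
// Only the first (retries + 1) runs count, and the first passing run ends the sequence.
function classifyAttempts(attempts: boolean[], retries: number): Outcome {
  const maxRuns = retries + 1;
  for (let run = 0; run < maxRuns && run < attempts.length; run++) {
    if (attempts[run]) {
      return run === 0 ? 'passed' : 'flaky';
    }
  }
  return 'failed';
}

const firstTry = classifyAttempts([true], 2);                    // passed on the first run
const recovered = classifyAttempts([false, true], 2);            // failed once, then passed
const realProblem = classifyAttempts([false, false, false], 2);  // all three attempts failed
```

Treating "flaky" as its own outcome matters: retries keep the pipeline green, but a test that keeps landing in that bucket is still worth investigating.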
The most frustrating E2E problem is tests that fail occasionally. Ten runs, nine pass, one fails. The failure is hard to reproduce.
The cause is almost always one of two things.
First, timing. Playwright's auto-waiting covers most cases. But custom components — especially modals and drawers with animations — can still cause issues. The fix is to wait on state, not time:
// Bad: arbitrary wait
await page.waitForTimeout(1000);
// Good: wait for state
await expect(page.getByRole('dialog')).toBeVisible();
await expect(page.getByRole('dialog')).not.toHaveAttribute('aria-hidden', 'true');
Second, shared state between tests. Test A creates a user in the database. Test B tests "when email doesn't exist." A ran first, so the user already exists. Test B fails.
The solution is test isolation. Each test creates its own data and cleans up afterward:
test.beforeEach(async ({ request }) => {
  await request.post('/api/test/seed-user', {
    data: { email: 'testuser@example.com', code: '123456' },
  });
});

test.afterEach(async ({ request }) => {
  await request.delete('/api/test/cleanup-user', {
    data: { email: 'testuser@example.com' },
  });
});
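One caveat with the hooks above: every test seeds the same fixed address, which can collide once tests run in parallel workers. A possible workaround is to derive a distinct address per test. The helper below is a hypothetical sketch (uniqueTestEmail, the e2e+ prefix, and the runId parameter are assumptions, not part of the original setup).

```typescript
// Build an email address that is unique per test title, worker, and run.
function uniqueTestEmail(testTitle: string, workerIndex: number, runId: string): string {
  const slug = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric into a dash
    .replace(/^-+|-+$/g, '');    // trim leading and trailing dashes
  return `e2e+${slug}-w${workerIndex}-${runId}@example.com`;
}

const emailWorker0 = uniqueTestEmail('Already registered email', 0, 'r1');
const emailWorker1 = uniqueTestEmail('Already registered email', 1, 'r1');
```

Inside a Playwright hook, the title and worker index could come from the second callback argument, for example `test.beforeEach(async ({ request }, testInfo) => { ... })` with `testInfo.title` and `testInfo.workerIndex`. The run id could be a CI build number or a timestamp.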
Test-only seed/cleanup API endpoints, disabled in production via NODE_ENV checks, make this manageable. Fully isolated tests can also run in parallel with fullyParallel: true, cutting total test time dramatically. Note that the config above pins CI to a single worker (workers: 1), so the speedup only shows up in CI once you raise that number. Isolation and parallelization are two sides of the same coin: you need one to safely have the other.
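The NODE_ENV check itself can be very small. The sketch below is framework-agnostic and uses an allow-list (endpoints are on only for explicitly named environments) instead of "anything except production", which is a design choice of this example and not necessarily how the original endpoints are guarded. EnvVars, testEndpointsEnabled, and guardTestRoute are hypothetical names.

```typescript
type EnvVars = Record<string, string | undefined>;

// Environments where the seed/cleanup endpoints may run. Anything else,
// including a missing NODE_ENV, is treated as production.
const TEST_API_ENVS = new Set<string>(['test', 'development']);

function testEndpointsEnabled(env: EnvVars): boolean {
  const nodeEnv = env.NODE_ENV;
  return nodeEnv !== undefined && TEST_API_ENVS.has(nodeEnv);
}

// Wrap a test-only handler: respond 404 when disabled, so the route
// looks like it does not exist outside test environments.
function guardTestRoute<T>(env: EnvVars, handler: () => T): { status: number; body?: T } {
  if (!testEndpointsEnabled(env)) {
    return { status: 404 };
  }
  return { status: 200, body: handler() };
}

const inProduction = guardTestRoute({ NODE_ENV: 'production' }, () => 'seeded');
const inTest = guardTestRoute({ NODE_ENV: 'test' }, () => 'seeded');
```

In a real server the env argument would be process.env. One thing to check in your own setup: if the staging environment that CI targets runs with NODE_ENV=production, a pure NODE_ENV check disables the endpoints there too, and a separate, explicit switch for staging may be needed.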
Unit tests and E2E tests catch different things. Unit tests verify individual parts. E2E tests verify the assembled system actually drives. You need both.
Playwright's auto-waiting removes the most common cause of flaky tests. Use state-based waits instead of waitForTimeout.
Page Object Pattern is essential past ten tests. It eliminates duplication and limits the blast radius of UI changes to a single file.
CI integration is what makes E2E tests worth having. Tests that only run locally don't block broken deploys. Failure artifacts — screenshots, video, traces — cut debugging time from hours to minutes.
Test isolation is the prerequisite for parallel execution. Independent test data management enables fast parallel runs without data collision.
Running E2E tests before every deploy means the 3 AM incident doesn't happen again. It's like fire drills. Run them regularly and you stop being afraid of the alarm. The tests are the drill. The deploy is the real thing. I'd much rather be woken up by a failing test than by users in Slack.