Building Your First Spec-Driven AI Project End-to-End

This final article puts everything together: a complete, start-to-finish walkthrough of building a real system using spec-driven development with AI. You'll write a formal spec, decompose it into tasks, prompt AI to generate code and tests, review the output, and deploy a working service. By following this capstone project, you'll have a template for every spec-driven project you build in the future.

Project: User Authentication Service

We'll build a production-ready authentication service with email/password login, JWT tokens, and MFA support.

Phase 1: Requirements and Scope

Objective: Users can create accounts, log in, and access protected resources with optional two-factor authentication.

Users:

  • End users (signup, login)
  • Admins (user management)
  • Third-party services (validate tokens)

Out of scope: OAuth/SSO, passwordless login, social sign-in.

Phase 2: Write the Formal Spec

Start with a complete OpenAPI specification:

openapi: 3.1.0
info:
  title: Authentication Service API
  version: 1.0.0
  description: User authentication with email, password, and MFA

servers:
  - url: https://auth.example.com/v1

components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: string
          pattern: '^user_[a-z0-9]{16}$'
        email:
          type: string
          format: email
        status:
          type: string
          enum: [active, suspended]
        mfaEnabled:
          type: boolean
        createdAt:
          type: string
          format: date-time
      required: [id, email, status, mfaEnabled, createdAt]

    AuthToken:
      type: object
      properties:
        accessToken:
          type: string
          description: JWT token for API calls
        expiresAt:
          type: integer
          description: Unix timestamp
        refreshToken:
          type: string
      required: [accessToken, expiresAt]

    ErrorResponse:
      type: object
      properties:
        error:
          type: string
          enum: [invalid_credentials, invalid_mfa, user_not_found, user_already_exists]
        message:
          type: string
      required: [error, message]

  # Security schemes must live under components to be valid OpenAPI.
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT

paths:
  /signup:
    post:
      summary: Create a new user account
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                password:
                  type: string
                  minLength: 8
                  maxLength: 256
                  description: "Password requirements: min 8 chars, must contain uppercase, lowercase, digit"
              required: [email, password]
      responses:
        '201':
          description: Account created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  user:
                    $ref: '#/components/schemas/User'
                  token:
                    $ref: '#/components/schemas/AuthToken'
        '400':
          description: Invalid input or password too weak
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '409':
          description: User already exists with this email
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'

  /login:
    post:
      summary: Authenticate user and return token
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                password:
                  type: string
                mfaCode:
                  type: string
                  description: Required if MFA is enabled
              required: [email, password]
      responses:
        '200':
          description: Authentication successful
          content:
            application/json:
              schema:
                type: object
                properties:
                  user:
                    $ref: '#/components/schemas/User'
                  token:
                    $ref: '#/components/schemas/AuthToken'
        '400':
          description: Malformed request or missing MFA code
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Invalid email or password
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'

  /validate:
    post:
      summary: Validate a JWT token
      security:
        - BearerAuth: []
      responses:
        '200':
          description: Token is valid
          content:
            application/json:
              schema:
                type: object
                properties:
                  valid: { type: boolean }
                  userId: { type: string }
                  expiresAt: { type: integer }
        '401':
          description: Token is invalid or expired

  /mfa/enable:
    post:
      summary: Enable MFA for user account
      security:
        - BearerAuth: []
      responses:
        '200':
          description: MFA enabled
          content:
            application/json:
              schema:
                type: object
                properties:
                  secret: { type: string, description: "Base32 secret for TOTP" }
                  qrCode: { type: string, description: "Data URL for QR code" }
        '400':
          description: MFA already enabled

# Edge cases sit under an "x-" extension key so the document remains valid OpenAPI.
x-edgeCases:
  - id: "EC-001"
    description: "Email exactly at the 254-character address limit (RFC 5321)"
    endpoint: "/signup"
    # YAML cannot evaluate expressions, so the test generator expands this string.
    # RFC 5321 also limits the local part to 64 octets; if the service enforces that, expect 400.
    input: { emailExpression: '"a" * 242 + "@example.com"', password: "Pass1234" }
    expected: "201"

  - id: "EC-002"
    description: "Password exactly 8 characters (minimum)"
    endpoint: "/signup"
    input: { email: "[email protected]", password: "Pass1234" }
    expected: "201"

  - id: "EC-003"
    description: "Password 7 characters (below minimum)"
    endpoint: "/signup"
    input: { email: "[email protected]", password: "Pass123" }
    expected: "400"

  - id: "EC-004"
    description: "Password with no uppercase letters"
    endpoint: "/signup"
    input: { email: "[email protected]", password: "password1" }
    expected: "400"

  - id: "EC-005"
    description: "Concurrent signup with same email"
    endpoint: "/signup"
    concurrent: true
    input: { email: "[email protected]", password: "Pass1234" }
    expected: ["201", "409"]

This spec is complete, testable, and machine-readable.
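
Because the schema is machine-readable, its constraints can be checked mechanically. The sketch below hand-codes the `User` rules (required fields, `id` pattern, `status` enum) using only the Python standard library to show the idea. `check_user` is a hypothetical helper and the sample payload is made up; a real project would more likely pass the OpenAPI document to a schema-validation library.

```python
import re

# Constraints copied by hand from the User schema in the spec above.
USER_REQUIRED = ("id", "email", "status", "mfaEnabled", "createdAt")
USER_ID_PATTERN = re.compile(r"^user_[a-z0-9]{16}$")
USER_STATUSES = {"active", "suspended"}


def check_user(payload: dict) -> list:
    """Return a list of schema violations (an empty list means the payload conforms)."""
    problems = [f"missing field: {name}" for name in USER_REQUIRED if name not in payload]
    user_id = payload.get("id")
    if isinstance(user_id, str) and not USER_ID_PATTERN.fullmatch(user_id):
        problems.append("id does not match ^user_[a-z0-9]{16}$")
    if "status" in payload and payload["status"] not in USER_STATUSES:
        problems.append("status must be 'active' or 'suspended'")
    if "mfaEnabled" in payload and not isinstance(payload["mfaEnabled"], bool):
        problems.append("mfaEnabled must be a boolean")
    return problems


# Hypothetical payload used only for illustration.
good = {
    "id": "user_0123456789abcdef",
    "email": "someone@example.com",
    "status": "active",
    "mfaEnabled": False,
    "createdAt": "2026-06-02T10:30:00Z",
}
print(check_user(good))                          # []
print(check_user({**good, "id": "user_short"}))  # one violation: the id pattern
```

The same function can run against every response body in the integration tests, so a drift between code and spec shows up as a failing assertion rather than a surprised client.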

Phase 3: Decompose into Tasks

Break the spec into implementation tasks:

Authentication Service (spec: auth_spec.yaml)
├── Task 1: Database Schema
│ ├── Create users table (id, email, password_hash, status, mfa_enabled, created_at)
│ ├── Create mfa_secrets table (user_id, secret_base32)
│ ├── Create indexes on email (unique) and user_id
│ └── Success: pytest tests/test_schema.py passes

├── Task 2: Password Hashing & Validation
│ ├── Implement hash_password() using bcrypt
│ ├── Implement password_verify()
│ ├── Validate passwords: 8-256 chars, must include uppercase/lowercase/digits
│ └── Success: test_crypto.py:test_password_* all pass

├── Task 3: JWT Token Generation
│ ├── Implement create_token(user_id) returning {accessToken, expiresAt}
│ ├── Token expires after 24 hours
│ ├── Sign using HS256 with secret key
│ └── Success: test_tokens.py:test_create_token_* all pass

├── Task 4: POST /signup Endpoint
│ ├── Validate email format (RFC 5322)
│ ├── Validate password strength
│ ├── Check email not already exists (unique constraint)
│ ├── Hash password and insert user
│ ├── Return 201 with user + token
│ ├── Return 400 for validation errors
│ ├── Return 409 for duplicate email
│ └── Success: test_api.py:test_signup_* all pass

├── Task 5: POST /login Endpoint
│ ├── Find user by email
│ ├── Verify password
│ ├── Check MFA if enabled
│ ├── Return 200 with user + token
│ ├── Return 401 for invalid credentials
│ └── Success: test_api.py:test_login_* all pass

├── Task 6: POST /validate Endpoint
│ ├── Extract JWT from Authorization header
│ ├── Verify signature and expiry
│ ├── Return 200 with {valid: true, userId, expiresAt}
│ ├── Return 401 if token invalid/expired
│ └── Success: test_api.py:test_validate_* all pass

├── Task 7: MFA (TOTP) Support
│ ├── Generate base32 secret for TOTP
│ ├── Generate QR code
│ ├── Verify TOTP code (time-based, 6 digits)
│ ├── POST /mfa/enable endpoint
│ └── Success: test_mfa.py:test_mfa_* all pass

└── Task 8: Integration & E2E Tests
├── Test full signup → login flow
├── Test with MFA enabled
├── Test token expiry
├── Load test (100 concurrent signups)
└── Success: test_e2e.py all pass
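
To make one of these tasks concrete, Task 7's "Verify TOTP code (time-based, 6 digits)" step can be sketched with the standard library alone. The task list only fixes the code length and that it is time-based; the HMAC-SHA1 algorithm, the 30-second step, and the one-step drift window below are assumptions taken from common RFC 6238 defaults, and `totp_code` and `verify_totp` are hypothetical names.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp_code(secret_base32: str, at: float, step: int = 30, digits: int = 6) -> str:
    """Compute the TOTP code (RFC 6238, HMAC-SHA1) for a given Unix time."""
    key = base64.b32decode(secret_base32, casefold=True)
    counter = struct.pack(">Q", max(int(at) // step, 0))  # 8-byte big-endian counter
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


def verify_totp(
    secret_base32: str,
    code: str,
    at: Optional[float] = None,
    step: int = 30,
    window: int = 1,
) -> bool:
    """Accept the current code and codes up to `window` steps away (clock drift)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(
            totp_code(secret_base32, now + drift * step, step).encode(),
            code.encode(),
        )
        for drift in range(-window, window + 1)
    )


# Secret from the RFC 6238 test vectors, base32-encoded as an authenticator app expects.
secret = base64.b32encode(b"12345678901234567890").decode("ascii")
print(totp_code(secret, at=59))              # 287082
print(verify_totp(secret, "287082", at=59))  # True
```

Passing the time in as a parameter keeps the function deterministic, which is what lets `test_mfa.py` assert against published test vectors instead of the wall clock.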

Phase 4: Prompt AI to Generate Code

For Task 2 (Password Hashing), create a precise prompt:

# Task: Implement Password Hashing Module

You are an expert Python backend engineer.
Implement password hashing and validation using bcrypt.

## Requirements:
- Use bcrypt for password hashing (library: bcrypt)
- hash_password(password: str) -> str
  - Accepts password 8-256 chars
  - Returns bcrypt hash
  - Must use a random salt per call (same password, different hash each time)
- password_verify(password: str, hash: str) -> bool
  - Returns True if password matches hash
  - Returns False otherwise (invalid password or format)
- validate_password_strength(password: str) -> tuple[bool, str]
  - Returns (True, message) if password meets requirements
  - Returns (False, message naming the first unmet requirement) otherwise
  - Requirements: min 8 chars, max 256 chars, include uppercase, lowercase, digits

## Password Requirements:
- Minimum length: 8 characters
- Maximum length: 256 characters
- Must contain: uppercase letter (A-Z), lowercase letter (a-z), digit (0-9)
- Can contain: any ASCII printable character

## Test Vectors (from spec):
Valid passwords:
- "Pass1234" (minimum length, all requirements met)
- "MyP@ssw0rd!" (includes special character)

Invalid passwords:
- "Pass123" (only 7 chars, below minimum)
- "password1" (no uppercase letters)
- "PASSWORD1" (no lowercase letters)
- "Passaaaa" (no digits)

## Code Organization:
- File: auth/crypto.py
- No external dependencies beyond bcrypt

## Error Handling:
- Raise ValueError if password is too short/long with message
- Return False from password_verify if any exception occurs (never raise)

Output: Complete, production-ready Python module with docstrings and type hints.

The AI generates:

# auth/crypto.py

import bcrypt
import re
from typing import Tuple


def hash_password(password: str) -> str:
    """Hash a password using bcrypt.

    Args:
        password: Password string (8-256 chars)

    Returns:
        Bcrypt hash string

    Raises:
        ValueError: If password is invalid
    """
    if not isinstance(password, str):
        raise ValueError("Password must be a string")
    if len(password) < 8:
        raise ValueError("Password must be at least 8 characters")
    if len(password) > 256:
        raise ValueError("Password must not exceed 256 characters")

    # Note: bcrypt only uses the first 72 bytes of its input, so very long
    # passwords need extra handling (for example, pre-hashing) before this call.

    # Generate bcrypt hash with cost 12 (security/performance balance)
    salt = bcrypt.gensalt(rounds=12)
    hashed = bcrypt.hashpw(password.encode("utf-8"), salt)
    return hashed.decode("utf-8")


def password_verify(password: str, hash_: str) -> bool:
    """Verify a password against a bcrypt hash.

    Args:
        password: Password to verify
        hash_: Bcrypt hash to compare against

    Returns:
        True if password matches hash, False otherwise
    """
    try:
        return bcrypt.checkpw(password.encode("utf-8"), hash_.encode("utf-8"))
    except Exception:
        # Any error (invalid hash format, etc.) = not a match
        return False


def validate_password_strength(password: str) -> Tuple[bool, str]:
    """Validate password meets strength requirements.

    Args:
        password: Password to validate

    Returns:
        Tuple of (is_valid: bool, message: str)
    """
    if not isinstance(password, str):
        return False, "Password must be a string"

    # Length checks
    if len(password) < 8:
        return False, "Password must be at least 8 characters"
    if len(password) > 256:
        return False, "Password must not exceed 256 characters"

    # Character requirements
    has_uppercase = bool(re.search(r"[A-Z]", password))
    has_lowercase = bool(re.search(r"[a-z]", password))
    has_digit = bool(re.search(r"[0-9]", password))

    if not has_uppercase:
        return False, "Password must contain at least one uppercase letter"
    if not has_lowercase:
        return False, "Password must contain at least one lowercase letter"
    if not has_digit:
        return False, "Password must contain at least one digit"

    return True, "Password meets all requirements"

This code directly implements the spec's constraints.
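
One gap worth catching in review: bcrypt only uses the first 72 bytes of its input, while the spec accepts passwords of up to 256 characters. Depending on the library version, longer input is either silently truncated or rejected. A common workaround is to pre-hash the password to a fixed-length value before handing it to bcrypt. The sketch below shows only that pre-hash step, using the standard library; `prepare_for_bcrypt` is a hypothetical helper that is not part of the generated module, and the SHA-256 plus base64 combination is an assumption about how a team might close the gap.

```python
import base64
import hashlib

BCRYPT_MAX_BYTES = 72  # bcrypt does not use input beyond this length


def prepare_for_bcrypt(password: str) -> bytes:
    """Map a password of any length to 44 ASCII bytes, well under bcrypt's limit.

    Base64 encoding ensures the value passed to bcrypt never contains NUL bytes.
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)


long_a = "Pass1234" + "a" * 248        # 256 characters
long_b = "Pass1234" + "a" * 247 + "b"  # differs only in the final character

# Cut to 72 bytes, the two passwords are indistinguishable...
print(long_a.encode()[:BCRYPT_MAX_BYTES] == long_b.encode()[:BCRYPT_MAX_BYTES])  # True
# ...but their pre-hashed forms differ and fit within the limit.
print(prepare_for_bcrypt(long_a) != prepare_for_bcrypt(long_b))  # True
print(len(prepare_for_bcrypt(long_a)))  # 44
```

If a team adopts this, both hashing and verification have to apply the same step, and the decision belongs in the spec so that regenerated code keeps it.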

Phase 5: Generate Tests

Prompt AI to generate pytest tests from the edge cases:

# Task: Generate pytest tests for password module

Generate comprehensive pytest tests for auth/crypto.py

## Test Cases (from spec edgeCases):

### Boundary Values:
- EC-002: Password exactly 8 characters (minimum) → should accept
- EC-003: Password 7 characters (below minimum) → should reject
- EC-004: Password with no uppercase letters → should reject

### Additional Cases:
- Valid password with all requirements
- Valid password with special characters
- Invalid passwords (too short, too long, missing requirements)
- Hashing produces different result each time (random salt)
- Verification succeeds for correct password
- Verification fails for incorrect password
- Verification handles invalid hash format gracefully

Output complete pytest code in file tests/test_crypto.py

The AI generates:

# tests/test_crypto.py

import pytest
from auth.crypto import hash_password, password_verify, validate_password_strength


class TestHashPassword:
    """Test password hashing with bcrypt"""

    def test_hash_valid_password(self):
        """Hash a valid password"""
        hashed = hash_password("Pass1234")
        assert hashed is not None
        assert len(hashed) > 0
        assert hashed.startswith("$2b$")  # bcrypt format

    def test_hash_returns_different_value_each_time(self):
        """Same password hashes to different values (random salt)"""
        hash1 = hash_password("Pass1234")
        hash2 = hash_password("Pass1234")
        assert hash1 != hash2  # Different due to salt

    def test_hash_password_too_short(self):
        """Password below minimum length raises error"""
        with pytest.raises(ValueError, match="at least 8 characters"):
            hash_password("Pass123")

    def test_hash_password_too_long(self):
        """Password exceeds maximum length raises error"""
        long_password = "Pass1234" * 50  # > 256 chars
        with pytest.raises(ValueError, match="not exceed 256 characters"):
            hash_password(long_password)


class TestPasswordVerify:
    """Test password verification"""

    def test_verify_correct_password(self):
        """Verification succeeds for correct password"""
        password = "Pass1234"
        hashed = hash_password(password)
        assert password_verify(password, hashed) is True

    def test_verify_incorrect_password(self):
        """Verification fails for incorrect password"""
        password = "Pass1234"
        hashed = hash_password(password)
        assert password_verify("Pass5678", hashed) is False

    def test_verify_invalid_hash_format(self):
        """Verification fails gracefully for invalid hash"""
        assert password_verify("Pass1234", "invalid_hash") is False
        assert password_verify("Pass1234", "") is False

    def test_verify_type_error_handled(self):
        """Verification handles type errors gracefully"""
        hashed = hash_password("Pass1234")
        assert password_verify(None, hashed) is False  # None is not a string


class TestPasswordStrength:
    """Test password strength validation"""

    # EC-002: Minimum length (8 chars)
    def test_password_minimum_length_accepted(self):
        """Password exactly 8 characters is accepted"""
        valid, msg = validate_password_strength("Pass1234")
        assert valid is True

    # EC-003: Below minimum length (7 chars)
    def test_password_below_minimum_rejected(self):
        """Password 7 characters is rejected"""
        valid, msg = validate_password_strength("Pass123")
        assert valid is False
        assert "at least 8 characters" in msg

    # EC-004: No uppercase
    def test_password_no_uppercase_rejected(self):
        """Password without uppercase letter is rejected"""
        valid, msg = validate_password_strength("password1")
        assert valid is False
        assert "uppercase" in msg

    def test_password_no_lowercase_rejected(self):
        """Password without lowercase letter is rejected"""
        valid, msg = validate_password_strength("PASSWORD1")
        assert valid is False
        assert "lowercase" in msg

    def test_password_no_digit_rejected(self):
        """Password without digit is rejected"""
        valid, msg = validate_password_strength("Password")
        assert valid is False
        assert "digit" in msg

    def test_password_valid_with_all_requirements(self):
        """Valid password with uppercase, lowercase, digit"""
        valid, msg = validate_password_strength("MyPass123")
        assert valid is True

    def test_password_valid_with_special_chars(self):
        """Valid password including special characters"""
        valid, msg = validate_password_strength("MyP@ss123!")
        assert valid is True

    def test_password_maximum_length_accepted(self):
        """Password at 256 character limit is accepted"""
        password = "Pass" + "1" * 252  # 256 chars total
        valid, msg = validate_password_strength(password)
        assert valid is True

    def test_password_exceeds_maximum_rejected(self):
        """Password exceeding 256 characters is rejected"""
        password = "Pass" + "1" * 253  # 257 chars
        valid, msg = validate_password_strength(password)
        assert valid is False
        assert "not exceed 256" in msg

Tests are generated directly from the spec's edge cases.

Phase 6: Review and Deploy

Run automated checks:

# Syntax validation
python -m py_compile auth/crypto.py
python -m py_compile tests/test_crypto.py

# Type checking
mypy auth/crypto.py

# Linting
flake8 auth/crypto.py tests/test_crypto.py
black --check auth/crypto.py tests/test_crypto.py

# Unit tests
pytest tests/test_crypto.py -v

# Coverage
pytest tests/test_crypto.py --cov=auth.crypto --cov-report=html

# Schema validation (if applicable)
python validate_spec_schemas.py auth_spec.yaml

If all pass, the code is ready for review:

REVIEW CHECKLIST ✓
==================

✓ Code passes syntax validation
✓ Type hints are correct
✓ Linting passes (flake8, black)
✓ All tests pass (coverage report reviewed for crypto.py; untested branches noted)
✓ Spec compliance verified (schema validation)
✓ No security issues (no hardcoded secrets, proper error handling)
✓ Performance acceptable (bcrypt cost=12 takes on the order of 100-300 ms per hash, depending on hardware)

READY FOR: Code review, then merge to main

Phase 7: Repeat for Other Tasks

Repeat Phases 4–6 for Tasks 3–8 (JWT tokens, endpoints, MFA, integration tests).
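
As an illustration of what the next iteration might produce, here is a minimal standard-library sketch of Task 3 (HS256 signing, 24-hour expiry, `{accessToken, expiresAt}` return shape) plus the signature and expiry check that Task 6 needs. The claim names (`sub`, `iat`, `exp`), the `validate_token` helper, and the demo secret are assumptions; a production service would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

TOKEN_TTL_SECONDS = 24 * 60 * 60  # Task 3: tokens expire after 24 hours


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the padding that JWTs omit."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest())


def create_token(user_id: str, secret: bytes, now: Optional[int] = None) -> dict:
    """Return {accessToken, expiresAt} for user_id, signed with HS256."""
    issued_at = int(time.time()) if now is None else now
    expires_at = issued_at + TOKEN_TTL_SECONDS
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "iat": issued_at, "exp": expires_at}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    token = f"{signing_input}.{_sign(signing_input, secret)}"
    return {"accessToken": token, "expiresAt": expires_at}


def validate_token(token: str, secret: bytes, now: Optional[int] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, otherwise None."""
    try:
        header, claims, signature = token.split(".")
        expected = _sign(f"{header}.{claims}", secret)
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            return None
        if json.loads(_b64url_decode(header)).get("alg") != "HS256":
            return None
        payload = json.loads(_b64url_decode(claims))
    except ValueError:  # wrong segment count, bad base64, bad JSON, non-ASCII input
        return None
    current = int(time.time()) if now is None else now
    return payload if payload.get("exp", 0) > current else None


secret = b"demo-secret-do-not-use-in-production"  # placeholder for illustration only
issued = 1780396200  # 2026-06-02T10:30:00Z
token = create_token("user_0123456789abcdef", secret, now=issued)
print(token["expiresAt"])                                                   # 1780482600
print(validate_token(token["accessToken"], secret, now=issued + 60)["sub"])  # user_0123456789abcdef
print(validate_token(token["accessToken"], secret, now=token["expiresAt"]))  # None
```

Checking the signature before parsing the claims means nothing from an unauthenticated token is interpreted, and passing `now` explicitly keeps the expiry tests in `test_tokens.py` deterministic.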

Phase 8: End-to-End Testing

After all tasks are complete, run integration tests:

# Start service in Docker
docker-compose up -d

# Run E2E tests
pytest tests/test_e2e.py -v

# Load test (100 concurrent users, headless so it can run unattended)
locust -f tests/load_test.py --headless --host http://localhost:5000 --users 100 --spawn-rate 10 --run-time 1m

# Manual smoke test
curl -X POST http://localhost:5000/v1/signup \
-H "Content-Type: application/json" \
-d '{"email": "[email protected]", "password": "Pass1234"}'

Expected output:

{
  "user": {
    "id": "user_abc123def456ghi7",
    "email": "[email protected]",
    "status": "active",
    "mfaEnabled": false,
    "createdAt": "2026-06-02T10:30:00Z"
  },
  "token": {
    "accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
    "expiresAt": 1780482600
  }
}

Phase 9: Deploy to Production

Once all tests pass, deploy:

# Tag release
git tag v1.0.0
git push origin v1.0.0

# Deploy to production
docker build -t auth-service:1.0.0 .
docker push auth-service:1.0.0
kubectl apply -f deployment.yaml

# Monitor
kubectl rollout status deployment/auth-service
kubectl logs -f deployment/auth-service
# Then check the auth-service health dashboard in your monitoring tool (for example, Datadog)

Key Lessons from Capstone

  1. Specs first: Writing the spec upfront prevents rework and clarifies intent.
  2. Decompose early: Breaking into tasks enables parallelism and focus.
  3. Prompts matter: Precise prompts generate better code. Vague prompts generate guesses.
  4. Tests as specs: Tests derived from edge cases verify the spec is honored.
  5. Automation helps: CI checks catch errors early, before human review begins.
  6. Iterate: The first generation is rarely perfect. Review → fix spec/prompt → regenerate.

Key Takeaways

  • A complete spec-driven project flows: requirements → formal spec → task decomposition → AI code generation → tests → review → deployment.
  • Formal specs (OpenAPI, JSON Schema) eliminate ambiguity and enable automated validation.
  • Test vectors in specs (edge cases) ensure code generation is complete and handles boundaries.
  • Automated checks (syntax, type, schema validation) catch many errors before human review.
  • Integration testing verifies that all pieces work together end-to-end.

Frequently Asked Questions

How long does a project like this capstone take to complete?

Following this workflow, a production-ready authentication service (Tasks 1–8) takes roughly 2–3 weeks for an experienced team. Most of that time goes to review and refinement, not initial generation.

What if AI-generated code doesn't pass tests?

Update the prompt or spec and regenerate. Often the issue is ambiguity in the spec. Clarify it, then regenerate. This is normal.

Can I skip any phase?

Not recommended. Each phase catches different issues:

  • Spec writing: clarifies requirements
  • Decomposition: enables parallelism and focus
  • Testing: verifies correctness
  • Review: catches edge cases and quality issues
  • Deployment: verifies production-readiness

Skipping a phase usually leads to rework.

How do I know when a project is "spec-driven complete"?

When: (1) all spec requirements are implemented, (2) all edge cases are tested, (3) code is reviewed and approved, (4) integration tests pass, (5) spec is versioned and archived, and (6) code/tests are version controlled.

Further Reading