Building Your First Spec-Driven AI Project End-to-End
This final article puts everything together: a complete, start-to-finish walkthrough of building a real system using spec-driven development with AI. You'll write a formal spec, decompose it into tasks, prompt AI to generate code and tests, review the output, and deploy a working service. By following this capstone project, you'll have a template for every spec-driven project you build in the future.
Project: User Authentication Service
We'll build a production-ready authentication service with email/password login, JWT tokens, and MFA support.
Phase 1: Requirements and Scope
Objective: Users can create accounts, log in, and access protected resources with optional two-factor authentication.
Users:
- End users (signup, login)
- Admins (user management)
- Third-party services (validate tokens)
Out of scope: OAuth/SSO, passwordless login, social sign-in.
Phase 2: Write the Formal Spec
Start with a complete OpenAPI specification:
openapi: 3.1.0
info:
  title: Authentication Service API
  version: 1.0.0
  description: User authentication with email, password, and MFA
servers:
  - url: https://auth.example.com/v1
components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
  schemas:
    User:
      type: object
      properties:
        id:
          type: string
          pattern: '^user_[a-z0-9]{16}$'
        email:
          type: string
          format: email
        status:
          type: string
          enum: [active, suspended]
        mfaEnabled:
          type: boolean
        createdAt:
          type: string
          format: date-time
      required: [id, email, status, mfaEnabled, createdAt]
    AuthToken:
      type: object
      properties:
        accessToken:
          type: string
          description: JWT token for API calls
        expiresAt:
          type: integer
          description: Unix timestamp
        refreshToken:
          type: string
      required: [accessToken, expiresAt]
    ErrorResponse:
      type: object
      properties:
        error:
          type: string
          enum: [invalid_credentials, invalid_mfa, user_not_found, user_already_exists]
        message:
          type: string
      required: [error, message]
paths:
  /signup:
    post:
      summary: Create a new user account
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                password:
                  type: string
                  minLength: 8
                  maxLength: 256
                  description: "Password requirements: min 8 chars, must contain uppercase, lowercase, digit"
              required: [email, password]
      responses:
        '201':
          description: Account created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  user:
                    $ref: '#/components/schemas/User'
                  token:
                    $ref: '#/components/schemas/AuthToken'
        '400':
          description: Invalid input or password too weak
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '409':
          description: User already exists with this email
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
  /login:
    post:
      summary: Authenticate user and return token
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                password:
                  type: string
                mfaCode:
                  type: string
                  description: Required if MFA is enabled
              required: [email, password]
      responses:
        '200':
          description: Authentication successful
          content:
            application/json:
              schema:
                type: object
                properties:
                  user:
                    $ref: '#/components/schemas/User'
                  token:
                    $ref: '#/components/schemas/AuthToken'
        '400':
          description: Malformed request or missing MFA code
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Invalid email or password
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
  /validate:
    post:
      summary: Validate a JWT token
      security:
        - BearerAuth: []
      responses:
        '200':
          description: Token is valid
          content:
            application/json:
              schema:
                type: object
                properties:
                  valid: { type: boolean }
                  userId: { type: string }
                  expiresAt: { type: integer }
        '401':
          description: Token is invalid or expired
  /mfa/enable:
    post:
      summary: Enable MFA for user account
      security:
        - BearerAuth: []
      responses:
        '200':
          description: MFA enabled
          content:
            application/json:
              schema:
                type: object
                properties:
                  secret: { type: string, description: "Base32 secret for TOTP" }
                  qrCode: { type: string, description: "Data URL for QR code" }
        '400':
          description: MFA already enabled
# Custom extension: OpenAPI only allows extra top-level keys with an "x-" prefix.
x-edgeCases:
  - id: "EC-001"
    description: "Email local part exactly at the 64-character limit (RFC 5321)"
    endpoint: "/signup"
    # Placeholder: expand to a 64-character local part when generating the test.
    input: { email: "<'a' repeated 64 times>@example.com", password: "Pass1234" }
    expected: "201"
  - id: "EC-002"
    description: "Password exactly 8 characters (minimum)"
    endpoint: "/signup"
    input: { email: "user@example.com", password: "Pass1234" }
    expected: "201"
  - id: "EC-003"
    description: "Password 7 characters (below minimum)"
    endpoint: "/signup"
    input: { email: "user@example.com", password: "Pass123" }
    expected: "400"
  - id: "EC-004"
    description: "Password with no uppercase letters"
    endpoint: "/signup"
    input: { email: "user@example.com", password: "password1" }
    expected: "400"
  - id: "EC-005"
    description: "Concurrent signup with same email"
    endpoint: "/signup"
    concurrent: true
    input: { email: "user@example.com", password: "Pass1234" }
    expected: ["201", "409"]
This spec is complete, testable, and machine-readable.
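EC-005 is the one edge case that validation code alone cannot satisfy: two concurrent signups can both pass a "does this email exist?" check before either inserts. A minimal sketch of how a unique constraint resolves it, using SQLite as a stand-in for the real database. The function names `create_schema` and `signup` and the reduced column set are assumptions for illustration, not part of the spec.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a reduced users table; UNIQUE on email is the important part."""
    conn.execute(
        """
        CREATE TABLE users (
            id TEXT PRIMARY KEY,
            email TEXT NOT NULL UNIQUE,
            password_hash TEXT NOT NULL
        )
        """
    )


def signup(conn: sqlite3.Connection, user_id: str, email: str, password_hash: str) -> int:
    """Return the HTTP status a /signup handler would send (201 or 409)."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO users (id, email, password_hash) VALUES (?, ?, ?)",
                (user_id, email, password_hash),
            )
    except sqlite3.IntegrityError:
        # The database rejected the duplicate email: map it to 409.
        return 409
    return 201


conn = sqlite3.connect(":memory:")
create_schema(conn)
first = signup(conn, "user_0000000000000001", "user@example.com", "<bcrypt hash>")
second = signup(conn, "user_0000000000000002", "user@example.com", "<bcrypt hash>")
print(first, second)
```

Because the database enforces uniqueness at insert time, the handler does not need a separate existence check, and exactly one of the racing requests receives 201.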
Phase 3: Decompose into Tasks
Break the spec into implementation tasks:
Authentication Service (spec: auth_spec.yaml)
├── Task 1: Database Schema
│ ├── Create users table (id, email, password_hash, status, mfa_enabled, created_at)
│ ├── Create mfa_secrets table (user_id, secret_base32)
│ ├── Create indexes on email (unique) and user_id
│ └── Success: pytest tests/test_schema.py passes
│
├── Task 2: Password Hashing & Validation
│ ├── Implement hash_password() using bcrypt
│ ├── Implement password_verify()
│ ├── Validate passwords: 8-256 chars, must include uppercase/lowercase/digits
│ └── Success: test_crypto.py:test_password_* all pass
│
├── Task 3: JWT Token Generation
│ ├── Implement create_token(user_id) returning {accessToken, expiresAt}
│ ├── Token expires after 24 hours
│ ├── Sign using HS256 with secret key
│ └── Success: test_tokens.py:test_create_token_* all pass
│
├── Task 4: POST /signup Endpoint
│ ├── Validate email format (RFC 5322)
│ ├── Validate password strength
│ ├── Check email not already exists (unique constraint)
│ ├── Hash password and insert user
│ ├── Return 201 with user + token
│ ├── Return 400 for validation errors
│ ├── Return 409 for duplicate email
│ └── Success: test_api.py:test_signup_* all pass
│
├── Task 5: POST /login Endpoint
│ ├── Find user by email
│ ├── Verify password
│ ├── Check MFA if enabled
│ ├── Return 200 with user + token
│ ├── Return 401 for invalid credentials
│ └── Success: test_api.py:test_login_* all pass
│
├── Task 6: POST /validate Endpoint
│ ├── Extract JWT from Authorization header
│ ├── Verify signature and expiry
│ ├── Return 200 with {valid: true, userId, expiresAt}
│ ├── Return 401 if token invalid/expired
│ └── Success: test_api.py:test_validate_* all pass
│
├── Task 7: MFA (TOTP) Support
│ ├── Generate base32 secret for TOTP
│ ├── Generate QR code
│ ├── Verify TOTP code (time-based, 6 digits)
│ ├── POST /mfa/enable endpoint
│ └── Success: test_mfa.py:test_mfa_* all pass
│
└── Task 8: Integration & E2E Tests
  ├── Test full signup → login flow
  ├── Test with MFA enabled
  ├── Test token expiry
  ├── Load test (100 concurrent signups)
  └── Success: test_e2e.py all pass
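To see which of these tasks can run in parallel, the tree can be expressed as a dependency graph and sorted into batches. The sketch below uses the standard library's `graphlib`; the dependency edges are assumptions chosen for illustration (for example, that the endpoints need the schema, hashing, and token modules first), not something the spec dictates.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on. These edges are assumed, not from the spec.
DEPENDENCIES = {
    "T1 schema": set(),
    "T2 hashing": set(),
    "T3 tokens": set(),
    "T4 signup": {"T1 schema", "T2 hashing", "T3 tokens"},
    "T5 login": {"T1 schema", "T2 hashing", "T3 tokens", "T7 mfa"},
    "T6 validate": {"T3 tokens"},
    "T7 mfa": {"T1 schema", "T3 tokens"},
    "T8 e2e": {"T4 signup", "T5 login", "T6 validate", "T7 mfa"},
}


def parallel_batches(dependencies: dict) -> list:
    """Group tasks into batches; every task in a batch can start at the same time."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(DEPENDENCIES), start=1):
    print(f"Batch {number}: {', '.join(batch)}")
```

Under these assumed edges, Tasks 1 to 3 form the first batch, which is why the walkthrough can start with password hashing before any endpoint exists.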
Phase 4: Prompt AI to Generate Code
For Task 2 (Password Hashing), create a precise prompt:
# Task: Implement Password Hashing Module
You are an expert Python backend engineer.
Implement password hashing and validation using bcrypt.
## Requirements:
- Use bcrypt for password hashing (library: bcrypt)
- hash_password(password: str) -> str
  - Accepts password 8-256 chars
  - Returns bcrypt hash
  - Must use a fresh random salt on every call (the same password produces a different hash each time)
- password_verify(password: str, hash: str) -> bool
  - Returns True if password matches hash
  - Returns False otherwise (invalid password or format)
- validate_password_strength(password: str) -> tuple[bool, str]
  - Returns (True, <confirmation message>) if password meets requirements
  - Returns (False, <message naming the first unmet requirement>) otherwise
  - Requirements: min 8 chars, max 256 chars, include uppercase, lowercase, digits
## Password Requirements:
- Minimum length: 8 characters
- Maximum length: 256 characters
- Must contain: uppercase letter (A-Z), lowercase letter (a-z), digit (0-9)
- Can contain: any ASCII printable character
## Test Vectors (from spec):
Valid passwords:
- "Pass1234" (minimum length, all requirements met)
- "MyP@ssw0rd!" (includes special character)
Invalid passwords:
- "Pass123" (only 7 chars, below minimum)
- "password1" (no uppercase letters)
- "PASSWORD1" (no lowercase letters)
- "Passaaaa" (no digits)
- "pass word" (no uppercase letters and no digits; the space itself is allowed)
## Code Organization:
- File: auth/crypto.py
- No external dependencies beyond bcrypt
## Error Handling:
- hash_password raises ValueError with a descriptive message if the password is too short or too long
- password_verify returns False on any exception (it never raises)
Output: Complete, production-ready Python module with docstrings and type hints.
The AI generates:
# auth/crypto.py
import bcrypt
import re
from typing import Tuple


def hash_password(password: str) -> str:
    """Hash a password using bcrypt.

    Args:
        password: Password string (8-256 chars)

    Returns:
        Bcrypt hash string

    Raises:
        ValueError: If password is invalid
    """
    if not isinstance(password, str):
        raise ValueError("Password must be a string")
    if len(password) < 8:
        raise ValueError("Password must be at least 8 characters")
    if len(password) > 256:
        raise ValueError("Password must not exceed 256 characters")

    # Generate bcrypt hash with cost 12 (security/performance balance)
    salt = bcrypt.gensalt(rounds=12)
    hashed = bcrypt.hashpw(password.encode('utf-8'), salt)
    return hashed.decode('utf-8')


def password_verify(password: str, hash_: str) -> bool:
    """Verify a password against a bcrypt hash.

    Args:
        password: Password to verify
        hash_: Bcrypt hash to compare against

    Returns:
        True if password matches hash, False otherwise
    """
    try:
        return bcrypt.checkpw(password.encode('utf-8'), hash_.encode('utf-8'))
    except Exception:
        # Any error (invalid hash format, etc.) = not a match
        return False


def validate_password_strength(password: str) -> Tuple[bool, str]:
    """Validate password meets strength requirements.

    Args:
        password: Password to validate

    Returns:
        Tuple of (is_valid: bool, message: str)
    """
    if not isinstance(password, str):
        return False, "Password must be a string"

    # Length checks
    if len(password) < 8:
        return False, "Password must be at least 8 characters"
    if len(password) > 256:
        return False, "Password must not exceed 256 characters"

    # Character requirements
    has_uppercase = bool(re.search(r'[A-Z]', password))
    has_lowercase = bool(re.search(r'[a-z]', password))
    has_digit = bool(re.search(r'[0-9]', password))

    if not has_uppercase:
        return False, "Password must contain at least one uppercase letter"
    if not has_lowercase:
        return False, "Password must contain at least one lowercase letter"
    if not has_digit:
        return False, "Password must contain at least one digit"

    return True, "Password meets all requirements"
This code directly implements the spec's constraints.
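One gap worth catching in review: bcrypt only uses the first 72 bytes of its input, and depending on the library version, longer inputs are either silently truncated or rejected. The spec allows 256 characters, so two long passwords that share their first 72 bytes could be treated as the same password. A common workaround is to pre-hash with SHA-256 and base64-encode the digest before handing it to bcrypt. The sketch below shows only that pre-hash step with the standard library; `exceeds_bcrypt_limit` and `prehash_for_bcrypt` are hypothetical helper names, and adopting this would also mean updating the spec and the prompt.

```python
import base64
import hashlib

BCRYPT_MAX_BYTES = 72  # bcrypt does not use input bytes beyond this length


def exceeds_bcrypt_limit(password: str) -> bool:
    """True if the UTF-8 encoding of the password is longer than bcrypt can use."""
    return len(password.encode("utf-8")) > BCRYPT_MAX_BYTES


def prehash_for_bcrypt(password: str) -> bytes:
    """Reduce any password to 44 ASCII bytes that depend on every input byte."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    # base64 avoids NUL bytes in the digest, which bcrypt cannot handle.
    return base64.b64encode(digest)


# Two 256-character passwords that differ only in the final character.
long_a = "Pass1234" + "x" * 248
long_b = "Pass1234" + "x" * 247 + "y"

print(exceeds_bcrypt_limit(long_a))                             # True
print(long_a.encode()[:72] == long_b.encode()[:72])             # True: same first 72 bytes
print(prehash_for_bcrypt(long_a) == prehash_for_bcrypt(long_b)) # False: pre-hashes differ
```

With this step, `hash_password` would pass `prehash_for_bcrypt(password)` to `bcrypt.hashpw`, and `password_verify` would apply the same pre-hash before `bcrypt.checkpw`.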
Phase 5: Generate Tests
Prompt AI to generate pytest tests from the edge cases:
# Task: Generate pytest tests for password module
Generate comprehensive pytest tests for auth/crypto.py
## Test Cases (from spec edgeCases):
### Boundary Values:
- EC-002: Password exactly 8 characters (minimum) → should accept
- EC-003: Password 7 characters (below minimum) → should reject
- EC-004: Password with no uppercase letters → should reject
### Additional Cases:
- Valid password with all requirements
- Valid password with special characters
- Invalid passwords (too short, too long, missing requirements)
- Hashing produces different result each time (random salt)
- Verification succeeds for correct password
- Verification fails for incorrect password
- Verification handles invalid hash format gracefully
Output complete pytest code in file tests/test_crypto.py
The AI generates:
# tests/test_crypto.py
import pytest
from auth.crypto import hash_password, password_verify, validate_password_strength


class TestHashPassword:
    """Test password hashing with bcrypt"""

    def test_hash_valid_password(self):
        """Hash a valid password"""
        hashed = hash_password("Pass1234")
        assert hashed is not None
        assert len(hashed) > 0
        assert hashed.startswith("$2b$")  # bcrypt format

    def test_hash_returns_different_value_each_time(self):
        """Same password hashes to different values (random salt)"""
        hash1 = hash_password("Pass1234")
        hash2 = hash_password("Pass1234")
        assert hash1 != hash2  # Different due to salt

    def test_hash_password_too_short(self):
        """Password below minimum length raises error"""
        with pytest.raises(ValueError, match="at least 8 characters"):
            hash_password("Pass123")

    def test_hash_password_too_long(self):
        """Password exceeds maximum length raises error"""
        long_password = "Pass1234" * 50  # > 256 chars
        with pytest.raises(ValueError, match="not exceed 256 characters"):
            hash_password(long_password)


class TestPasswordVerify:
    """Test password verification"""

    def test_verify_correct_password(self):
        """Verification succeeds for correct password"""
        password = "Pass1234"
        hashed = hash_password(password)
        assert password_verify(password, hashed) is True

    def test_verify_incorrect_password(self):
        """Verification fails for incorrect password"""
        password = "Pass1234"
        hashed = hash_password(password)
        assert password_verify("Pass5678", hashed) is False

    def test_verify_invalid_hash_format(self):
        """Verification fails gracefully for invalid hash"""
        assert password_verify("Pass1234", "invalid_hash") is False
        assert password_verify("Pass1234", "") is False

    def test_verify_type_error_handled(self):
        """Verification handles type errors gracefully"""
        hashed = hash_password("Pass1234")
        assert password_verify(None, hashed) is False  # None is not a string


class TestPasswordStrength:
    """Test password strength validation"""

    # EC-002: Minimum length (8 chars)
    def test_password_minimum_length_accepted(self):
        """Password exactly 8 characters is accepted"""
        valid, msg = validate_password_strength("Pass1234")
        assert valid is True

    # EC-003: Below minimum length (7 chars)
    def test_password_below_minimum_rejected(self):
        """Password 7 characters is rejected"""
        valid, msg = validate_password_strength("Pass123")
        assert valid is False
        assert "at least 8 characters" in msg

    # EC-004: No uppercase
    def test_password_no_uppercase_rejected(self):
        """Password without uppercase letter is rejected"""
        valid, msg = validate_password_strength("password1")
        assert valid is False
        assert "uppercase" in msg

    def test_password_no_lowercase_rejected(self):
        """Password without lowercase letter is rejected"""
        valid, msg = validate_password_strength("PASSWORD1")
        assert valid is False
        assert "lowercase" in msg

    def test_password_no_digit_rejected(self):
        """Password without digit is rejected"""
        valid, msg = validate_password_strength("Password")
        assert valid is False
        assert "digit" in msg

    def test_password_valid_with_all_requirements(self):
        """Valid password with uppercase, lowercase, digit"""
        valid, msg = validate_password_strength("MyPass123")
        assert valid is True

    def test_password_valid_with_special_chars(self):
        """Valid password including special characters"""
        valid, msg = validate_password_strength("MyP@ss123!")
        assert valid is True

    def test_password_maximum_length_accepted(self):
        """Password at 256 character limit is accepted"""
        password = "Pass" + "1" * 252  # 256 chars total
        valid, msg = validate_password_strength(password)
        assert valid is True

    def test_password_exceeds_maximum_rejected(self):
        """Password exceeding 256 characters is rejected"""
        password = "Pass" + "1" * 253  # 257 chars
        valid, msg = validate_password_strength(password)
        assert valid is False
        assert "not exceed 256" in msg
Tests are generated directly from the spec's edge cases.
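A quick way to confirm that claim is to check which edge-case IDs the test file actually references. The sketch below assumes the convention used in the generated tests, where each covered case is tagged with a comment such as `# EC-002`; the function name `uncovered_edge_cases` is hypothetical.

```python
import re

# IDs declared in the spec's edge-case list.
SPEC_EDGE_CASES = ["EC-001", "EC-002", "EC-003", "EC-004", "EC-005"]


def uncovered_edge_cases(spec_ids: list, test_source: str) -> list:
    """Return the spec edge-case IDs that never appear in the test source."""
    referenced = set(re.findall(r"EC-\d{3}", test_source))
    return [edge_case for edge_case in spec_ids if edge_case not in referenced]


# Excerpt of tests/test_crypto.py; in practice, read the file from disk.
sample_test_source = """
# EC-002: Minimum length (8 chars)
def test_password_minimum_length_accepted(self): ...
# EC-003: Below minimum length (7 chars)
def test_password_below_minimum_rejected(self): ...
# EC-004: No uppercase
def test_password_no_uppercase_rejected(self): ...
"""

print(uncovered_edge_cases(SPEC_EDGE_CASES, sample_test_source))
# prints ['EC-001', 'EC-005']
```

EC-001 and EC-005 are missing here by design: they exercise the /signup endpoint rather than the password module, so they belong in the API tests for Task 4. Running the same check across all test files at the end of the project shows whether any edge case was dropped.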
Phase 6: Review and Deploy
Run automated checks:
# Syntax validation
python -m py_compile auth/crypto.py
python -m py_compile tests/test_crypto.py
# Type checking
mypy auth/crypto.py
# Linting
flake8 auth/crypto.py tests/test_crypto.py
black --check auth/crypto.py tests/test_crypto.py
# Unit tests
pytest tests/test_crypto.py -v
# Coverage
pytest tests/test_crypto.py --cov=auth.crypto --cov-report=html
# Schema validation (if applicable)
python validate_spec_schemas.py auth_spec.yaml
If all pass, the code is ready for review:
REVIEW CHECKLIST ✓
==================
✓ Code passes syntax validation
✓ Type hints are correct
✓ Linting passes (flake8, black)
✓ All tests pass (100% coverage for crypto.py)
✓ Spec compliance verified (schema validation)
✓ No security issues (no hardcoded secrets, proper error handling)
✓ Performance acceptable (bcrypt cost=12; latency depends on hardware, typically on the order of 100 ms or more per hash)
READY FOR: Code review, then merge to main
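The checks above can be chained so that the first failure stops the run, which is roughly what a CI job does. A minimal sketch using `subprocess`; the `run_checks` helper is hypothetical, and the demo substitutes trivial Python commands for the real tools so it runs anywhere.

```python
import subprocess
import sys


def run_checks(checks: list) -> list:
    """Run (name, command) pairs in order and stop at the first failure.

    Returns a list of (name, passed) for every check that was executed.
    """
    results = []
    for name, command in checks:
        completed = subprocess.run(command, capture_output=True, text=True)
        passed = completed.returncode == 0
        results.append((name, passed))
        if not passed:
            print(f"FAILED: {name}\n{completed.stdout}{completed.stderr}")
            break
    return results


# In the project, the list would hold the real commands, for example:
#   ("types", ["mypy", "auth/crypto.py"]),
#   ("tests", ["pytest", "tests/test_crypto.py", "-v"]),
demo_checks = [
    ("passes", [sys.executable, "-c", "pass"]),
    ("fails", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("never runs", [sys.executable, "-c", "pass"]),
]
demo_results = run_checks(demo_checks)
print(demo_results)
```

Stopping early keeps the output focused on the first broken check instead of burying it under later failures that it caused.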
Phase 7: Repeat for Other Tasks
Repeat Phases 4–6 for Tasks 3–8 (JWT tokens, endpoints, MFA, integration tests).
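As an illustration of what Task 7's core logic involves, here is a minimal standard-library sketch of 6-digit TOTP generation and verification (HMAC-SHA1, 30-second step, as in RFC 6238). The function names `totp` and `verify_totp` and the one-step drift window are assumptions; a production service would more likely rely on a maintained TOTP library.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_base32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute the TOTP code for a base32 secret at a given Unix time."""
    key = base64.b32decode(secret_base32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_totp(secret_base32: str, code: str, at: Optional[float] = None,
                window: int = 1, step: int = 30) -> bool:
    """Accept the code for the current step or up to `window` steps either side."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        expected = totp(secret_base32, at=now + drift * step, step=step)
        if hmac.compare_digest(expected.encode("ascii"), code.encode("utf-8")):
            return True
    return False


# Secret from the RFC 6238 test vectors (ASCII "12345678901234567890").
rfc_secret = base64.b32encode(b"12345678901234567890").decode("ascii")
print(totp(rfc_secret, at=59))  # prints 287082
print(verify_totp(rfc_secret, "287082", at=59))
```

The fixed `at` argument makes the function testable against published vectors, which fits the spec-driven approach: the test vector goes into the prompt and the generated test.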
Phase 8: End-to-End Testing
After all tasks are complete, run integration tests:
# Start service in Docker
docker-compose up -d
# Run E2E tests
pytest tests/test_e2e.py -v
# Load test (100 concurrent users)
locust -f tests/load_test.py --users 100 --spawn-rate 10
# Manual smoke test
curl -X POST http://localhost:5000/v1/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "password": "Pass1234"}'
Expected output:
{
  "user": {
    "id": "user_abc123def4567890",
    "email": "user@example.com",
    "status": "active",
    "mfaEnabled": false,
    "createdAt": "2026-06-02T10:30:00Z"
  },
  "token": {
    "accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
    "expiresAt": 1780482600
  }
}
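The `accessToken` is a three-part JWT whose first segment is the base64url-encoded HS256 header. To make Task 3 concrete, here is a minimal standard-library sketch of `create_token` with a 24-hour lifetime, plus a matching `validate_token`. The claim names (`sub`, `iat`, `exp`), the `secret` and `now` parameters, and `validate_token` itself are assumptions for illustration; a production service would normally use a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

TOKEN_LIFETIME_SECONDS = 24 * 60 * 60  # Task 3: token expires after 24 hours


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: bytes, secret: bytes) -> str:
    return _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())


def create_token(user_id: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return {accessToken, expiresAt} for the given user, signed with HS256."""
    issued_at = int(time.time() if now is None else now)
    expires_at = issued_at + TOKEN_LIFETIME_SECONDS
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    claims = {"sub": user_id, "iat": issued_at, "exp": expires_at}
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    signature = _sign(f"{header}.{payload}".encode("ascii"), secret)
    return {"accessToken": f"{header}.{payload}.{signature}", "expiresAt": expires_at}


def validate_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}".encode("ascii"), secret)
        # Check the signature before trusting anything inside the token.
        if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
            return None
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    if header.get("alg") != "HS256":
        return None
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        return None
    return claims


issued = 1780396200  # 2026-06-02T10:30:00Z
token = create_token("user_abc123def4567890", b"demo-secret", now=issued)
print(token["expiresAt"])  # prints 1780482600
print(token["accessToken"].split(".")[0])  # prints eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
```

Passing `now` explicitly keeps the token-expiry test from Task 8 deterministic: the test can validate the same token one second before and one second after `expiresAt` without sleeping.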
Phase 9: Deploy to Production
Once all tests pass, deploy:
# Tag release
git tag v1.0.0
git push origin v1.0.0
# Deploy to production
docker build -t auth-service:1.0.0 .
docker push auth-service:1.0.0
kubectl apply -f deployment.yaml
# Monitor
kubectl logs -f deployment/auth-service
# Then watch the service's health dashboard in your monitoring tool (for example, Datadog)
Key Lessons from Capstone
- Specs first: Writing the spec upfront prevents rework and clarifies intent.
- Decompose early: Breaking into tasks enables parallelism and focus.
- Prompts matter: Precise prompts generate better code. Vague prompts generate guesses.
- Tests as specs: Tests derived from edge cases verify the spec is honored.
- Automation helps: CI checks catch errors early; no waiting for human review.
- Iterate: The first generation is rarely perfect. Review → fix spec/prompt → regenerate.
Key Takeaways
- A complete spec-driven project flows: requirements → formal spec → task decomposition → AI code generation → tests → review → deployment.
- Formal specs (OpenAPI, JSON Schema) eliminate ambiguity and enable automated validation.
- Test vectors in specs (edge cases) ensure code generation is complete and handles boundaries.
- Automated checks (syntax, type, schema validation) catch many errors before human review.
- Integration testing verifies that all pieces work together end-to-end.
Frequently Asked Questions
How long does a project like this take to complete?
Following this workflow, a production-ready authentication service (Tasks 1–8) takes 2–3 weeks for an experienced team. Most of that time goes into review and refinement, not initial generation.
What if AI-generated code doesn't pass tests?
Update the prompt or spec and regenerate. Often the issue is ambiguity in the spec. Clarify it, then regenerate. This is normal.
Can I skip any phase?
Not recommended. Each phase catches different issues:
- Spec writing: clarifies requirements
- Decomposition: enables parallelism and focus
- Testing: verifies correctness
- Review: catches edge cases and quality issues
- Deployment: verifies production-readiness
Skipping a phase usually leads to rework.
How do I know when a project is "spec-driven complete"?
When: (1) all spec requirements are implemented, (2) all edge cases are tested, (3) code is reviewed and approved, (4) integration tests pass, (5) spec is versioned and archived, and (6) code/tests are version controlled.