Fixing Code Generation Errors: Prompt Iteration
AI-generated code is rarely perfect on the first try. The key to effective iteration is clear feedback about what failed and why. Rather than saying "the code is broken," describe the specific error, show the test case that failed, and clarify your intent. This article teaches the iteration loop: test, diagnose, refine, regenerate.
The Iteration Loop
1. Generate code from initial prompt
↓
2. Test or review the code
↓
3. Identify what failed (error message, test output, or logical flaw)
↓
4. Refine the prompt with concrete feedback
↓
5. Regenerate and repeat until correct
As a rule of thumb, most code generation tasks take 1–3 iterations. If you need more than 5, your specification was likely too vague; consider rewriting the prompt from scratch with a clearer approach.
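The loop above can be sketched in a few lines of Python. This is a minimal illustration rather than a real client: `generate` stands in for whatever model call you use and `check` for your test run. Both callables, the `iterate` name, and the `max_rounds` parameter are hypothetical.

```python
from typing import Callable


def iterate(
    generate: Callable[[str], str],
    check: Callable[[str], list[str]],
    prompt: str,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Generate, test, and refine until `check` reports no failures.

    `generate` (prompt -> code) and `check` (code -> failure messages)
    are hypothetical callables supplied by the caller.
    Returns the accepted code and the number of rounds used.
    """
    for round_number in range(1, max_rounds + 1):
        code = generate(prompt)    # steps 1 and 5: generate or regenerate
        failures = check(code)     # steps 2 and 3: test, then identify failures
        if not failures:
            return code, round_number
        # Step 4: refine the prompt with concrete feedback.
        prompt += "\n\nThe previous attempt failed:\n" + "\n".join(failures)
    raise RuntimeError(
        f"No passing code after {max_rounds} rounds; rewrite the spec"
    )
```

The hard stop at `max_rounds` mirrors the advice above: past roughly five rounds, a fresh specification is usually the better investment.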
Providing Effective Error Feedback
When you test generated code and find a bug, your follow-up prompt should include:
- The test case or scenario that failed
- The actual output or error message
- The expected output (what should have happened)
- Your best guess about what went wrong (optional but helpful)
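If you write this kind of feedback often, the four parts can be captured in a small helper. The sketch below is one possible shape; the `Feedback` class and its field names are made up for illustration and are not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One failing case, split into the four parts listed above."""

    test_case: str
    actual: str
    expected: str
    guess: str = ""  # optional best guess about the cause

    def to_prompt(self) -> str:
        """Render the feedback as text for a follow-up prompt."""
        lines = [
            "The generated code failed on this test case:",
            f"Input: {self.test_case}",
            f"Expected: {self.expected}",
            f"Actual: {self.actual}",
        ]
        if self.guess:
            lines.append(f"Problem: {self.guess}")
        return "\n".join(lines)


feedback = Feedback(
    test_case="apply_discount(100.0, 20)",
    actual="20.0",
    expected="80.0",
    guess="The code returns the discount amount, not the final price.",
)
print(feedback.to_prompt())
```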
Example 1: Logic Error
Initial prompt generated a function to calculate discounts, but it failed on a test case:
Generated code:
def apply_discount(price: float, percent: float) -> float:
    return price * (percent / 100)
Test case that failed:
- apply_discount(100.0, 20) should return 80.0 (original price minus 20% discount)
- Actual output: 20.0 (only the discount amount, not the discounted price)
Expected: 80.0
Actual: 20.0
Problem: Your code returns only the discount amount, not the final price.
The formula should be: price * (1 - percent / 100)
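For reference, a corrected version might look like the following. It is a sketch of what a regenerated function could look like given the feedback above, not the output of any particular model.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return the price after subtracting `percent` percent.

    >>> apply_discount(100.0, 20)
    80.0
    """
    # Keep (1 - percent / 100) of the price instead of returning
    # only the discount amount.
    return price * (1 - percent / 100)
```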
Example 2: Type Error
Generated code had incorrect return type handling:
Generated code:
def fetch_user(user_id: int) -> dict:
    # ... code ...
    return None  # if user not found
Error when testing:
- Test: user_data = fetch_user(999), then index the result (e.g. user_data["name"])
Expected: None or empty dict for a missing user, with no crash in the caller
Actual: TypeError: 'NoneType' object is not subscriptable
Problem: The function signature says it returns dict, but the code returns None.
Fix: Either change the return type to dict | None and keep returning None, OR keep the dict return type and return {} instead of None.
Please apply one of these so that the signature and the body agree.
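A minimal sketch of the first option, where the signature admits the missing-user case and the caller checks for it. The `_USERS` dictionary is a hypothetical stand-in for the elided lookup code, and `Optional[dict]` is used as the spelling of `dict | None` that also works on Python versions before 3.10.

```python
from typing import Optional

# Hypothetical in-memory store standing in for the real lookup.
_USERS = {1: {"name": "Ada"}}


def fetch_user(user_id: int) -> Optional[dict]:
    """Return the user record, or None if no user has this id."""
    return _USERS.get(user_id)


user_data = fetch_user(999)
if user_data is None:
    # Handle the missing user explicitly instead of subscripting None.
    print("User 999 not found")
else:
    print(user_data["name"])
```

With the `None` case visible in the signature, a type checker can flag any caller that subscripts the result without checking it first.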
Example 3: Missing Error Handling
Generated code didn't handle an edge case:
Generated code:
import csv

def parse_csv(filepath: str) -> list[dict]:
    with open(filepath) as f:
        reader = csv.DictReader(f)
        return list(reader)
Test case that failed:
- Test with file that has no header row (just data)
- Expected: Raise ValueError("CSV must have a header row")
- Actual: Returns dicts keyed by the values of the first data row, and that first row is missing from the result
Problem: When header row is missing, csv.DictReader uses first row as headers.
Fix: Add validation after reading the first line to ensure headers are valid.
Include example of what a valid header looks like.
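One way to implement the requested validation is to compare the first row against the column names you expect. This sketch assumes the caller knows the header in advance; the `expected_columns` parameter is an addition for illustration and was not part of the original function.

```python
import csv


def parse_csv(filepath: str, expected_columns: list[str]) -> list[dict]:
    """Parse a CSV file whose first row must equal `expected_columns`.

    A valid header for expected_columns=["name", "age"] is the line: name,age
    """
    with open(filepath, newline="") as f:
        reader = csv.DictReader(f)
        # fieldnames is taken from the first row; it is None for an empty file.
        header = list(reader.fieldnames or [])
        if header != expected_columns:
            raise ValueError("CSV must have a header row")
        return list(reader)
```

A file cannot tell you by itself whether its first row is a header or data, so some outside knowledge (here, the expected column names) is needed to make the check reliable.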
Refining Your Prompt for Iteration
When regenerating, update your prompt with:
- The failing test case (add to your examples section)
- The specific requirement that was missed (add to requirements)
- Any clarification about intent (reword ambiguous parts)
- Concrete direction for the fix; do NOT just say "fix it"
Good iteration prompt:
The generated code failed on this test case:
Input: apply_discount(100.0, 20)
Expected: 80.0 (100 minus 20%)
Actual: 20.0
The formula is wrong. It should be: original_price * (1 - discount_percent / 100)
Please regenerate the function with the corrected formula and include this test case in the docstring examples.
Poor iteration prompt:
This code doesn't work. Fix it.
The good prompt explains exactly what went wrong and how to fix it.
Common Regeneration Patterns
Pattern 1: "Add a missing requirement"
The generated function doesn't handle [specific case].
Add a requirement: If [condition], [desired behavior].
Example:
Input: [test case]
Expected: [expected output]
Regenerate with this requirement added.
Pattern 2: "Fix an edge case"
The function fails when [specific scenario].
Test case:
Input: [failing input]
Actual: [what it currently does]
Expected: [what it should do]
Update the code to handle this case. Include this as an example in the docstring.
Pattern 3: "Change the approach"
The current approach of [what it's doing] doesn't work well because [reason].
Instead, try [alternative approach].
Here's an example of what the alternative approach should do:
Input: [example]
Process: [how it should work]
Output: [expected result]
Regenerate using the [alternative] approach.
Pattern 4: "Clarify the specification"
I think there's ambiguity in the original spec. Let me clarify:
When [scenario], the function should [desired behavior], not [what it currently does].
Reason: [explanation of why]
Example:
Input: [test case]
Expected: [correct output]
Regenerate with this clarification.
Testing-Driven Iteration
The most effective iteration includes test cases. After generation, write simple tests and show the AI model which tests fail:
Generated function: format_phone_number(phone: str) -> str
Test results:
✗ format_phone_number("1234567890") == "(123) 456-7890" # FAILED
Actual: "123-456-7890"
✓ format_phone_number("(123) 456-7890") == "(123) 456-7890" # passed
✗ format_phone_number("abc") raises ValueError # FAILED
Actual: returns "abc" without raising
Fix the failing tests. The function should:
1. Parse various input formats
2. Always output (123) 456-7890 format
3. Raise ValueError for non-numeric inputs
By showing test output, you give the AI model exact feedback on correctness.
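A report like the one above can be produced mechanically. The sketch below is one way to do it; `report` is a hypothetical helper, and the `format_phone_number` shown is a deliberately flawed stand-in that reproduces the failures listed above, not real generated output.

```python
from typing import Any, Callable


def format_phone_number(phone: str) -> str:
    # Deliberately flawed stand-in for the generated function.
    if phone.isdigit() and len(phone) == 10:
        return f"{phone[:3]}-{phone[3:6]}-{phone[6:]}"
    return phone


def report(func: Callable[..., Any], cases: list[tuple[tuple, Any]]) -> list[str]:
    """Run each case and return one passed/FAILED line per case.

    Each case is (args, expected), where `expected` is either a value to
    compare against or an exception class the call should raise.
    """
    lines = []
    for args, expected in cases:
        call = f"{func.__name__}({', '.join(repr(a) for a in args)})"
        wants_error = isinstance(expected, type) and issubclass(expected, Exception)
        try:
            actual = func(*args)
        except Exception as exc:
            ok = wants_error and isinstance(exc, expected)
            detail = f"raised {type(exc).__name__}: {exc}"
        else:
            ok = (not wants_error) and actual == expected
            detail = f"returned {actual!r}"
        want = f"raises {expected.__name__}" if wants_error else f"== {expected!r}"
        status = "passed" if ok else "FAILED"
        lines.append(f"{status}: {call} {want} | actual: {detail}")
    return lines


results = report(
    format_phone_number,
    [
        (("1234567890",), "(123) 456-7890"),
        (("(123) 456-7890",), "(123) 456-7890"),
        (("abc",), ValueError),
    ],
)
print("\n".join(results))
```

The printed lines can be pasted straight into the follow-up prompt, so the model sees the call, the expectation, and the actual behavior for every case.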
Debugging Generated Code Locally
Before iterating, test locally to understand the failure:
# Generated code
def calculate_discount(price: float, tier: str) -> float:
    discounts = {"gold": 0.2, "silver": 0.1, "bronze": 0.05}
    return price * discounts[tier]

# Test
try:
    result = calculate_discount(100, "platinum")
    print(f"Result: {result}")
except KeyError as e:
    # e.args[0] is the missing key; formatting e itself would add extra quotes
    print(f"Error: KeyError for tier '{e.args[0]}' not in discounts dict")
    # discounts is local to the function, so the supported tiers are repeated here
    print(f"Available tiers: {['gold', 'silver', 'bronze']}")
Output:
Error: KeyError for tier 'platinum' not in discounts dict
Available tiers: ['gold', 'silver', 'bronze']
Now you can write a precise iteration prompt:
The generated function fails when tier is not in the discounts dict.
Test case:
Input: calculate_discount(100, "platinum")
Error: KeyError for 'platinum'
Expected: Should either raise ValueError with message "Unknown tier: platinum"
OR return the full price (no discount) for unknown tiers.
Please regenerate with:
1. Error handling for unknown tiers (raise ValueError with a message)
2. An example in the docstring showing this error case
3. A docstring listing supported tiers
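A regenerated function that satisfies those three requests might look like this. It is a sketch, not actual model output; moving the table to a module-level `DISCOUNTS` constant is an assumption made here so that callers and tests can see the supported tiers.

```python
# Module-level so the supported tiers are visible outside the function.
DISCOUNTS = {"gold": 0.2, "silver": 0.1, "bronze": 0.05}


def calculate_discount(price: float, tier: str) -> float:
    """Return the discount amount for `price` at the given tier.

    Supported tiers: gold (20%), silver (10%), bronze (5%).

    >>> calculate_discount(100, "gold")
    20.0
    >>> calculate_discount(100, "platinum")
    Traceback (most recent call last):
        ...
    ValueError: Unknown tier: platinum
    """
    if tier not in DISCOUNTS:
        raise ValueError(f"Unknown tier: {tier}")
    return price * DISCOUNTS[tier]
```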
Avoiding Infinite Iteration Loops
Sometimes you'll iterate more than necessary because of unclear feedback. Here's how to break out:
Check 1: Is my specification actually possible? If the AI keeps generating incorrect code despite clear feedback, the spec might be contradictory or impossible. Review the requirements for conflicts.
Check 2: Am I being too vague in feedback? "It doesn't work" is vague. Always include the exact error message and expected output.
Check 3: Should I rewrite the spec from scratch? If you've iterated more than 5 times, consider abandoning the current prompt and rewriting it more clearly. Often, a fresh spec is faster than continuing to iterate.
Check 4: Is the AI model the limiting factor? If you're using a smaller or older model, try a larger or newer one (for example, Claude 3.5 Sonnet or GPT-4 Turbo). Sometimes a stronger model solves the problem on the first try.
Key Takeaways
- Test generated code immediately. Iteration feedback should include the test case, the actual output, and the expected output.
- Provide concrete direction: "The formula is wrong; use X instead" not "fix the bug."
- Add failing test cases to your prompt's examples section so the AI model understands the requirement.
- Show test output to illustrate the failure (error messages, wrong results).
- If iterating more than 5 times, rewrite the specification from scratch instead.
Frequently Asked Questions
How many iterations is normal?
For simple functions: 0–1 (first try often correct). For complex functions: 1–3 iterations. For modules: 2–5 iterations (layer by layer). If you need more than 5, your spec likely needs rethinking.
Should I include the generated code in my iteration prompt?
Yes, especially if you're explaining what it does wrong. Paste the relevant part (3–10 lines) so the AI model sees exactly what needs fixing.
Can I iterate without showing test output?
You can, but it's less effective. Saying "the code is wrong" is vague. Test output (actual vs. expected) is specific and actionable.
What if the AI model keeps generating the same incorrect pattern?
Try a different approach in your prompt. If the AI keeps returning the same solution, it may be defaulting to a common pattern that doesn't fit your use case. Name the approach you want explicitly (e.g., "Use a loop, not recursion").
How do I know when to give up and code it myself?
If you've iterated 5+ times and the code still isn't correct, you likely understand the solution well enough to code it yourself. At that point, it's often faster to write a few lines by hand than to refine prompts further.