
Post-Process Transcripts with LLM Prompts

Raw ASR transcripts often contain errors: missing punctuation, confused homophones (there/their), redundant words, and words that make no sense in context. Large language models excel at cleaning messy text; a well-crafted prompt can improve readability and accuracy by 5–15% without retraining the ASR model. LLM post-processing is especially effective for domain-specific content: you can guide the model toward correct terminology (API rather than api, Kubernetes rather than kubernetes) using a simple prompt. Combining Whisper with GPT-4 or Claude creates a transcription-plus-refinement pipeline that can approach the quality of human-edited transcripts at a fraction of the cost.

The Transcript Refinement Prompt Pattern

A standard refinement prompt instructs the LLM to fix common ASR errors while preserving the speaker's intent. Here's a pattern:

from anthropic import Anthropic

def refine_transcript(raw_transcript, domain="general"):
    """
    Refine an ASR transcript using Claude to fix errors, add punctuation, etc.
    """
    # The client reads the API key from the ANTHROPIC_API_KEY environment variable
    client = Anthropic()

    # Craft a domain-aware refinement prompt
    if domain == "technical":
        domain_guidance = (
            "The transcript discusses software engineering, APIs, and cloud infrastructure. "
            "Use correct terminology: 'API' not 'a.p.i.', 'Kubernetes' not 'kubernetes', "
            "'asynchronous' not 'async' (except in code contexts). "
            "Preserve technical terms and abbreviations as they appear in industry documentation."
        )
    elif domain == "medical":
        domain_guidance = (
            "The transcript is a medical consultation. Use correct medical terminology. "
            "For uncertain acronyms, spell them out: 'CT scan' not 'c.t. scan'. "
            "Preserve patient confidentiality; do not add patient names."
        )
    else:
        domain_guidance = "The transcript is a general conversation."

    prompt = f"""You are a professional transcription editor. Your task is to refine the following raw ASR transcript:
- Fix punctuation and capitalization
- Correct obvious homophones (there/their, to/too, your/you're)
- Remove redundant or filler words (um, uh, like) unless they're significant
- Fix common ASR errors (e.g., 'the' misheard as 'thee')
- Preserve the original meaning and tone
- Add paragraph breaks where topic changes occur

Domain context: {domain_guidance}

RAW TRANSCRIPT:
{raw_transcript}

REFINED TRANSCRIPT:"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}]
    )

    # The Messages API returns a list of content blocks; the first holds the text
    return response.content[0].text

# Example
raw = """the meeting was about api design and we discussed how to handle errors in async code um the team agreed that we should use structured logging instead of just printing to stdout like we used to"""

refined = refine_transcript(raw, domain="technical")
print("REFINED:")
print(refined)
# Example output (model output is not deterministic, so the wording will vary):
# The meeting was about API design and error handling in asynchronous code. The team agreed to implement structured logging instead of printing to stdout.

This pattern works well for general cleanup. For more aggressive improvement, add a second pass focused on clarity and conciseness.
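A second pass can be sketched as a small chain of two prompts. The sketch below assumes a hypothetical `call_llm(prompt) -> str` callable that wraps whichever client you use; the `two_pass_refine` name and the prompt wording are illustrative, not part of any library.

```python
from typing import Callable

# Pass 1: mechanical cleanup only, no rephrasing
CLEANUP_PROMPT = (
    "Fix punctuation, capitalization, and obvious ASR errors. "
    "Do not rephrase.\n\nTRANSCRIPT:\n{text}"
)
# Pass 2: tighten the already-cleaned text
CLARITY_PROMPT = (
    "Tighten the following cleaned transcript for clarity and conciseness. "
    "Do not add facts or change the meaning.\n\nTRANSCRIPT:\n{text}"
)

def two_pass_refine(raw_transcript: str, call_llm: Callable[[str], str]) -> dict:
    """Run a cleanup pass, then a clarity pass on the cleaned text."""
    cleaned = call_llm(CLEANUP_PROMPT.format(text=raw_transcript)).strip()
    concise = call_llm(CLARITY_PROMPT.format(text=cleaned)).strip()
    # Return both versions so the faithful cleanup is never lost
    return {"cleaned": cleaned, "concise": concise}
```

Keeping the first-pass output alongside the second makes it easy to fall back to the more literal version when the concise one drops something important.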

Speaker-Aware Transcript Refinement with Diarization

When you have speaker labels (from diarization), refine while preserving speaker attribution:

from anthropic import Anthropic

def refine_speaker_transcript(speaker_words, context_words=100):
    """
    Refine a speaker-labeled transcript turn by turn.

    speaker_words: List of dicts with "word" and "speaker" keys
                   ("start" and "end" timestamps may be present but are not used)
    context_words: Maximum number of words taken from each neighboring turn as context
    """
    # Nothing to refine; also avoids an IndexError on speaker_words[0] below
    if not speaker_words:
        return []

    client = Anthropic()

    # Group consecutive words by speaker turn
    turns = []
    current_turn = {"speaker": speaker_words[0]["speaker"], "words": []}

    for item in speaker_words:
        if item["speaker"] != current_turn["speaker"]:
            turns.append(current_turn)
            current_turn = {"speaker": item["speaker"], "words": []}
        current_turn["words"].append(item["word"])
    turns.append(current_turn)

    # Refine each turn
    refined_turns = []
    for i, turn in enumerate(turns):
        # Build context (previous + current + next turn)
        context = []
        if i > 0:
            context.append(f"Previous speaker ({turns[i-1]['speaker']}): {' '.join(turns[i-1]['words'][:context_words])}")

        context.append(f"Current speaker ({turn['speaker']}): {' '.join(turn['words'])}")

        if i < len(turns) - 1:
            context.append(f"Next speaker ({turns[i+1]['speaker']}): {' '.join(turns[i+1]['words'][:context_words])}")

        context_text = "\n".join(context)

        # Single-turn prompt; the neighboring turns are included only as context
        prompt = f"""You are editing a meeting transcript. Refine only the current speaker's line:
- Fix punctuation and capitalization
- Correct homophones and obvious ASR errors
- Preserve the speaker's intended meaning

Conversation context:
{context_text}

Respond with ONLY the refined text for {turn['speaker']}'s line, without any preamble."""

        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )

        refined_text = response.content[0].text.strip()
        refined_turns.append({
            "speaker": turn["speaker"],
            "original": " ".join(turn["words"]),
            "refined": refined_text
        })

    return refined_turns

# Example with diarization output
speaker_words = [
{"word": "the", "speaker": "Speaker 1"},
{"word": "meeting", "speaker": "Speaker 1"},
{"word": "is", "speaker": "Speaker 1"},
{"word": "about", "speaker": "Speaker 1"},
{"word": "api", "speaker": "Speaker 1"},
{"word": "design", "speaker": "Speaker 1"},
{"word": "sounds", "speaker": "Speaker 2"},
{"word": "interesting", "speaker": "Speaker 2"},
]

refined = refine_speaker_transcript(speaker_words)
for turn in refined:
print(f"{turn['speaker']}: {turn['refined']}")

This preserves speaker attribution and improves clarity by leveraging context.

Domain-Specific Correction: Terminology and Jargon

For specialized domains, provide a terminology guide in the prompt:

from anthropic import Anthropic

def refine_with_terminology(raw_transcript, terminology_dict):
    """
    Refine transcript with domain-specific terminology corrections.

    terminology_dict: {"wrong_term": "correct_term", ...}
    """
    client = Anthropic()

    # Build terminology reference, one correction per line
    terms_str = "\n".join([f"- '{k}' should be '{v}'" for k, v in terminology_dict.items()])

    prompt = f"""Refine the following transcript. Correct the ASR errors listed below:

CORRECTIONS TO APPLY:
{terms_str}

TRANSCRIPT:
{raw_transcript}

REFINED TRANSCRIPT:"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}]
    )

    return response.content[0].text

# Example: Medical terminology
terminology = {
"ekg": "EKG",
"m.r.i.": "MRI",
"type two diabetes": "Type 2 diabetes",
"c.o.p.d.": "COPD",
}

raw = "the patient reported type two diabetes and recent ekg showed normal sinus rhythm"
refined = refine_with_terminology(raw, terminology)
print(refined)
# Example output (wording may vary): "The patient reported Type 2 diabetes and recent EKG showed normal sinus rhythm."

This pattern is more reliable than free-form prompting because the terminology is explicit.
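Because the corrections are explicit, they can also be checked after the model responds. The following is a minimal sketch of such a check; `find_unapplied_terms` is a hypothetical helper, and it assumes the same `{"wrong_term": "correct_term"}` dictionary used in the prompt. The match is case-sensitive, so a capitalized variant such as "Ekg" at the start of a sentence would not be flagged.

```python
import re

def find_unapplied_terms(refined_text: str, terminology: dict) -> list:
    """Return the 'wrong' terms that still appear in the refined text."""
    leftovers = []
    for wrong, correct in terminology.items():
        if wrong == correct:
            continue
        # Whole-term, case-sensitive match: flags 'ekg' but not 'EKG'
        pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
        if re.search(pattern, refined_text):
            leftovers.append(wrong)
    return leftovers

terminology = {"ekg": "EKG", "m.r.i.": "MRI", "type two diabetes": "Type 2 diabetes"}
print(find_unapplied_terms("recent ekg and m.r.i. were normal", terminology))
```

A non-empty result is a signal to retry the request or to route the transcript to manual review.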

Extracting Entities and Structured Data from Transcripts

Combine transcription + LLM refinement + entity extraction to structure meeting notes:

import json
from anthropic import Anthropic

def extract_meeting_summary(raw_transcript):
    """
    Extract structured data from a meeting transcript using prompt engineering.
    Returns a dict with decisions, action items, topics, and the next meeting time.
    """
    client = Anthropic()

    prompt = f"""Analyze this meeting transcript and extract:
1. Key decisions made
2. Action items (who is responsible for what)
3. Topics discussed
4. Next meeting date/time (if mentioned)

Respond in JSON format:
{{
  "decisions": ["decision 1", "decision 2"],
  "action_items": [
    {{"task": "...", "owner": "..."}},
    ...
  ],
  "topics": ["topic 1", "topic 2"],
  "next_meeting": "date/time or null"
}}

TRANSCRIPT:
{raw_transcript}

STRUCTURED OUTPUT:"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )

    # Extract the JSON object from the response (the model may add a preamble)
    text = response.content[0].text
    json_start = text.find('{')
    json_end = text.rfind('}') + 1
    try:
        return json.loads(text[json_start:json_end])
    except json.JSONDecodeError:
        # Covers both malformed JSON and responses with no JSON object at all
        return {"error": "Failed to parse response"}

# Example
transcript = """alice: we decided to migrate to microservices by q4 2026. bob, you're leading the architecture review.
bob: yes, i'll have a proposal by next friday. sarah, we need you to assess the database migration plan.
sarah: i can start that next week. when should we meet again?
alice: let's sync on friday at 2pm."""

summary = extract_meeting_summary(transcript)
print(json.dumps(summary, indent=2))
# Example output (exact wording may vary between runs):
# {
#   "decisions": ["Migrate to microservices by Q4 2026"],
#   "action_items": [
#     {"task": "Architecture review proposal", "owner": "Bob"},
#     {"task": "Assess database migration plan", "owner": "Sarah"}
#   ],
#   "topics": ["Microservices migration", "Architecture"],
#   "next_meeting": "Friday at 2pm"
# }

This pattern turns raw transcripts into actionable meeting notes.
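Model output that parses as JSON can still be missing keys or have the wrong types, so it is worth normalizing before downstream code relies on it. The sketch below assumes the four-key schema from the prompt above; `normalize_summary` and the "unassigned" default owner are illustrative choices, not part of any library.

```python
def normalize_summary(data: dict) -> dict:
    """Coerce a parsed model response into the expected four-key shape."""
    def as_str_list(value):
        # Anything that is not a list is treated as missing
        return [str(v) for v in value] if isinstance(value, list) else []

    raw_items = data.get("action_items")
    items = []
    for item in raw_items if isinstance(raw_items, list) else []:
        # Keep only well-formed action items that name a task
        if isinstance(item, dict) and item.get("task"):
            items.append({
                "task": str(item["task"]),
                "owner": str(item.get("owner") or "unassigned"),
            })

    next_meeting = data.get("next_meeting")
    return {
        "decisions": as_str_list(data.get("decisions")),
        "action_items": items,
        "topics": as_str_list(data.get("topics")),
        "next_meeting": next_meeting if isinstance(next_meeting, str) else None,
    }
```

With this in place, a response that failed to parse (a dict containing only an error key) normalizes to empty lists and a null meeting time rather than raising a `KeyError` later.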

Batch Refinement with Cost Optimization

For large volumes of transcripts, group several transcripts into each request:

from anthropic import Anthropic

def batch_refine_transcripts(transcript_list, batch_size=10):
    """
    Refine multiple transcripts efficiently using batched prompts.
    """
    client = Anthropic()
    refined_results = []

    for i in range(0, len(transcript_list), batch_size):
        batch = transcript_list[i:i+batch_size]

        # Build batch prompt (multiple transcripts in one request)
        batch_text = "\n\n---\n\n".join([
            f"TRANSCRIPT {j+1}:\n{t}" for j, t in enumerate(batch)
        ])

        prompt = f"""Refine these {len(batch)} meeting transcripts. For each, fix punctuation, correct errors, and improve clarity.

{batch_text}

Respond with only the refined transcripts in the same order, without the "TRANSCRIPT n:" labels, separated by "---TRANSCRIPT_END---"."""

        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=8000,
            messages=[{"role": "user", "content": prompt}]
        )

        # Parse batch response
        parts = response.content[0].text.split("---TRANSCRIPT_END---")
        refined = [r.strip() for r in parts if r.strip()]

        # The model may merge or drop transcripts; fail loudly rather than misalign results
        if len(refined) != len(batch):
            raise ValueError(f"Expected {len(batch)} refined transcripts, got {len(refined)}")
        refined_results.extend(refined)

    return refined_results

# Usage
transcripts = [
"the call was about budget planning for next year",
"we discussed hiring three new engineers in q3",
]

refined = batch_refine_transcripts(transcripts)
for i, t in enumerate(refined):
print(f"{i+1}: {t}")

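Batching cuts the number of API calls by up to a factor of batch_size and sends the instruction prompt once per batch instead of once per transcript. The transcripts themselves cost the same number of tokens either way, so the savings are largest for short transcripts and shrink as transcripts get longer.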
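How much batching saves depends on how long the shared instructions are relative to each transcript. A rough sketch of the arithmetic follows; the token counts are hypothetical inputs, `batching_savings` is an illustrative helper, and output tokens and the short per-transcript labels are ignored.

```python
def batching_savings(instruction_tokens: int, transcript_tokens: int,
                     n_transcripts: int, batch_size: int) -> dict:
    """Compare request counts and input tokens with and without batching."""
    n_batches = -(-n_transcripts // batch_size)  # ceiling division
    # Unbatched: every request repeats the instructions
    unbatched = n_transcripts * (instruction_tokens + transcript_tokens)
    # Batched: instructions are sent once per batch
    batched = n_batches * instruction_tokens + n_transcripts * transcript_tokens
    return {
        "requests_unbatched": n_transcripts,
        "requests_batched": n_batches,
        "input_tokens_unbatched": unbatched,
        "input_tokens_batched": batched,
        "input_token_savings": 1 - batched / unbatched,
    }

# Short transcripts: the instructions dominate, so batching saves a lot
print(batching_savings(150, 50, n_transcripts=100, batch_size=10))
# Long transcripts: the instructions are a small share, so savings are modest
print(batching_savings(150, 2000, n_transcripts=100, batch_size=10))
```

With these assumed numbers, input-token savings are about 67% for 50-token transcripts and about 6% for 2,000-token transcripts, while the request count drops from 100 to 10 in both cases.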

Key Takeaways

  • LLM post-processing improves ASR transcript quality by 5–15%, fixing punctuation, homophones, and jargon without retraining.
  • Use domain-aware prompts with terminology guides for specialized content (medical, technical, legal).
  • Combine diarization with speaker-aware refinement to preserve speaker attribution while improving clarity.
  • Extract structured data (decisions, action items, entities) from transcripts using JSON extraction prompts.
  • Batch multiple transcripts in one prompt to reduce API costs and latency.

Frequently Asked Questions

How much does LLM post-processing cost compared to raw ASR?

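Whisper costs $0.006/minute; Claude or GPT-4 refinement adds $0.01–0.05/minute depending on transcript length. At those rates, 1 hour of audio costs roughly $1.00–3.40 in total (60 × $0.016 to 60 × $0.056). Batching short transcripts can lower the per-transcript cost further, because the instruction prompt is sent once per batch rather than once per transcript.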

Can I use a smaller LLM model for post-processing to save cost?

Yes. Claude Instant or GPT-3.5 are cheaper and fast enough for punctuation and error correction. Use larger models (Claude 3.5 Sonnet or GPT-4) only if you need complex reasoning or domain-specific corrections.

What if the transcript is very long (>5,000 words)?

Break it into chunks (500–1,000 words each) and refine in parallel or batches. Use a summarization model afterward if you need a one-page summary instead of a full refined transcript.
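One way to do the chunking is to split on sentence boundaries where they exist and fall back to a hard word-count split for unpunctuated ASR output. The sketch below is a minimal version under that assumption; `chunk_transcript` and the default of 800 words are illustrative, and whitespace is normalized to single spaces.

```python
import re

def chunk_transcript(text: str, max_words: int = 800) -> list:
    """Split text into chunks of at most max_words words, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # An over-long "sentence" (common in raw ASR output) is hard-split by word count
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be passed to a refinement function independently; joining the refined chunks in order reconstructs the full transcript.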

Can LLM refinement fix all ASR errors?

No. When an error is recoverable from context (e.g., "Kubernetes" transcribed as lowercase "kubernetes", or a near-homophone of a known term), LLM refinement will likely fix it. But if the acoustic signal is too degraded (e.g., heavy accent, loud noise), the LLM cannot recover the intended word and may guess a plausible but wrong one. Combine with denoising or re-run ASR for critical sections.

How do I evaluate the quality of refined transcripts?

Compute WER against a reference transcript if available, or conduct user testing (have humans rate clarity and correctness). For critical applications, spot-check a sample of refined transcripts manually.
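For reference, WER is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a minimal implementation; the normalization (lowercasing and stripping punctuation) is a simplifying assumption, and it matters here because refinement adds punctuation that would otherwise count as errors.

```python
import re

def _normalize(text: str) -> list:
    # Lowercase and drop punctuation (apostrophes kept) so "API." matches "api"
    return re.sub(r"[^\w\s']", "", text.lower()).split()

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = _normalize(reference), _normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Comparing the WER of the raw transcript and of the refined transcript against the same reference shows whether refinement actually helped.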

Further Reading