Conflict Resolution in Agent Memory: Handling Contradictions

Memory systems accumulate contradictory information: a user says they prefer email on Monday and SMS on Friday; an agent makes a decision based on incomplete data, then later learns it was wrong. Production agents need conflict resolution: detect contradictions, decide which fact to trust, and merge conflicting records into a coherent state. Poor conflict handling leads to agent confusion and user frustration.

Detecting Memory Conflicts

Conflicts arise in several ways. A direct contradiction: "User prefers email (confidence 0.80)" vs. "User prefers SMS (confidence 0.60)." A logical inconsistency: "User name is Bob" and "User name is Robert" (same person, different names). An inconsistent outcome: "Agent sent report; success=true" vs. "User never received it; success=false."

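The first step is detection. The naive approach scans every pair of memories for contradictions, which costs O(n^2) comparisons. A better approach is to maintain a conflict index keyed by user and predicate, so each new fact is compared only against the few existing facts that could contradict it.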

# Example: Conflict detection for semantic memory
class ConflictDetector:
    def __init__(self, memory_store):
        self.memory_store = memory_store
        self.conflicts = []  # List of detected conflicts

    def check_for_conflicts(self, new_fact: dict) -> list:
        """
        Check if a new fact contradicts any existing memory.
        Return a list of conflict records (empty if none).
        """
        conflicts = []

        user_id = new_fact["user_id"]
        predicate = new_fact["predicate"]
        new_value = new_fact["value"]

        # Find all existing memories with the same user + predicate
        existing = self.memory_store.query(
            user_id=user_id,
            predicate=predicate
        )

        for existing_fact in existing:
            # Check for contradiction
            if existing_fact["value"] != new_value:
                # Different values = potential conflict
                conflict = {
                    "type": "value_mismatch",
                    "existing": existing_fact,
                    "new": new_fact,
                    "severity": self._compute_severity(existing_fact, new_fact)
                }
                conflicts.append(conflict)

        return conflicts

    def _compute_severity(self, fact1: dict, fact2: dict) -> str:
        """Estimate conflict severity: low, medium, high."""
        # Timestamps are assumed to be datetime objects so they can be subtracted
        conf1 = fact1.get("confidence", 0.5)
        conf2 = fact2.get("confidence", 0.5)
        time_diff_days = abs((fact1["timestamp"] - fact2["timestamp"]).days)

        # High severity: both facts are high confidence but recorded over a week apart
        if conf1 > 0.80 and conf2 > 0.80 and time_diff_days > 7:
            return "high"  # Both confident, time divergence
        elif conf1 > 0.60 and conf2 > 0.60:
            return "medium"
        else:
            return "low"  # Low confidence, easily resolved
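
The conflict-index idea can be sketched as a dictionary keyed by (user_id, predicate), so a new fact is checked only against the bucket of facts that share its key. The class name `ConflictIndex` and its `add` method are hypothetical and chosen for illustration; a production store would more likely use a database index on the same two columns.

```python
from collections import defaultdict


class ConflictIndex:
    """Hypothetical in-memory index: (user_id, predicate) -> list of facts."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add(self, fact: dict) -> list:
        """Store a fact and return the existing facts whose value differs."""
        key = (fact["user_id"], fact["predicate"])
        # Only facts under the same key can contradict the new one,
        # so the scan is limited to this bucket instead of all memories.
        clashes = [f for f in self._by_key[key] if f["value"] != fact["value"]]
        self._by_key[key].append(fact)
        return clashes


index = ConflictIndex()
first = index.add({"user_id": "u1", "predicate": "contact_pref", "value": "email"})
second = index.add({"user_id": "u1", "predicate": "contact_pref", "value": "sms"})
other = index.add({"user_id": "u2", "predicate": "contact_pref", "value": "sms"})
```

Here `first` and `other` are empty, while `second` contains the earlier "email" fact for the same user and predicate.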

Conflict Resolution Strategies

When conflicts are detected, choose a resolution strategy based on severity and domain:

1. Recency Strategy: Trust the Newest Fact

def resolve_by_recency(fact1: dict, fact2: dict) -> dict:
    """Return the newer fact; archiving the older one is left to the caller."""
    if fact1["timestamp"] > fact2["timestamp"]:
        return fact1
    else:
        return fact2

Recency works when recent behavior is more reliable than stale data. Example: "User changed email preference from Gmail to Outlook." Risk: can flip-flop if the user is indecisive.

2. Confidence Strategy: Trust the Higher-Confidence Fact

def resolve_by_confidence(fact1: dict, fact2: dict) -> dict:
    """Keep the higher-confidence fact."""
    conf1 = fact1.get("confidence", 0.5)
    conf2 = fact2.get("confidence", 0.5)
    return fact1 if conf1 >= conf2 else fact2

Confidence-based resolution favors well-supported facts. Risk: ignores recency; a long-established fact might be outdated.

3. Hybrid Strategy: Recency + Confidence

Weight both recency and confidence:

from datetime import datetime

def resolve_hybrid(fact1: dict, fact2: dict, recency_weight=0.6, confidence_weight=0.4) -> dict:
    """
    Weighted combination of recency and confidence.
    fact1 and fact2 are {value, timestamp, confidence, ...}
    """
    now = datetime.now()

    # Recency score: newer facts score higher
    age1_days = (now - fact1["timestamp"]).days
    age2_days = (now - fact2["timestamp"]).days

    max_age = max(age1_days, age2_days)
    recency_score1 = 1.0 - (age1_days / (max_age + 1))
    recency_score2 = 1.0 - (age2_days / (max_age + 1))

    # Confidence score: higher confidence scores higher
    conf_score1 = fact1.get("confidence", 0.5)
    conf_score2 = fact2.get("confidence", 0.5)

    # Weighted sum
    total_score1 = recency_weight * recency_score1 + confidence_weight * conf_score1
    total_score2 = recency_weight * recency_score2 + confidence_weight * conf_score2

    return fact1 if total_score1 >= total_score2 else fact2
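
To see how the weights interact, here is a small worked example of the same scoring formula with hypothetical numbers: a 30-day-old fact at confidence 0.9 against a 2-day-old fact at confidence 0.6. The helper `hybrid_score` is an illustrative name; it takes ages directly instead of timestamps so the arithmetic is reproducible.

```python
def hybrid_score(age_days: int, max_age_days: int, confidence: float,
                 recency_weight: float = 0.6, confidence_weight: float = 0.4) -> float:
    """Same weighted sum as resolve_hybrid, with ages passed in directly."""
    recency = 1.0 - (age_days / (max_age_days + 1))
    return recency_weight * recency + confidence_weight * confidence


# Hypothetical facts: an old, confident one and a recent, less confident one.
old_age, old_conf = 30, 0.9
new_age, new_conf = 2, 0.6
max_age = max(old_age, new_age)

old_score = hybrid_score(old_age, max_age, old_conf)
new_score = hybrid_score(new_age, max_age, new_conf)
print(f"old fact: {old_score:.3f}, new fact: {new_score:.3f}")
# prints "old fact: 0.379, new fact: 0.801"
```

With the default 0.6/0.4 weights the recent fact wins despite its lower confidence. Shifting the weights to 0.2 for recency and 0.8 for confidence reverses the outcome, which is why the weights are worth tuning per predicate or domain.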

4. Manual Review: Flag for Human Intervention

For high-severity conflicts, don't auto-resolve; flag for human review:

from datetime import datetime

def resolve_high_severity_conflict(conflict: dict, escalation_queue):
    """
    Flag a high-severity conflict for human review.
    Log it and notify a human operator.
    """
    escalation_queue.append({
        "conflict_id": conflict.get("id"),
        "user_id": conflict["existing"]["user_id"],
        "predicate": conflict["existing"]["predicate"],
        "fact1": conflict["existing"],
        "fact2": conflict["new"],
        # Suggestion only; the reviewer makes the final call
        "recommended_resolution": resolve_hybrid(conflict["existing"], conflict["new"]),
        "timestamp": datetime.now().isoformat(),
        "severity": "high"
    })

    # Send notification (send_alert is a placeholder for your alerting hook)
    send_alert(f"High-severity memory conflict for user {conflict['existing']['user_id']}")
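
Choosing a strategy by severity can be expressed as a small routing table. The mapping below (low to recency, medium to hybrid, high to manual review) is an assumed policy for illustration, not one the strategies above prescribe, and the function names are hypothetical.

```python
def choose_strategy(severity: str) -> str:
    """Map a conflict's severity to a strategy name (assumed policy)."""
    policy = {
        "low": "recency",
        "medium": "hybrid",
        "high": "manual_review",
    }
    # Unknown severities fall back to manual review, the most conservative option.
    return policy.get(severity, "manual_review")


def route_conflicts(conflicts: list) -> dict:
    """Group conflicts by the strategy that should handle them."""
    routed = {"recency": [], "hybrid": [], "manual_review": []}
    for conflict in conflicts:
        routed[choose_strategy(conflict.get("severity", "high"))].append(conflict)
    return routed


routed = route_conflicts([
    {"id": "c1", "severity": "low"},
    {"id": "c2", "severity": "high"},
    {"id": "c3", "severity": "medium"},
    {"id": "c4"},  # missing severity is treated as high
])
```

Defaulting unknown or missing severities to manual review trades operator time for safety; a domain with low stakes might default to the hybrid strategy instead.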

Merging Conflicting Episodic Records

Episodic conflicts are trickier: two records describe the same event differently (e.g., two sources logged the same interaction). Merging strategy:

def merge_episodic_records(record1: dict, record2: dict) -> dict:
    """
    Merge two episodic records of the same event.
    Reconcile differences where possible.
    """
    merged = {
        "event_id": record1.get("event_id") or record2.get("event_id"),
        "timestamp": record1.get("timestamp") or record2.get("timestamp"),
        "user_id": record1.get("user_id") or record2.get("user_id"),
        # Track which sources reported this (sorted for a stable order)
        "sources": sorted({
            record1.get("source", "unknown"),
            record2.get("source", "unknown")
        })
    }

    # For input/output, merge fields; on shared keys record2's value wins
    merged["input_data"] = {**record1.get("input_data", {}), **record2.get("input_data", {})}
    merged["output_data"] = {**record1.get("output_data", {}), **record2.get("output_data", {})}

    # If conflicting outcomes, keep both and flag for review
    outcome1 = record1.get("outcome")
    outcome2 = record2.get("outcome")
    if outcome1 != outcome2:
        merged["outcome_conflict"] = {"record1_outcome": outcome1, "record2_outcome": outcome2}
        merged["outcome"] = None  # Unresolved
    else:
        merged["outcome"] = outcome1

    return merged
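
One caveat with the dict-unpacking merge used for input_data and output_data: when both records carry the same key with different values, the second record silently overwrites the first. A possible variant, sketched here under the assumption that such disagreements should be surfaced and not hidden, reports them separately. The helper name `merge_fields` is hypothetical.

```python
def merge_fields(fields1: dict, fields2: dict) -> tuple:
    """Merge two field dicts; return (merged, conflicts).

    Keys present in both with different values are left out of `merged`
    and reported in `conflicts` instead of being silently overwritten.
    """
    merged, conflicts = {}, {}
    for key in fields1.keys() | fields2.keys():
        in1, in2 = key in fields1, key in fields2
        if in1 and in2 and fields1[key] != fields2[key]:
            conflicts[key] = {"record1": fields1[key], "record2": fields2[key]}
        else:
            merged[key] = fields1[key] if in1 else fields2[key]
    return merged, conflicts


merged, conflicts = merge_fields(
    {"channel": "email", "subject": "Q3 report"},
    {"channel": "sms", "recipient": "bob"},
)
```

In this example the agreed or one-sided fields (subject, recipient) land in `merged`, and the disputed channel field is returned in `conflicts` for the same kind of review the outcome conflict receives.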

Conflict Logging and Auditing

Log all conflicts for auditing and learning:

from datetime import datetime

class ConflictLog:
    def __init__(self, db):
        self.db = db

    def log_conflict(self, conflict: dict, resolution: str):
        """
        Log a conflict and its resolution.
        Helps identify patterns (e.g., which memory types conflict most often).
        """
        # user_id and predicate live on the facts, not on the conflict record itself
        existing = conflict.get("existing", {})
        self.db.insert("conflict_log", {
            "conflict_id": conflict.get("id"),
            "user_id": existing.get("user_id"),
            "predicate": existing.get("predicate"),
            "fact1": str(conflict.get("existing")),
            "fact2": str(conflict.get("new")),
            "resolution_strategy": resolution,
            "timestamp": datetime.now().isoformat(),
            "severity": conflict.get("severity")
        })

    def get_conflict_statistics(self):
        """Analyze conflict patterns to improve resolution strategies."""
        stats = self.db.query("""
            SELECT predicate, resolution_strategy, COUNT(*) as count
            FROM conflict_log
            GROUP BY predicate, resolution_strategy
            ORDER BY count DESC
        """)
        return stats
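
The statistics query can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch: the table has only the columns the query needs, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conflict_log (
        predicate TEXT,
        resolution_strategy TEXT,
        severity TEXT
    )
""")
# Made-up sample rows for illustration.
conn.executemany(
    "INSERT INTO conflict_log VALUES (?, ?, ?)",
    [
        ("contact_pref", "recency", "low"),
        ("contact_pref", "recency", "low"),
        ("contact_pref", "manual_review", "high"),
        ("name", "confidence", "medium"),
    ],
)

# Extra ORDER BY columns make the order of tied counts deterministic.
rows = conn.execute("""
    SELECT predicate, resolution_strategy, COUNT(*) AS count
    FROM conflict_log
    GROUP BY predicate, resolution_strategy
    ORDER BY count DESC, predicate, resolution_strategy
""").fetchall()

for predicate, strategy, count in rows:
    print(predicate, strategy, count)
```

The top row shows which predicate and strategy pairing occurs most often, which is a reasonable starting point for deciding where a better resolution rule would pay off.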

Preventing Conflicts: Write Consistency

The best conflict resolution is prevention. Enforce write consistency:

  1. Single source of truth: When a fact is updated, clear conflicting versions.
from datetime import datetime

def update_semantic_fact(memory_store, user_id: str, predicate: str, new_value: str, new_confidence: float):
    """
    Update a semantic fact, clearing any conflicting old facts.
    """
    # Archive old facts with same (user, predicate) but different value
    old_facts = memory_store.query(user_id=user_id, predicate=predicate)

    for old in old_facts:
        if old["value"] != new_value:
            # Archive rather than delete (for auditing)
            memory_store.archive(old["id"])

    # Write new fact
    memory_store.write({
        "user_id": user_id,
        "predicate": predicate,
        "value": new_value,
        "confidence": new_confidence,
        # Stored as a datetime so severity and recency code can subtract timestamps
        "timestamp": datetime.now()
    })
  2. Optimistic locking: Use versions to detect and prevent concurrent updates.
class ConflictError(Exception):
    """Raised when a memory changed between read and write."""

class OptimisticLockMemory:
    def __init__(self, db):
        self.db = db

    def read_with_version(self, memory_id: str):
        """Read a memory and its version number."""
        record = self.db.query("SELECT * FROM memory WHERE id = ?", (memory_id,))
        return record["value"], record["version"]

    def write_if_unchanged(self, memory_id: str, new_value: str, expected_version: int):
        """Update only if the version hasn't changed (prevent lost updates)."""
        rows_updated = self.db.execute(
            "UPDATE memory SET value = ?, version = version + 1 WHERE id = ? AND version = ?",
            (new_value, memory_id, expected_version)
        ).rowcount

        if rows_updated == 0:
            raise ConflictError(f"Memory {memory_id} was modified by another process")

        return True
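
The version check can be demonstrated with Python's built-in sqlite3 module. In this sketch two writers both read version 1 and then try to write; only the first update matches the WHERE clause. The table layout and the standalone `write_if_unchanged` function are simplifications for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (id TEXT PRIMARY KEY, value TEXT, version INTEGER)")
conn.execute("INSERT INTO memory VALUES ('m1', 'email', 1)")


def write_if_unchanged(memory_id: str, new_value: str, expected_version: int) -> bool:
    """Return True if the update applied, False if the version had moved on."""
    cursor = conn.execute(
        "UPDATE memory SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, memory_id, expected_version),
    )
    # rowcount is 0 when no row still carries the expected version.
    return cursor.rowcount == 1


# Two writers both read version 1, then race to write.
first_ok = write_if_unchanged("m1", "sms", expected_version=1)
second_ok = write_if_unchanged("m1", "push", expected_version=1)
final = conn.execute("SELECT value, version FROM memory WHERE id = 'm1'").fetchone()
```

The first write succeeds and bumps the version to 2; the second finds no matching row and is rejected, so the stored value stays "sms". A caller would typically re-read the record and retry or escalate when a write is rejected.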

Key Takeaways

  • Detect conflicts by comparing new facts against existing memories with the same user and predicate.
  • Resolve conflicts using strategies: recency (newest wins), confidence (highest confidence wins), hybrid (weighted combination), or escalate for manual review.
  • Merge episodic records from multiple sources; track conflicting outcomes and flag for review.
  • Log all conflicts for auditing; analyze patterns to improve resolution strategies over time.
  • Prevent conflicts with single source of truth (clear old conflicting facts on write) and optimistic locking (detect concurrent modifications).

Frequently Asked Questions

What should I do when a conflict is unresolvable (both facts equally valid)?

Escalate to the user: "I have two conflicting records. Could you confirm which is correct?" Let the user settle the conflict, then record the confirmation on both facts: boost confidence in the one the user confirmed and archive the other.

Should I keep conflicting facts or delete them?

Keep them (archive) for auditing and learning. Deleting conflicts loses information about the agent's mistakes. Periodically analyze conflict patterns to improve decision-making.

How often should conflicts occur in a healthy system?

Rarely: as a rough rule of thumb, in under 1% of memory updates. If conflicts are frequent, the memory system needs tuning: decay rates may be wrong, or data sources may be unreliable. Monitor the conflict rate as a health metric.
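
Tracking the conflict rate takes only two counters. The sketch below assumes the 1% figure as an alert threshold; the function names are hypothetical and the threshold should be adjusted to your own baseline.

```python
def conflict_rate(conflict_count: int, update_count: int) -> float:
    """Fraction of memory updates that triggered a conflict."""
    if update_count == 0:
        return 0.0  # No updates yet, so nothing to report
    return conflict_count / update_count


def is_healthy(conflict_count: int, update_count: int, threshold: float = 0.01) -> bool:
    """True while the conflict rate stays below the (assumed) 1% threshold."""
    return conflict_rate(conflict_count, update_count) < threshold


quiet_day = is_healthy(conflict_count=3, update_count=1000)   # rate 0.003
noisy_day = is_healthy(conflict_count=25, update_count=1000)  # rate 0.025
```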

Can I use machine learning to predict which conflicting fact is correct?

Yes. Train a classifier on historical conflict logs: given two conflicting facts, predict which is correct based on features like confidence, recency, source, and predicate type. This automates conflict resolution.
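
As a starting point, each logged conflict can be turned into a numeric feature vector. The sketch below covers only the feature-extraction step with three illustrative features (confidence gap, age gap, same-source flag); the feature choice is an assumption, predicate type is omitted, and the classifier itself is left out.

```python
from datetime import datetime


def conflict_features(fact1: dict, fact2: dict, now: datetime) -> list:
    """Turn a pair of conflicting facts into a numeric feature vector."""
    age1 = (now - fact1["timestamp"]).days
    age2 = (now - fact2["timestamp"]).days
    return [
        fact1.get("confidence", 0.5) - fact2.get("confidence", 0.5),  # confidence gap
        float(age1 - age2),  # positive when fact1 is older
        1.0 if fact1.get("source") == fact2.get("source") else 0.0,  # same source?
    ]


features = conflict_features(
    {"timestamp": datetime(2024, 1, 1), "confidence": 0.9, "source": "crm"},
    {"timestamp": datetime(2024, 1, 29), "confidence": 0.6, "source": "chat"},
    now=datetime(2024, 1, 31),
)
```

The label for training would be which fact a human reviewer or the user ultimately confirmed, which is another reason to keep resolved conflicts in the log.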

What if the user is genuinely changing their preference (not a conflict, just evolution)?

Treat it as a new fact with moderate confidence, and let the old fact decay naturally. Don't force a conflict; let time and reinforcement decide which preference is current.
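
Natural decay can be modeled in several ways; one common choice is exponential decay with a half-life. The sketch below assumes a 30-day half-life and a hypothetical `decayed_confidence` helper, neither of which is prescribed above.

```python
def decayed_confidence(confidence: float, age_days: float,
                       half_life_days: float = 30.0) -> float:
    """Halve the confidence every `half_life_days` (assumed decay model)."""
    return confidence * 0.5 ** (age_days / half_life_days)


# An old preference stated 60 days ago vs. a new one stated today.
old_pref = decayed_confidence(0.8, age_days=60)  # two half-lives: 0.8 * 0.25
new_pref = decayed_confidence(0.6, age_days=0)   # no decay yet
```

After two half-lives the old preference's effective confidence has dropped to about 0.2, below the new preference's 0.6, so the newer preference wins without any explicit conflict handling.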

Further Reading