Debugging PMCR-O Loops: Common Pitfalls and Fixes
PMCR-O agents operate in strange loops: self-referential cycles that can spiral into infinite recursion, deadlock, or silent failure. This guide covers the most common debugging scenarios you'll encounter in production.
1. The Infinite Loop: Planner Never Completes
Symptom
- Planner agent runs indefinitely
- No response from gRPC service
- OpenTelemetry shows continuous activity
- Ollama logs show repeated requests
Root Causes
A. Missing Timeout Configuration
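Without explicit configuration, HttpClient falls back to its default 100-second timeout with no retry or total-time budget behind it, which is easy to exceed on long generations. Check your HttpClient setup:
// BAD: No explicit timeout or resilience policy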
builder.Services.AddHttpClient("ollama", client =>
{
client.BaseAddress = uri;
// Missing: client.Timeout = ...
});
// GOOD: Infinite timeout, let resilience handler control it
builder.Services.AddHttpClient("ollama", client =>
{
client.BaseAddress = uri;
client.Timeout = Timeout.InfiniteTimeSpan; // Let resilience handler manage
})
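.AddStandardResilienceHandler(options =>
{
options.AttemptTimeout.Timeout = TimeSpan.FromMinutes(3);
options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(5);
options.Retry.MaxRetryAttempts = 2;
// Options validation expects the sampling window to be at least twice the attempt timeout
options.CircuitBreaker.SamplingDuration = TimeSpan.FromMinutes(6);
});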
B. Missing UseFunctionInvocation Middleware
If your IChatClient isn't configured with function invocation, the model's tool calls are returned but never executed, so the planner keeps waiting on results that never arrive:
// BAD: Missing UseFunctionInvocation
builder.Services.AddSingleton<IChatClient>(sp =>
{
var httpClient = sp.GetRequiredService<IHttpClientFactory>().CreateClient("ollama");
var baseClient = new OllamaApiClient(httpClient, modelId);
return baseClient; // Missing middleware!
});
// GOOD: With function invocation
builder.Services.AddSingleton<IChatClient>(sp =>
{
var httpClient = sp.GetRequiredService<IHttpClientFactory>().CreateClient("ollama");
var baseClient = new OllamaApiClient(httpClient, modelId);
return new ChatClientBuilder(baseClient)
.UseFunctionInvocation() // Critical middleware
.Build();
});
Fix
- Add timeout configuration to HttpClient
- Ensure UseFunctionInvocation middleware is registered
- Add circuit breaker to prevent cascading failures
- Monitor OpenTelemetry traces for stuck requests
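The timeout advice above also applies to work that does not go through HttpClient. As a minimal sketch, assuming a hypothetical helper named TimeoutGuard (not part of PMCR-O or any library), a per-call time budget can be layered on top of the caller's cancellation token using only the base class library:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutGuard
{
    // Runs an operation with its own time budget while still honoring the caller's token.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        // The linked source cancels when either the caller cancels or the budget expires.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(timeout);
        try
        {
            return await operation(cts.Token);
        }
        catch (OperationCanceledException)
            when (cts.IsCancellationRequested && !callerToken.IsCancellationRequested)
        {
            // The budget expired rather than the caller cancelling: surface it as a timeout.
            throw new TimeoutException(
                $"Operation exceeded its {timeout.TotalSeconds:F0}s budget.");
        }
    }
}
```

A planner call could then be wrapped as `await TimeoutGuard.RunWithTimeoutAsync(ct => CallPlannerAsync(request, ct), TimeSpan.FromMinutes(3), context.CancellationToken)`, where CallPlannerAsync stands in for whatever method actually invokes the model.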
2. The Deadlock: Agents Waiting for Each Other
Symptom
- Orchestrator calls Planner, then hangs
- Planner waits for Maker, Maker waits for Checker
- gRPC calls time out after 5 minutes
- No error logs, just silence
Root Cause: Synchronous gRPC Calls in Async Context
If you're calling gRPC services synchronously or with blocking waits, you'll deadlock:
// BAD: Blocking wait
var plannerResponse = plannerClient.ExecuteTaskAsync(request).ResponseAsync.Result; // Deadlock!
// GOOD: Fully async (the generated client takes the token as a named optional argument)
var plannerResponse = await plannerClient.ExecuteTaskAsync(request, cancellationToken: cancellationToken);
Root Cause: Missing Cancellation Tokens
Without cancellation tokens, long-running operations can't be aborted:
// BAD: No cancellation token
public override async Task<AgentResponse> ExecuteTask(
AgentRequest request,
ServerCallContext context)
{
var response = await _chatClient.CompleteChatAsync(history); // Can't cancel!
}
// GOOD: With cancellation
public override async Task<AgentResponse> ExecuteTask(
AgentRequest request,
ServerCallContext context)
{
var cancellationToken = context.CancellationToken;
var response = await _chatClient.CompleteChatAsync(history, cancellationToken);
}
Fix
- Always use await, never .Result or .Wait()
- Pass context.CancellationToken to all async operations
- Configure gRPC deadline: context.Deadline
- Use Task.WhenAll for parallel agent calls, not sequential
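The Task.WhenAll advice only helps for calls that do not depend on each other; the Planner, Maker, Checker chain itself stays sequential. As a minimal sketch, with CallAgentAsync as a hypothetical stand-in for an async gRPC client call, independent calls can be started together and awaited once:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelAgents
{
    // Stand-in for an async gRPC agent call (hypothetical; replace with the real client call).
    public static async Task<string> CallAgentAsync(string name, int delayMs, CancellationToken ct)
    {
        await Task.Delay(delayMs, ct); // Simulates network and model latency
        return $"{name}: done";
    }

    // Starts every independent call first, then awaits them together.
    // Results come back in the same order as the tasks were supplied.
    public static async Task<string[]> RunIndependentAgentsAsync(CancellationToken ct = default)
    {
        Task<string>[] calls =
        {
            CallAgentAsync("checker-a", 100, ct),
            CallAgentAsync("checker-b", 100, ct),
            CallAgentAsync("checker-c", 100, ct),
        };
        return await Task.WhenAll(calls);
    }
}
```

Because all three calls share one token, cancelling the request aborts whichever calls are still in flight.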
3. The Silent Failure: Agent Returns Empty Response
Symptom
- Agent returns Success = true but Content = ""
- No exceptions logged
- Ollama returns valid response, but agent doesn't process it
Root Cause: Missing Null Checks
// BAD: No null check
var response = await agent.RunAsync(message, thread);
var planContent = response.Text; // Can be null!
return new AgentResponse
{
Content = planContent, // Empty string if null
Success = true
};
// GOOD: Defensive null handling
var response = await agent.RunAsync(message, thread);
// Keep the raw value: substituting a placeholder here would hide the empty case from the check below
var planContent = response.Text;
if (string.IsNullOrWhiteSpace(planContent))
{
_logger.LogWarning("Agent returned empty response for intent: {Intent}", request.Intent);
return new AgentResponse
{
Content = "Agent failed to generate response. Check logs for details.",
Success = false
};
}
return new AgentResponse
{
Content = planContent,
Success = true
};
Fix
- Always check response.Text for null/empty
- Log warnings when responses are empty
- Return Success = false for empty responses
- Add validation in Orchestrator before passing to next phase
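The last item, validating in the Orchestrator before handing off, could look roughly like the sketch below. PhaseResult and PhaseGuard are hypothetical names used for illustration; PhaseResult stands in for the generated AgentResponse message.

```csharp
using System;

// Hypothetical stand-in for the generated AgentResponse message.
public sealed record PhaseResult(bool Success, string? Content);

public static class PhaseGuard
{
    // Returns the content if it is safe to hand to the next phase; otherwise throws.
    public static string RequireContent(string phase, PhaseResult result)
    {
        if (!result.Success)
        {
            throw new InvalidOperationException($"{phase} reported failure: {result.Content}");
        }
        if (string.IsNullOrWhiteSpace(result.Content))
        {
            // Catches the silent-failure case: Success = true with nothing in Content.
            throw new InvalidOperationException(
                $"{phase} reported success but returned no content.");
        }
        return result.Content;
    }
}
```

Failing loudly at the phase boundary keeps an empty plan from flowing into the Maker and producing a second, harder-to-trace failure.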
4. The Strange Loop: Infinite Reflection
Symptom
- Reflector keeps reflecting on its own reflections
- Knowledge vault fills with duplicate entries
- Agent performance degrades over time
- No exit condition from reflection loop
Root Cause: Missing Loop Detection
// BAD: No loop detection
public override async Task<AgentResponse> ExecuteTask(
AgentRequest request,
ServerCallContext context)
{
var reflection = await Reflect(request.Intent);
var deeperReflection = await Reflect(reflection);
var evenDeeper = await Reflect(deeperReflection); // Infinite!
}
// GOOD: With loop detection
// Requires: using System.Collections.Concurrent;
// Static and thread-safe, because ASP.NET Core gRPC creates a service instance per call by default
private static readonly ConcurrentDictionary<string, byte> _processedIntents = new();
private const int MaxReflectionDepth = 3;

public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    var intentHash = HashIntent(request.Intent);
    // TryAdd is atomic: it fails if this intent is already being processed
    if (!_processedIntents.TryAdd(intentHash, 0))
    {
        _logger.LogWarning("Detected reflection loop for intent: {Intent}", request.Intent);
        return new AgentResponse
        {
            Content = "Reflection loop detected. Stopping to prevent infinite recursion.",
            Success = false
        };
    }
    try
    {
        var reflection = await ReflectWithDepth(request.Intent, MaxReflectionDepth);
        return new AgentResponse { Content = reflection, Success = true };
    }
    finally
    {
        _processedIntents.TryRemove(intentHash, out _); // Clean up after processing
    }
}

// SHA-256 of the intent as a hex string, used only as a lookup key
private static string HashIntent(string intent) =>
    Convert.ToHexString(
        System.Security.Cryptography.SHA256.HashData(
            System.Text.Encoding.UTF8.GetBytes(intent)));
Fix
- Track processed intents with hash set
- Set maximum reflection depth (e.g., 3 levels)
- Log warnings when loops are detected
- Clean up tracking after processing completes
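The example above calls ReflectWithDepth without showing it. One possible shape, offered as a sketch rather than the original implementation, is a bounded loop that also stops early when a reflection pass produces nothing new. The reflectOnce delegate is a hypothetical wrapper around a single Reflector model call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedReflection
{
    // Applies reflectOnce at most maxDepth times and returns the last reflection.
    public static async Task<string> ReflectWithDepthAsync(
        string intent,
        int maxDepth,
        Func<string, CancellationToken, Task<string>> reflectOnce,
        CancellationToken ct = default)
    {
        var current = intent;
        for (var depth = 0; depth < maxDepth; depth++)
        {
            ct.ThrowIfCancellationRequested();
            var next = await reflectOnce(current, ct);
            // Stop early once reflection stops producing anything new (a fixed point).
            if (string.Equals(next, current, StringComparison.Ordinal))
            {
                break;
            }
            current = next;
        }
        return current;
    }
}
```

The depth cap guarantees termination even if the model never converges, while the fixed-point check avoids spending model calls on reflections that only repeat themselves.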
5. The Connection Error: Ollama Unreachable
Symptom
- HttpRequestException: Connection refused
- All agents fail with same error
- Ollama service is running but agents can't connect
Root Cause: Incorrect Connection String
// BAD: Hardcoded localhost
var ollamaUri = "http://localhost:11434";
// GOOD: From Aspire connection string
var ollamaUri = builder.Configuration.GetConnectionString("ollama")
?? "http://localhost:11434";
if (!Uri.TryCreate(ollamaUri, UriKind.Absolute, out var uri))
{
_logger.LogError("Invalid Ollama URI: {Uri}", ollamaUri);
uri = new Uri("http://localhost:11434"); // Fallback
}
// Verify connection on startup
builder.Services.AddHostedService<OllamaHealthCheckService>();
Root Cause: Missing Service Discovery
In Aspire, services discover each other via connection strings. Ensure Ollama is registered:
// In AppHost.cs
var ollama = builder.AddOllama("ollama", port: 11434)
.WithDataVolume()
.WithLifetime(ContainerLifetime.Persistent);
// Services reference Ollama
var planner = builder.AddProject<Projects.ProjectName_PlannerService>("planner-agent")
.WithReference(ollama) // This creates connection string
.WaitFor(ollama);
Fix
- Use Aspire connection strings, not hardcoded URLs
- Verify URI parsing with Uri.TryCreate
- Add health check service to verify Ollama on startup
- Use circuit breaker to fail fast when Ollama is down
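The startup verification mentioned above is not shown in this guide. A minimal sketch of the probe such a service might run is below; OllamaProbe is a hypothetical helper, and it relies only on the /api/tags endpoint that the health check later in this guide also uses.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class OllamaProbe
{
    // Returns true once GET {baseUri}/api/tags succeeds, or false after maxAttempts failures.
    public static async Task<bool> WaitUntilReachableAsync(
        HttpClient client,
        Uri baseUri,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken ct = default)
    {
        var tagsUri = new Uri(baseUri, "/api/tags");
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                using var response = await client.GetAsync(tagsUri, ct);
                if (response.IsSuccessStatusCode)
                {
                    return true;
                }
            }
            catch (HttpRequestException)
            {
                // Connection refused or DNS failure: treat as "not ready yet".
            }
            catch (TaskCanceledException) when (!ct.IsCancellationRequested)
            {
                // HttpClient's own timeout fired, not the caller's token: also "not ready yet".
            }
            if (attempt < maxAttempts)
            {
                await Task.Delay(delay, ct);
            }
        }
        return false;
    }
}
```

Calling this once at startup with the URI resolved from the connection string turns a misconfigured address into one clear log line instead of a failure in every agent.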
6. The JSON Parsing Error: Structured Output Fails
Symptom
- JsonException: The JSON value could not be converted
- Agent returns valid text but JSON parsing fails
- Ollama returns JSON but with extra text/markdown
Root Cause: Missing ResponseFormat Configuration
// BAD: No response format
var chatOptions = new ChatOptions();
var response = await _chatClient.CompleteChatAsync(history, chatOptions);
var plan = JsonSerializer.Deserialize<PlanResponse>(response.Content); // Fails!
// GOOD: With JSON response format
var chatOptions = new ChatOptions
{
ResponseFormat = ChatResponseFormat.Json, // Forces JSON output
AdditionalProperties = new Dictionary<string, object?>
{
["schema"] = JsonSerializer.Serialize(new
{
type = "object",
properties = new
{
plan = new { type = "string" },
steps = new { type = "array" }
},
required = new[] { "plan", "steps" }
})
}
};
var response = await _chatClient.CompleteChatAsync(history, chatOptions);
var plan = JsonSerializer.Deserialize<PlanResponse>(response.Content); // Works!
Root Cause: Extra Markdown Wrapping
Even with ResponseFormat.Json, some models wrap JSON in markdown:
// GOOD: Strip markdown code blocks
private T ParseStructuredOutput<T>(string content)
{
// Remove markdown code blocks if present
content = System.Text.RegularExpressions.Regex.Replace(
content,
@"```json\s*|\s*```",
"",
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
content = content.Trim();
try
{
return JsonSerializer.Deserialize<T>(content)
?? throw new InvalidOperationException("Failed to deserialize JSON");
}
catch (JsonException ex)
{
_logger.LogError(ex, "Failed to parse JSON: {Content}", content);
throw;
}
}
Fix
- Always set ResponseFormat = ChatResponseFormat.Json
- Provide JSON schema in AdditionalProperties
- Strip markdown code blocks before parsing
- Wrap parsing in try-catch with detailed error logs
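Stripping fences does not help when a model adds a sentence of prose before or after the JSON. A complementary approach, sketched here with a hypothetical JsonExtraction helper, is to cut from the first opening brace to the last closing brace before deserializing. It assumes the payload is a single JSON object.

```csharp
using System;
using System.Text.Json;

public static class JsonExtraction
{
    // Returns the substring from the first '{' to the last '}', or null if none exists.
    public static string? ExtractJsonObject(string content)
    {
        var start = content.IndexOf('{');
        var end = content.LastIndexOf('}');
        return start >= 0 && end > start
            ? content.Substring(start, end - start + 1)
            : null;
    }

    // Attempts to parse the embedded object; returns false instead of throwing.
    public static bool TryParse<T>(string content, out T? value) where T : class
    {
        value = null;
        var json = ExtractJsonObject(content);
        if (json is null)
        {
            return false;
        }
        try
        {
            value = JsonSerializer.Deserialize<T>(json);
            return value is not null;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

A false result is the cue to log the raw content and return Success = false, rather than letting a JsonException escape from deep inside the agent.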
7. The Memory Leak: Knowledge Vault Grows Unbounded
Symptom
- PostgreSQL database grows continuously
- Vector search becomes slow
- Memory usage increases over time
- No cleanup of old cognitive trails
Root Cause: No Retention Policy
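// GOOD: Background service to clean old entries
// Requires: using Microsoft.EntityFrameworkCore; (for ExecuteDeleteAsync)
public class KnowledgeVaultCleanupService : BackgroundService
{
    // A hosted service is a singleton and DbContext is typically scoped,
    // so resolve the context from a fresh scope on every run
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly ILogger<KnowledgeVaultCleanupService> _logger;
    private readonly TimeSpan _retentionPeriod = TimeSpan.FromDays(90);

    public KnowledgeVaultCleanupService(
        IServiceScopeFactory scopeFactory,
        ILogger<KnowledgeVaultCleanupService> logger)
    {
        _scopeFactory = scopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var nextRun = TimeSpan.FromDays(1); // Run cleanup daily
            try
            {
                using var scope = _scopeFactory.CreateScope();
                var db = scope.ServiceProvider.GetRequiredService<KnowledgeDbContext>();
                var cutoffDate = DateTime.UtcNow - _retentionPeriod;
                var deleted = await db.KnowledgeEntries
                    .Where(e => e.CreatedAt < cutoffDate)
                    .ExecuteDeleteAsync(stoppingToken);
                if (deleted > 0)
                {
                    _logger.LogInformation("Cleaned up {Count} old knowledge entries", deleted);
                }
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                break; // Normal shutdown, not an error
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Error during knowledge vault cleanup");
                nextRun = TimeSpan.FromHours(1); // Retry sooner after a failure
            }
            await Task.Delay(nextRun, stoppingToken);
        }
    }
}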
Fix
- Implement retention policy (e.g., 90 days)
- Create background service to clean old entries
- Archive important trails before deletion
- Monitor database size and query performance
Debugging Tools & Techniques
1. OpenTelemetry Tracing
Use Aspire dashboard to trace agent calls:
- View gRPC call latency
- Identify slow agents
- Track request flow through PMCR-O phases
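For the dashboard to show PMCR-O phases as separate spans, each phase has to emit its own activity. A minimal sketch using System.Diagnostics.ActivitySource is below. The source name "PmcrO.Agents", the tag names, and the AgentTelemetry helper are assumptions made for illustration; the same source name would need to be registered with the tracing pipeline (for example via AddSource) for the spans to be exported.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AgentTelemetry
{
    // Assumed source name; it must match the name registered in the tracing setup.
    public static readonly ActivitySource Source = new("PmcrO.Agents");

    // Wraps one phase of work in a span tagged with the phase name.
    public static async Task<T> TracePhaseAsync<T>(string phase, string intent, Func<Task<T>> work)
    {
        // StartActivity returns null when no listener is subscribed,
        // hence the null-conditional calls below.
        using var activity = Source.StartActivity($"pmcro.{phase}");
        activity?.SetTag("pmcro.phase", phase);
        activity?.SetTag("pmcro.intent.length", intent.Length);
        try
        {
            return await work();
        }
        catch (Exception ex)
        {
            // Mark the span as failed so stuck or broken phases stand out in the trace view.
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }
}
```

Tagging the intent length rather than the intent text keeps potentially sensitive prompt content out of the telemetry backend.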
2. Structured Logging
// GOOD: Structured logging with context
_logger.LogInformation(
"I am the Planner. I am analyzing the intent: {Intent}",
request.Intent);
_logger.LogError(
ex,
"Planner failed for intent: {Intent}. Error: {Error}",
request.Intent,
ex.Message);
3. Health Checks
Add health checks for each agent service:
builder.Services.AddHealthChecks()
.AddAsyncCheck("ollama", async cancellationToken =>
{
// Verify Ollama is reachable without blocking on .Result
using var client = new HttpClient();
using var response = await client.GetAsync(
"http://localhost:11434/api/tags", cancellationToken);
return response.IsSuccessStatusCode
? HealthCheckResult.Healthy()
: HealthCheckResult.Unhealthy("Ollama unreachable");
})
.AddCheck("postgres", () =>
{
// Verify PostgreSQL connection
// ...
});
Prevention Checklist
Before Deploying to Production
- Configure HttpClient timeouts and resilience handlers
- Add UseFunctionInvocation middleware
- Pass cancellation tokens to all async operations
- Implement loop detection for Reflector
- Add null checks for agent responses
- Use Aspire connection strings, not hardcoded URLs
- Configure ResponseFormat.Json for structured output
- Implement knowledge vault retention policy
- Add health checks for all dependencies
- Enable OpenTelemetry tracing
2026 AI-Assisted Debugging Tools
Modern debugging tools in 2026 leverage AI to automatically detect and fix PMCR-O loop issues. Here are the cutting-edge tools available:
1. Ollama-Powered Debug Agent
Use an AI agent to analyze your PMCR-O logs and suggest fixes:
// AI-Assisted Debug Agent
public class AIDebugAgent
{
private readonly IChatClient _chatClient;
public async Task<DebugRecommendation> AnalyzeLogsAsync(
string logContent,
CancellationToken cancellationToken = default)
{
var prompt = $@"I AM the PMCR-O Debug Agent.
I ANALYZE logs to identify loop failures.
I SUGGEST specific fixes with code examples.
Analyze these PMCR-O logs and identify issues:
{logContent}
Output JSON with:
- issue_type: [infinite_loop | deadlock | silent_failure | connection_error | json_parsing | memory_leak]
- root_cause: string
- fix_code: string (C# code example)
- prevention: string[]";
var chatOptions = new ChatOptions
{
ResponseFormat = ChatResponseFormat.Json,
Temperature = 0.1f // Low temperature for precise analysis (the property is a float)
};
var response = await _chatClient.CompleteChatAsync(
new ChatHistory { new ChatMessage(ChatRole.User, prompt) },
chatOptions,
cancellationToken);
return JsonSerializer.Deserialize<DebugRecommendation>(response.Content)
?? throw new InvalidOperationException("Failed to parse debug recommendation");
}
}
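The DebugRecommendation type used above is not defined in this guide. One plausible shape, shown as a sketch, maps the snake_case fields requested in the prompt onto C# properties with System.Text.Json attributes. The property names are assumptions derived from that prompt.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shape: the JSON names mirror the fields requested in the prompt.
public sealed class DebugRecommendation
{
    [JsonPropertyName("issue_type")]
    public string IssueType { get; set; } = "";

    [JsonPropertyName("root_cause")]
    public string RootCause { get; set; } = "";

    [JsonPropertyName("fix_code")]
    public string FixCode { get; set; } = "";

    [JsonPropertyName("prevention")]
    public string[] Prevention { get; set; } = Array.Empty<string>();
}
```

Without the explicit name mapping, the default case-sensitive property matching would leave every field at its default value even though deserialization succeeds, which is another route to the silent failure described in section 3.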
2. Real-Time Loop Detection with OpenTelemetry
Use OpenTelemetry traces to automatically detect infinite loops:
// Automatic loop detection via OpenTelemetry
public class LoopDetectionMiddleware
{
private readonly ILogger<LoopDetectionMiddleware> _logger;
private readonly Dictionary<string, int> _traceCounts = new();
public void OnTraceStart(string traceId, string operationName)
{
var key = $"{traceId}:{operationName}";
_traceCounts.TryGetValue(key, out var count);
_traceCounts[key] = count + 1;
if (count + 1 > 10) // Threshold for loop detection
{
_logger.LogWarning(
"Potential infinite loop detected: {Operation} in trace {TraceId}",
operationName, traceId);
// Trigger alert or circuit breaker
throw new InfiniteLoopException(
$"Operation {operationName} executed {count + 1} times in trace {traceId}");
}
}
}
3. .NET 11 Preview: Enhanced Diagnostics
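.NET 11 previews are described as improving diagnostics for async operations and gRPC. Treat the feature and setting names below as provisional and confirm them against the official release notes: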
- Async Stack Traces: Full async call stacks showing where operations hang
- gRPC Deadlock Detection: Automatic detection of blocking gRPC calls
- Memory Profiling: Built-in memory leak detection for knowledge vaults
- AI-Powered Suggestions: IDE integration that suggests fixes based on error patterns
// .NET 11: Enhanced async diagnostics
// Enable in appsettings.json:
{
"Diagnostics": {
"AsyncStackTrace": true,
"GrpcDeadlockDetection": true,
"MemoryProfiling": {
"Enabled": true,
"Threshold": "100MB"
}
}
}
// Automatic deadlock detection
// .NET 11 will throw GrpcDeadlockException if blocking detected
try
{
var response = await plannerClient.ExecuteTaskAsync(request);
}
catch (GrpcDeadlockException ex)
{
// .NET 11 provides detailed stack trace showing where deadlock occurred
_logger.LogError(ex, "gRPC deadlock detected: {StackTrace}", ex.AsyncStackTrace);
}
4. Visual Studio 2026: AI Debug Assistant
Visual Studio 2026 includes an AI-powered debug assistant that:
- Analyzes breakpoints and suggests where to add more
- Identifies common PMCR-O patterns (infinite loops, deadlocks)
- Generates fix suggestions with code examples
- Learns from your debugging patterns to improve suggestions
2026 Debugging Workflow
- Enable AI Debug Agent: Let AI analyze logs automatically
- Use OpenTelemetry: Real-time loop detection in production
- Leverage .NET 11 Diagnostics: Built-in async/gRPC debugging
- IDE Integration: Visual Studio 2026 AI assistant for interactive debugging
Conclusion
PMCR-O loops fail in predictable patterns. Most issues stem from:
- Missing timeouts → Infinite loops
- Blocking async calls → Deadlocks
- Missing null checks → Silent failures
- No loop detection → Infinite reflection
- Hardcoded URLs → Connection errors
- Missing JSON format → Parsing errors
- No retention policy → Memory leaks
Follow the patterns in this guide, and your PMCR-O system will be production-ready.
Related Resources:
- PMCR-O Quickstart - Build the foundation
- Building Self-Referential Agents Part 2 - Multi-agent patterns
- PMCR-O Codex - Framework architecture
- PMCR-O and pgvector - Knowledge vault setup