Debugging PMCR-O Loops: Common Pitfalls and Fixes

PMCR-O agents operate in strange loops: self-referential cycles that can spiral into infinite recursion, deadlock, or silent failures. This guide covers the most common debugging scenarios you'll encounter in production.

🎯 Debugging Philosophy: PMCR-O loops fail in predictable patterns. Once you recognize the pattern, the fix is usually straightforward. This guide maps symptoms to solutions.

1. The Infinite Loop: Planner Never Completes

❌ Symptom

Planner agent runs indefinitely
No response from gRPC service
OpenTelemetry shows continuous activity
Ollama logs show repeated requests

Root Causes

A. Missing Timeout Configuration

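HttpClient's default timeout is 100 seconds, which is often too short for local model inference, while an infinite timeout with no other limit lets a stuck Ollama request hang forever. Check your HttpClient setup:

// ❌ BAD: Relies on the default timeout, with no per-attempt or total limit
builder.Services.AddHttpClient("ollama", client =>
{
    client.BaseAddress = uri;
    // Missing: client.Timeout = ...
});

// ✅ GOOD: Infinite timeout, let resilience handler control it
builder.Services.AddHttpClient("ollama", client =>
{
    client.BaseAddress = uri;
    client.Timeout = Timeout.InfiniteTimeSpan; // Let resilience handler manage
})
.AddStandardResilienceHandler(options =>
{
    options.AttemptTimeout.Timeout = TimeSpan.FromMinutes(3);
    options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(5);
    options.Retry.MaxRetryAttempts = 2;
    // Options validation requires the circuit breaker sampling window
    // to be at least twice the attempt timeout
    options.CircuitBreaker.SamplingDuration = TimeSpan.FromMinutes(6);
});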

B. Missing UseFunctionInvocation Middleware

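If your IChatClient isn't configured with function invocation, tool calls requested by the model are handed back to the caller but never executed, so the agent stalls waiting for results that never arrive: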

// ❌ BAD: Missing UseFunctionInvocation
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var httpClient = sp.GetRequiredService<IHttpClientFactory>().CreateClient("ollama");
    var baseClient = new OllamaApiClient(httpClient, modelId);
    return baseClient; // Missing middleware!
});

// ✅ GOOD: With function invocation
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var httpClient = sp.GetRequiredService<IHttpClientFactory>().CreateClient("ollama");
    var baseClient = new OllamaApiClient(httpClient, modelId);
    
    return new ChatClientBuilder(baseClient)
        .UseFunctionInvocation() // ✅ Critical middleware
        .Build();
});

✅ Fix

Add timeout configuration to HttpClient
Ensure UseFunctionInvocation middleware is registered
Add circuit breaker to prevent cascading failures
Monitor OpenTelemetry traces for stuck requests

2. The Deadlock: Agents Waiting for Each Other

❌ Symptom

Orchestrator calls Planner, then hangs
Planner waits for Maker, Maker waits for Checker
gRPC calls time out after 5 minutes
No error logs, just silence

Root Cause: Synchronous gRPC Calls in Async Context

If you're calling gRPC services synchronously or with blocking waits, you'll deadlock:

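// ❌ BAD: Blocking wait on the async call
var plannerResponse = plannerClient.ExecuteTaskAsync(request).ResponseAsync.Result; // Deadlock!

// ✅ GOOD: Fully async (the token must be passed by name; the second
// positional parameter of a generated client method is the headers)
var plannerResponse = await plannerClient.ExecuteTaskAsync(request, cancellationToken: cancellationToken);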

Root Cause: Missing Cancellation Tokens

Without cancellation tokens, long-running operations can't be aborted:

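// ❌ BAD: No cancellation token
// (GetResponseAsync is the Microsoft.Extensions.AI IChatClient method;
// early previews named it CompleteAsync)
public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    var response = await _chatClient.GetResponseAsync(history); // Can't cancel!
    return new AgentResponse { Content = response.Text, Success = true };
}

// ✅ GOOD: With cancellation
public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    // Cancelled when the client disconnects or the call deadline passes
    var cancellationToken = context.CancellationToken;
    var response = await _chatClient.GetResponseAsync(history, cancellationToken: cancellationToken);
    return new AgentResponse { Content = response.Text, Success = true };
}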

✅ Fix

Always use await, never .Result or .Wait()
Pass context.CancellationToken to all async operations
Set a deadline on every outgoing gRPC call; the server can read it from context.Deadline
Use Task.WhenAll for agent calls that don't depend on each other, instead of awaiting them one by one
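
The token and deadline advice above can be combined in one small helper. The following is a minimal sketch using only base class library types; `PhaseTimeouts` and `RunWithDeadlineAsync` are hypothetical names, and the delegate stands in for whatever agent call you make.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: bounds any agent call by a deadline while still
// honoring the caller's token (for example, context.CancellationToken).
public static class PhaseTimeouts
{
    public static async Task<T> RunWithDeadlineAsync<T>(
        Func<CancellationToken, Task<T>> agentCall,
        TimeSpan deadline,
        CancellationToken callerToken = default)
    {
        // The linked source cancels when the caller cancels or the deadline elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(deadline);

        try
        {
            return await agentCall(cts.Token);
        }
        catch (OperationCanceledException)
            when (cts.IsCancellationRequested && !callerToken.IsCancellationRequested)
        {
            // Only the deadline fired: report a timeout instead of a generic cancel.
            throw new TimeoutException($"Agent call exceeded its {deadline} deadline.");
        }
    }
}
```

Inside a service method you might wrap a call as `await PhaseTimeouts.RunWithDeadlineAsync(ct => CallPlannerAsync(ct), TimeSpan.FromMinutes(3), context.CancellationToken)`, where `CallPlannerAsync` is a placeholder for your own method. A caller-initiated cancel still surfaces as OperationCanceledException, so the two cases stay distinguishable in logs.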

3. The Silent Failure: Agent Returns Empty Response

❌ Symptom

Agent returns Success = true but Content = ""
No exceptions logged
Ollama returns valid response, but agent doesn't process it

Root Cause: Missing Null Checks

// ❌ BAD: No null check
var response = await agent.RunAsync(message, thread);
var planContent = response.Text; // Can be null!

return new AgentResponse
{
    Content = planContent, // Empty string if null
    Success = true
};

// ✅ GOOD: Defensive null handling
var response = await agent.RunAsync(message, thread);
var planContent = response.Text; // No placeholder here: it would hide the empty case from the check below

if (string.IsNullOrWhiteSpace(planContent))
{
    _logger.LogWarning("Agent returned empty response for intent: {Intent}", request.Intent);
    return new AgentResponse
    {
        Content = "Agent failed to generate response. Check logs for details.",
        Success = false
    };
}

return new AgentResponse
{
    Content = planContent,
    Success = true
};

✅ Fix

Always check response.Text for null/empty
Log warnings when responses are empty
Return Success = false for empty responses
Add validation in Orchestrator before passing to next phase
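
The last item, validating in the Orchestrator before handing a result to the next phase, could look like the sketch below. `PhaseResult` is a hypothetical stand-in for the AgentResponse message used in this guide, and `PhaseGate` is an illustrative name, not part of any framework.

```csharp
using System;

// Hypothetical stand-in for the AgentResponse message used in this guide.
public sealed record PhaseResult(string? Content, bool Success);

public static class PhaseGate
{
    // Returns the content when it is usable; throws so the Orchestrator
    // stops instead of handing an empty result to the next phase.
    public static string RequireContent(PhaseResult result, string phaseName)
    {
        if (!result.Success)
            throw new InvalidOperationException($"{phaseName} reported failure.");

        if (string.IsNullOrWhiteSpace(result.Content))
            throw new InvalidOperationException($"{phaseName} returned empty content.");

        return result.Content;
    }
}
```

Throwing is one possible design choice. Returning a failed result would also work, as long as the Maker never receives an empty plan.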

4. The Strange Loop: Infinite Reflection

❌ Symptom

Reflector keeps reflecting on its own reflections
Knowledge vault fills with duplicate entries
Agent performance degrades over time
No exit condition from reflection loop

Root Cause: Missing Loop Detection

// ❌ BAD: No loop detection
public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    var reflection = await Reflect(request.Intent);
    var deeperReflection = await Reflect(reflection);
    var evenDeeper = await Reflect(deeperReflection); // Nothing bounds this chain!
}

// ✅ GOOD: With loop detection
// Requires: using System.Collections.Concurrent; using System.Security.Cryptography; using System.Text;
// Static and thread-safe: gRPC services are created per call by default,
// so an instance-level set would be empty on every request.
private static readonly ConcurrentDictionary<string, byte> _inFlightIntents = new();
private const int MaxReflectionDepth = 3;

public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    var intentHash = HashIntent(request.Intent);

    // TryAdd fails when the same intent is already being reflected on
    if (!_inFlightIntents.TryAdd(intentHash, 0))
    {
        _logger.LogWarning("Detected reflection loop for intent: {Intent}", request.Intent);
        return new AgentResponse
        {
            Content = "Reflection loop detected. Stopping to prevent infinite recursion.",
            Success = false
        };
    }

    try
    {
        var reflection = await ReflectWithDepth(request.Intent, MaxReflectionDepth);
        return new AgentResponse { Content = reflection, Success = true };
    }
    finally
    {
        _inFlightIntents.TryRemove(intentHash, out _); // Clean up after processing
    }
}

// SHA-256 of the intent text as a hex string
private static string HashIntent(string intent) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(intent)));

✅ Fix

Track in-flight intents in a shared, thread-safe set keyed by hash
Set maximum reflection depth (e.g., 3 levels)
Log warnings when loops are detected
Clean up tracking after processing completes
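
The example above calls ReflectWithDepth without showing it. One way to bound the recursion is sketched below, assuming a single-step reflect function is passed in as a delegate. `BoundedReflection`, `ReflectWithDepthAsync`, and `reflectOnce` are illustrative names, not the framework's API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedReflection
{
    // reflectOnce is a placeholder for a single Reflect call against the model.
    public static async Task<string> ReflectWithDepthAsync(
        string input,
        int maxDepth,
        Func<string, CancellationToken, Task<string>> reflectOnce,
        CancellationToken cancellationToken = default)
    {
        var current = input;

        // The loop bound is the exit condition: at most maxDepth reflection passes.
        for (var depth = 0; depth < maxDepth; depth++)
        {
            cancellationToken.ThrowIfCancellationRequested();

            var next = await reflectOnce(current, cancellationToken);

            // Fixed point: another pass would see the same text, so stop early.
            if (string.Equals(next, current, StringComparison.Ordinal))
                return next;

            current = next;
        }

        return current;
    }
}
```

An iterative loop keeps the depth limit explicit and avoids growing the call stack. The fixed-point check is an extra early exit and can be dropped if your reflections never repeat verbatim.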

5. The Connection Error: Ollama Unreachable

❌ Symptom

HttpRequestException: Connection refused
All agents fail with same error
Ollama service is running but agents can't connect

Root Cause: Incorrect Connection String

// ❌ BAD: Hardcoded localhost
var ollamaUri = "http://localhost:11434";

// ✅ GOOD: From Aspire connection string
var ollamaUri = builder.Configuration.GetConnectionString("ollama") 
    ?? "http://localhost:11434";

if (!Uri.TryCreate(ollamaUri, UriKind.Absolute, out var uri))
{
    _logger.LogError("Invalid Ollama URI: {Uri}", ollamaUri);
    uri = new Uri("http://localhost:11434"); // Fallback
}

// Verify connection on startup
builder.Services.AddHostedService<OllamaHealthCheckService>();

Root Cause: Missing Service Discovery

In Aspire, services discover each other via connection strings. Ensure Ollama is registered:

// In AppHost.cs
var ollama = builder.AddOllama("ollama", port: 11434)
    .WithDataVolume()
    .WithLifetime(ContainerLifetime.Persistent);

// Services reference Ollama
var planner = builder.AddProject<Projects.ProjectName_PlannerService>("planner-agent")
    .WithReference(ollama) // ✅ This creates connection string
    .WaitFor(ollama);

✅ Fix

Use Aspire connection strings, not hardcoded URLs
Verify URI parsing with Uri.TryCreate
Add health check service to verify Ollama on startup
Use circuit breaker to fail fast when Ollama is down

6. The JSON Parsing Error: Structured Output Fails

❌ Symptom

JsonException: The JSON value could not be converted
Agent returns valid text but JSON parsing fails
Ollama returns JSON but with extra text/markdown

Root Cause: Missing ResponseFormat Configuration

// ❌ BAD: No response format
var chatOptions = new ChatOptions();
var response = await _chatClient.GetResponseAsync(history, chatOptions);
var plan = JsonSerializer.Deserialize<PlanResponse>(response.Text); // Fails!

// ✅ GOOD: With JSON response format
var chatOptions = new ChatOptions
{
    ResponseFormat = ChatResponseFormat.Json, // ✅ Requests JSON output
    // AdditionalProperties is an AdditionalPropertiesDictionary; whether a
    // "schema" entry is honored depends on the provider
    AdditionalProperties = new AdditionalPropertiesDictionary
    {
        ["schema"] = JsonSerializer.Serialize(new
        {
            type = "object",
            properties = new
            {
                plan = new { type = "string" },
                steps = new { type = "array" }
            },
            required = new[] { "plan", "steps" }
        })
    }
};

var response = await _chatClient.GetResponseAsync(history, chatOptions);
var plan = JsonSerializer.Deserialize<PlanResponse>(response.Text); // Works!

Root Cause: Extra Markdown Wrapping

Even with ChatResponseFormat.Json, some models wrap JSON in markdown:

// ✅ GOOD: Strip markdown code blocks
private T ParseStructuredOutput<T>(string content)
{
    // Remove markdown code blocks if present
    content = System.Text.RegularExpressions.Regex.Replace(
        content,
        @"```json\s*|\s*```",
        "",
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);

    content = content.Trim();

    try
    {
        return JsonSerializer.Deserialize<T>(content) 
            ?? throw new InvalidOperationException("Failed to deserialize JSON");
    }
    catch (JsonException ex)
    {
        _logger.LogError(ex, "Failed to parse JSON: {Content}", content);
        throw;
    }
}

✅ Fix

Always set ResponseFormat = ChatResponseFormat.Json
Provide JSON schema in AdditionalProperties
Strip markdown code blocks before parsing
Wrap parsing in try-catch with detailed error logs
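
Fence stripping does not help when the model adds prose before or after the JSON. A more tolerant approach, sketched here under the assumption that the payload is a single JSON object, is to take the span from the first opening brace to the last closing brace. `JsonExtraction` and its methods are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class JsonExtraction
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true
    };

    // Returns the substring from the first '{' to the last '}' in the text.
    public static string ExtractJsonObject(string content)
    {
        var start = content.IndexOf('{');
        var end = content.LastIndexOf('}');

        if (start < 0 || end <= start)
            throw new FormatException("No JSON object found in model output.");

        return content.Substring(start, end - start + 1);
    }

    // Extracts the object span, then deserializes it with case-insensitive names.
    public static T ParseLenient<T>(string content)
    {
        var json = ExtractJsonObject(content);
        return JsonSerializer.Deserialize<T>(json, Options)
            ?? throw new InvalidOperationException("Model output deserialized to null.");
    }
}
```

This handles fenced output and surrounding chatter alike, but it is deliberately simple: it does not cover top-level arrays, and stray braces in the surrounding prose will still produce a JsonException, which the try-catch guidance above should log.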

7. The Memory Leak: Knowledge Vault Grows Unbounded

❌ Symptom

PostgreSQL database grows continuously
Vector search becomes slow
Memory usage increases over time
No cleanup of old cognitive trails

Root Cause: No Retention Policy

// ✅ GOOD: Background service to clean old entries
public class KnowledgeVaultCleanupService(
    IServiceScopeFactory scopeFactory,
    ILogger<KnowledgeVaultCleanupService> logger) : BackgroundService
{
    private readonly TimeSpan _retentionPeriod = TimeSpan.FromDays(90);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                // DbContext is scoped, so resolve a fresh one per run instead
                // of injecting it into this long-lived service
                await using (var scope = scopeFactory.CreateAsyncScope())
                {
                    var db = scope.ServiceProvider.GetRequiredService<KnowledgeDbContext>();
                    var cutoffDate = DateTime.UtcNow - _retentionPeriod;

                    var deleted = await db.KnowledgeEntries
                        .Where(e => e.CreatedAt < cutoffDate)
                        .ExecuteDeleteAsync(stoppingToken);

                    if (deleted > 0)
                    {
                        logger.LogInformation("Cleaned up {Count} old knowledge entries", deleted);
                    }
                }

                // Run cleanup daily
                await Task.Delay(TimeSpan.FromDays(1), stoppingToken);
            }
            // Let shutdown cancellation propagate instead of logging it as an error
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                logger.LogError(ex, "Error during knowledge vault cleanup");
                await Task.Delay(TimeSpan.FromHours(1), stoppingToken);
            }
        }
    }
}

✅ Fix

Implement retention policy (e.g., 90 days)
Create background service to clean old entries
Archive important trails before deletion
Monitor database size and query performance

Debugging Tools & Techniques

1. OpenTelemetry Tracing

Use Aspire dashboard to trace agent calls:

View gRPC call latency
Identify slow agents
Track request flow through PMCR-O phases
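
For phases to show up as distinct spans, each one needs its own Activity. The sketch below wraps a phase in a span using System.Diagnostics. The source name "Pmcro.Agents", the tag names, and `PhaseTracing` are assumptions for illustration; the source name must match whatever you register with the OpenTelemetry tracer provider.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PhaseTracing
{
    // Assumed source name; spans are exported only if the tracer provider
    // is configured to listen to a source with this exact name.
    public static readonly ActivitySource Source = new("Pmcro.Agents");

    public static async Task<T> TraceAsync<T>(string phase, string intent, Func<Task<T>> work)
    {
        // StartActivity returns null when nothing is listening, hence the ?. calls.
        using var activity = Source.StartActivity($"pmcro.{phase}");
        activity?.SetTag("pmcro.phase", phase);
        activity?.SetTag("pmcro.intent", intent);

        try
        {
            return await work();
        }
        catch (Exception ex)
        {
            // Mark the span as failed so slow and broken phases stand out in the dashboard.
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }
}
```

Recording the raw intent as a tag makes traces easy to search, but it also copies user input into telemetry. Consider tagging a hash or an ID instead if intents can contain sensitive text.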

2. Structured Logging

// ✅ GOOD: Structured logging with context
_logger.LogInformation(
    "🧭 I am the Planner. I am analyzing the intent: {Intent}",
    request.Intent);

_logger.LogError(
    ex,
    "❌ Planner failed for intent: {Intent}. Error: {Error}",
    request.Intent,
    ex.Message);

3. Health Checks

Add health checks for each agent service:

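// Async check that reuses the named "ollama" client, so it honors the
// configured base address and never blocks on .Result
public sealed class OllamaHealthCheck(IHttpClientFactory httpClientFactory) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var client = httpClientFactory.CreateClient("ollama");
            using var response = await client.GetAsync("api/tags", cancellationToken);
            return response.IsSuccessStatusCode
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("Ollama unreachable");
        }
        catch (HttpRequestException ex)
        {
            return HealthCheckResult.Unhealthy("Ollama unreachable", ex);
        }
    }
}

builder.Services.AddHealthChecks()
    .AddCheck<OllamaHealthCheck>("ollama");
    // Register a PostgreSQL check the same way
    // ...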

Prevention Checklist

✅ Before Deploying to Production

✅ Configure HttpClient timeouts and resilience handlers
✅ Add UseFunctionInvocation middleware
✅ Pass cancellation tokens to all async operations
✅ Implement loop detection for Reflector
✅ Add null checks for agent responses
✅ Use Aspire connection strings, not hardcoded URLs
✅ Configure ChatResponseFormat.Json for structured output
✅ Implement knowledge vault retention policy
✅ Add health checks for all dependencies
✅ Enable OpenTelemetry tracing

2026 AI-Assisted Debugging Tools

Modern debugging tools in 2026 leverage AI to automatically detect and fix PMCR-O loop issues. Here are the cutting-edge tools available:

1. Ollama-Powered Debug Agent

Use an AI agent to analyze your PMCR-O logs and suggest fixes:

// AI-Assisted Debug Agent
public class AIDebugAgent(IChatClient chatClient)
{
    // Injected through the primary constructor
    private readonly IChatClient _chatClient = chatClient;

    public async Task<DebugRecommendation> AnalyzeLogsAsync(
        string logContent,
        CancellationToken cancellationToken = default)
    {
        var prompt = $@"I AM the PMCR-O Debug Agent.
I ANALYZE logs to identify loop failures.
I SUGGEST specific fixes with code examples.

Analyze these PMCR-O logs and identify issues:

{logContent}

Output JSON with:
- issue_type: [infinite_loop | deadlock | silent_failure | connection_error | json_parsing | memory_leak]
- root_cause: string
- fix_code: string (C# code example)
- prevention: string[]";

        var chatOptions = new ChatOptions
        {
            ResponseFormat = ChatResponseFormat.Json,
            Temperature = 0.1f // Low temperature for precise analysis (the property is a float)
        };

        // GetResponseAsync is the Microsoft.Extensions.AI IChatClient method
        var response = await _chatClient.GetResponseAsync(
            new List<ChatMessage> { new ChatMessage(ChatRole.User, prompt) },
            chatOptions,
            cancellationToken);

        return JsonSerializer.Deserialize<DebugRecommendation>(response.Text)
            ?? throw new InvalidOperationException("Failed to parse debug recommendation");
    }
}

2. Real-Time Loop Detection with OpenTelemetry

Use OpenTelemetry traces to automatically detect infinite loops:

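// Automatic loop detection via OpenTelemetry
public sealed class InfiniteLoopException(string message) : Exception(message);

public class LoopDetectionMiddleware(ILogger<LoopDetectionMiddleware> logger)
{
    private const int LoopThreshold = 10; // Threshold for loop detection

    // Thread-safe: spans from concurrent requests arrive on different threads
    private readonly ConcurrentDictionary<string, int> _traceCounts = new();

    public void OnTraceStart(string traceId, string operationName)
    {
        var key = $"{traceId}:{operationName}";
        var count = _traceCounts.AddOrUpdate(key, 1, (_, current) => current + 1);

        if (count > LoopThreshold)
        {
            logger.LogWarning(
                "Potential infinite loop detected: {Operation} in trace {TraceId}",
                operationName, traceId);

            // Trigger alert or circuit breaker
            throw new InfiniteLoopException(
                $"Operation {operationName} executed {count} times in trace {traceId}");
        }
    }

    // Call when a trace completes so the counter map does not grow without bound
    public void OnTraceEnd(string traceId, string operationName) =>
        _traceCounts.TryRemove($"{traceId}:{operationName}", out _);
}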

3. .NET 11 Preview: Enhanced Diagnostics

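.NET 11 preview is described as bringing improved diagnostics for async operations and gRPC. Check the configuration keys and the GrpcDeadlockException type shown in this section against the official release notes before relying on them: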

Async Stack Traces: Full async call stacks showing where operations hang
gRPC Deadlock Detection: Automatic detection of blocking gRPC calls
Memory Profiling: Built-in memory leak detection for knowledge vaults
AI-Powered Suggestions: IDE integration that suggests fixes based on error patterns

// .NET 11: Enhanced async diagnostics
// Enable in appsettings.json:
{
  "Diagnostics": {
    "AsyncStackTrace": true,
    "GrpcDeadlockDetection": true,
    "MemoryProfiling": {
      "Enabled": true,
      "Threshold": "100MB"
    }
  }
}

// Automatic deadlock detection
// .NET 11 will throw GrpcDeadlockException if blocking detected
try
{
    var response = await plannerClient.ExecuteTaskAsync(request);
}
catch (GrpcDeadlockException ex)
{
    // .NET 11 provides detailed stack trace showing where deadlock occurred
    _logger.LogError(ex, "gRPC deadlock detected: {StackTrace}", ex.AsyncStackTrace);
}

4. Visual Studio 2026: AI Debug Assistant

Visual Studio 2026 includes an AI-powered debug assistant that:

Analyzes breakpoints and suggests where to add more
Identifies common PMCR-O patterns (infinite loops, deadlocks)
Generates fix suggestions with code examples
Learns from your debugging patterns to improve suggestions

🔧 2026 Debugging Workflow

Enable AI Debug Agent: Let AI analyze logs automatically
Use OpenTelemetry: Real-time loop detection in production
Leverage .NET 11 Diagnostics: Built-in async/gRPC debugging
IDE Integration: Visual Studio 2026 AI assistant for interactive debugging

Conclusion

PMCR-O loops fail in predictable patterns. Most issues stem from:

Missing timeouts → Infinite loops
Blocking async calls → Deadlocks
Missing null checks → Silent failures
No loop detection → Infinite reflection
Hardcoded URLs → Connection errors
Missing JSON format → Parsing errors
No retention policy → Memory leaks

Follow the patterns in this guide, and your PMCR-O system will be production-ready.

🔗 Related Resources:

PMCR-O Quickstart - Build the foundation
Building Self-Referential Agents Part 2 - Multi-agent patterns
PMCR-O Codex - Framework architecture
PMCR-O and pgvector - Knowledge vault setup

About Shawn Delaine Bellazan