
Debugging PMCR-O Loops: Common Pitfalls and Fixes

PMCR-O agents operate in strange loops: self-referential cycles that can spiral into infinite recursion, deadlock, or silent failures. This guide covers the most common debugging scenarios you'll encounter in production.

🎯 Debugging Philosophy: PMCR-O loops fail in predictable patterns. Once you recognize the pattern, the fix is usually straightforward. This guide maps symptoms to solutions.

1. The Infinite Loop: Planner Never Completes

❌ Symptom

  • Planner agent runs indefinitely
  • No response from gRPC service
  • OpenTelemetry shows continuous activity
  • Ollama logs show repeated requests

Root Causes

A. Missing Timeout Configuration

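Ollama generations can run for minutes. Without explicit configuration, HttpClient applies its default 100-second timeout and nothing bounds the individual attempts or the overall request, so long generations are cut off and stuck calls are never retried cleanly. Check your HttpClient setup:

C#
// ❌ BAD: Relies on the default HttpClient.Timeout, no resilience policy
builder.Services.AddHttpClient("ollama", client =>
{
    client.BaseAddress = uri;
    // Missing: client.Timeout = ...
});

// ✅ GOOD: Infinite timeout, let resilience handler control it
builder.Services.AddHttpClient("ollama", client =>
{
    client.BaseAddress = uri;
    client.Timeout = Timeout.InfiniteTimeSpan; // Let resilience handler manage
})
.AddStandardResilienceHandler(options =>
{
    options.AttemptTimeout.Timeout = TimeSpan.FromMinutes(3);
    options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(5);
    options.Retry.MaxRetryAttempts = 2;
    // Options validation requires the sampling window to be at least 2x the attempt timeout
    options.CircuitBreaker.SamplingDuration = TimeSpan.FromMinutes(6);
});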

B. Missing UseFunctionInvocation Middleware

If your IChatClient isn't configured with function invocation, tool calls will hang:

C#
// ❌ BAD: Missing UseFunctionInvocation
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var httpClient = sp.GetRequiredService<IHttpClientFactory>().CreateClient("ollama");
    var baseClient = new OllamaApiClient(httpClient, modelId);
    return baseClient; // Missing middleware!
});

// ✅ GOOD: With function invocation
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var httpClient = sp.GetRequiredService<IHttpClientFactory>().CreateClient("ollama");
    var baseClient = new OllamaApiClient(httpClient, modelId);

    return new ChatClientBuilder(baseClient)
        .UseFunctionInvocation() // ✅ Critical middleware
        .Build();
});

✅ Fix

  1. Add timeout configuration to HttpClient
  2. Ensure UseFunctionInvocation middleware is registered
  3. Add circuit breaker to prevent cascading failures
  4. Monitor OpenTelemetry traces for stuck requests

2. The Deadlock: Agents Waiting for Each Other

❌ Symptom

  • Orchestrator calls Planner, then hangs
  • Planner waits for Maker, Maker waits for Checker
  • gRPC calls time out after 5 minutes
  • No error logs, just silence

Root Cause: Synchronous gRPC Calls in Async Context

If you're calling gRPC services synchronously or with blocking waits, you'll deadlock:

C#
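// ❌ BAD: Blocking wait on an async call
var plannerResponse = plannerClient.ExecuteTaskAsync(request).ResponseAsync.Result; // Blocks a thread; can deadlock!

// ✅ GOOD: Fully async (the token is a named argument on generated gRPC clients)
var plannerResponse = await plannerClient.ExecuteTaskAsync(request, cancellationToken: cancellationToken);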

Root Cause: Missing Cancellation Tokens

Without cancellation tokens, long-running operations can't be aborted:

C#
// ❌ BAD: No cancellation token
public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    var response = await _chatClient.GetResponseAsync(history); // Can't cancel!
    return new AgentResponse { Content = response.Text, Success = true };
}

// ✅ GOOD: With cancellation
public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    // Fires when the client cancels or the call's deadline passes
    var cancellationToken = context.CancellationToken;
    var response = await _chatClient.GetResponseAsync(history, cancellationToken: cancellationToken);
    return new AgentResponse { Content = response.Text, Success = true };
}

✅ Fix

  1. Always use await, never .Result or .Wait()
  2. Pass context.CancellationToken to all async operations
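  3. Set a deadline on every client call (the deadline: argument) and honor context.CancellationToken on the server, which fires when context.Deadline passes
  4. Use Task.WhenAll for agent calls that do not depend on each other, instead of awaiting them one by one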
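
The deadline and fan-out advice above can be sketched without any gRPC dependency. This is a minimal illustration, not the framework's implementation: `AgentFanOut` and the `callAgent` delegate are hypothetical names, with the delegate standing in for a generated client method that accepts a cancellation token.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AgentFanOut
{
    // Runs independent agent calls concurrently under one shared deadline.
    // `callAgent` is a hypothetical stand-in for a generated gRPC client method.
    public static async Task<string[]> RunAllAsync(
        string[] agentNames,
        Func<string, CancellationToken, Task<string>> callAgent,
        TimeSpan deadline,
        CancellationToken callerToken = default)
    {
        // Link the caller's token with a timer so either one can abort the work.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(deadline);

        // Start every call first, then await them together (no .Result, no .Wait()).
        var tasks = agentNames.Select(name => callAgent(name, cts.Token)).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

If the deadline passes before every call finishes, awaiting the result throws an OperationCanceledException, provided the delegate actually observes the token it is given. Phases that consume each other's output still have to run in order; this pattern only applies to calls with no data dependency.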

3. The Silent Failure: Agent Returns Empty Response

❌ Symptom

  • Agent returns Success = true but Content = ""
  • No exceptions logged
  • Ollama returns valid response, but agent doesn't process it

Root Cause: Missing Null Checks

C#
// ❌ BAD: No null check
var response = await agent.RunAsync(message, thread);
var planContent = response.Text; // May be null or empty

return new AgentResponse
{
    Content = planContent, // Empty content reported as success
    Success = true
};

// ✅ GOOD: Defensive null handling
var response = await agent.RunAsync(message, thread);
var planContent = response.Text;

// Check before substituting any placeholder text, otherwise the failure is masked
if (string.IsNullOrWhiteSpace(planContent))
{
    _logger.LogWarning("Agent returned empty response for intent: {Intent}", request.Intent);
    return new AgentResponse
    {
        Content = "Agent failed to generate response. Check logs for details.",
        Success = false
    };
}

return new AgentResponse
{
    Content = planContent,
    Success = true
};

✅ Fix

  1. Always check response.Text for null/empty
  2. Log warnings when responses are empty
  3. Return Success = false for empty responses
  4. Add validation in Orchestrator before passing to next phase
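
The Orchestrator-side validation in step 4 could look roughly like the gate below. It is a sketch under assumptions: `PhaseResult` is a hypothetical record that mirrors the `Success` and `Content` fields of `AgentResponse`, and `PhaseGate` is an illustrative name, not part of the framework.

```csharp
using System;

// Hypothetical mirror of the Success/Content pair returned by an agent.
public sealed record PhaseResult(bool Success, string? Content);

public static class PhaseGate
{
    // Returns true only when the previous phase produced usable content.
    // `reason` explains the rejection so the Orchestrator can log it.
    public static bool CanAdvance(PhaseResult result, out string reason)
    {
        if (!result.Success)
        {
            reason = "previous phase reported failure";
            return false;
        }

        if (string.IsNullOrWhiteSpace(result.Content))
        {
            reason = "previous phase returned empty content";
            return false;
        }

        reason = string.Empty;
        return true;
    }
}
```

Checking at the boundary means a Planner that wrongly reports Success = true with empty content is still stopped before the Maker receives nothing to work with.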

4. The Strange Loop: Infinite Reflection

❌ Symptom

  • Reflector keeps reflecting on its own reflections
  • Knowledge vault fills with duplicate entries
  • Agent performance degrades over time
  • No exit condition from reflection loop

Root Cause: Missing Loop Detection

C#
// ❌ BAD: No loop detection
public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    var reflection = await Reflect(request.Intent);
    var deeperReflection = await Reflect(reflection);
    var evenDeeper = await Reflect(deeperReflection); // Nothing bounds this chain
    return new AgentResponse { Content = evenDeeper, Success = true };
}

// ✅ GOOD: With loop detection
// Requires: using System.Collections.Concurrent;
// Static and concurrent: gRPC services are created per call by default,
// so instance state would not survive between requests
private static readonly ConcurrentDictionary<string, byte> _processedIntents = new();
private const int MaxReflectionDepth = 3;

public override async Task<AgentResponse> ExecuteTask(
    AgentRequest request,
    ServerCallContext context)
{
    var intentHash = HashIntent(request.Intent);

    // TryAdd is atomic: it fails when the same intent is already in flight
    if (!_processedIntents.TryAdd(intentHash, 0))
    {
        _logger.LogWarning("Detected reflection loop for intent: {Intent}", request.Intent);
        return new AgentResponse
        {
            Content = "Reflection loop detected. Stopping to prevent infinite recursion.",
            Success = false
        };
    }

    try
    {
        var reflection = await ReflectWithDepth(request.Intent, MaxReflectionDepth);
        return new AgentResponse { Content = reflection, Success = true };
    }
    finally
    {
        _processedIntents.TryRemove(intentHash, out _); // Clean up after processing
    }
}

// Hex-encoded SHA-256 of the intent text, used as the tracking key
private static string HashIntent(string intent) =>
    Convert.ToHexString(
        System.Security.Cryptography.SHA256.HashData(
            System.Text.Encoding.UTF8.GetBytes(intent)));

✅ Fix

  1. Track processed intents with hash set
  2. Set maximum reflection depth (e.g., 3 levels)
  3. Log warnings when loops are detected
  4. Clean up tracking after processing completes
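
The example above calls a `ReflectWithDepth` helper without showing it. One possible shape is sketched here; the `reflectOnce` delegate (standing in for a single model call) and the early exit when the output stops changing are assumptions for illustration, not the framework's actual behavior.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedReflection
{
    // Repeatedly applies `reflectOnce` until the output stops changing
    // or `maxDepth` passes have run, whichever comes first.
    public static async Task<string> ReflectWithDepthAsync(
        string intent,
        int maxDepth,
        Func<string, CancellationToken, Task<string>> reflectOnce,
        CancellationToken cancellationToken = default)
    {
        if (maxDepth < 1) throw new ArgumentOutOfRangeException(nameof(maxDepth));

        var current = intent;
        for (var depth = 1; depth <= maxDepth; depth++)
        {
            cancellationToken.ThrowIfCancellationRequested();

            var next = await reflectOnce(current, cancellationToken);

            // Fixed point: another pass would add nothing new.
            if (string.Equals(next, current, StringComparison.Ordinal)) break;

            current = next;
        }

        return current;
    }
}
```

A loop with an explicit counter makes the exit condition visible in one place, which is easier to audit than nested awaits or unbounded recursion.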

5. The Connection Error: Ollama Unreachable

❌ Symptom

  • HttpRequestException: Connection refused
  • All agents fail with same error
  • Ollama service is running but agents can't connect

Root Cause: Incorrect Connection String

C#
// ❌ BAD: Hardcoded localhost
var ollamaUri = "http://localhost:11434";

// ✅ GOOD: From Aspire connection string
var ollamaUri = builder.Configuration.GetConnectionString("ollama") 
    ?? "http://localhost:11434";

if (!Uri.TryCreate(ollamaUri, UriKind.Absolute, out var uri))
{
    _logger.LogError("Invalid Ollama URI: {Uri}", ollamaUri);
    uri = new Uri("http://localhost:11434"); // Fallback
}

// Verify connection on startup
builder.Services.AddHostedService<OllamaHealthCheckService>();

Root Cause: Missing Service Discovery

In Aspire, services discover each other via connection strings. Ensure Ollama is registered:

C#
// In AppHost.cs
var ollama = builder.AddOllama("ollama", port: 11434)
    .WithDataVolume()
    .WithLifetime(ContainerLifetime.Persistent);

// Services reference Ollama
var planner = builder.AddProject<Projects.ProjectName_PlannerService>("planner-agent")
    .WithReference(ollama) // ✅ This creates connection string
    .WaitFor(ollama);

✅ Fix

  1. Use Aspire connection strings, not hardcoded URLs
  2. Verify URI parsing with Uri.TryCreate
  3. Add health check service to verify Ollama on startup
  4. Use circuit breaker to fail fast when Ollama is down
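
A startup check like the one in step 3 needs a probe that fails fast instead of waiting on a dead endpoint. The sketch below is one way to write it, assuming the supplied HttpClient already has its BaseAddress set to the Ollama endpoint; `OllamaProbe` is a hypothetical helper name, and `api/tags` is the same endpoint the health check section of this guide uses.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class OllamaProbe
{
    // Returns true when GET api/tags answers with a success status within `timeout`.
    // Assumes client.BaseAddress points at the Ollama endpoint.
    public static async Task<bool> IsReachableAsync(
        HttpClient client,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(timeout);

        try
        {
            using var response = await client.GetAsync("api/tags", cts.Token);
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            return false; // Connection refused, DNS failure, and similar
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            return false; // Our own timeout fired, not the caller's token
        }
    }
}
```

A hosted service could call this once at startup and log a clear error when it returns false, which separates "Ollama is down" from "the connection string is wrong" earlier than the first failed agent call would.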

6. The JSON Parsing Error: Structured Output Fails

❌ Symptom

  • JsonException: The JSON value could not be converted
  • Agent returns valid text but JSON parsing fails
  • Ollama returns JSON but with extra text/markdown

Root Cause: Missing ResponseFormat Configuration

C#
// ❌ BAD: No response format
var chatOptions = new ChatOptions();
var response = await _chatClient.GetResponseAsync(history, chatOptions);
var plan = JsonSerializer.Deserialize<PlanResponse>(response.Text); // Fails when the model adds prose

// ✅ GOOD: With JSON response format
var chatOptions = new ChatOptions
{
    ResponseFormat = ChatResponseFormat.Json, // ✅ Requests JSON output
    // Provider-specific hint; the property type is AdditionalPropertiesDictionary
    AdditionalProperties = new AdditionalPropertiesDictionary
    {
        ["schema"] = JsonSerializer.Serialize(new
        {
            type = "object",
            properties = new
            {
                plan = new { type = "string" },
                steps = new { type = "array" }
            },
            required = new[] { "plan", "steps" }
        })
    }
};

var response = await _chatClient.GetResponseAsync(history, chatOptions);
var plan = JsonSerializer.Deserialize<PlanResponse>(response.Text); // Parses when the model returns bare JSON

Root Cause: Extra Markdown Wrapping

Even with ChatResponseFormat.Json, some models wrap JSON in markdown:

C#
// ✅ GOOD: Strip markdown code blocks
private T ParseStructuredOutput<T>(string content)
{
    // Remove markdown code blocks if present
    content = System.Text.RegularExpressions.Regex.Replace(
        content,
        @"```json\s*|\s*```",
        "",
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);

    content = content.Trim();

    try
    {
        return JsonSerializer.Deserialize<T>(content) 
            ?? throw new InvalidOperationException("Failed to deserialize JSON");
    }
    catch (JsonException ex)
    {
        _logger.LogError(ex, "Failed to parse JSON: {Content}", content);
        throw;
    }
}

✅ Fix

  1. Always set ResponseFormat = ChatResponseFormat.Json
  2. Provide JSON schema in AdditionalProperties
  3. Strip markdown code blocks before parsing
  4. Wrap parsing in try-catch with detailed error logs
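
Stripping code fences does not help when the model also adds a sentence before or after the JSON, which is the "extra text" symptom listed above. The sketch below takes the span from the first opening brace to the last closing brace and tries to parse that. It is a heuristic with hypothetical names (`JsonExtraction`, `ExtractObject`, `TryParse`), and it assumes the payload is a single JSON object rather than an array.

```csharp
using System;
using System.Text.Json;

public static class JsonExtraction
{
    // Returns the substring from the first '{' to the last '}' (inclusive),
    // or null when no such span exists. This is a heuristic, not a JSON parser.
    public static string? ExtractObject(string content)
    {
        var start = content.IndexOf('{');
        var end = content.LastIndexOf('}');
        return start >= 0 && end > start
            ? content.Substring(start, end - start + 1)
            : null;
    }

    // Tries to deserialize the extracted span; returns false instead of
    // throwing when the input contains no object or the JSON is malformed.
    public static bool TryParse<T>(string content, out T? value)
    {
        value = default;

        var json = ExtractObject(content);
        if (json is null) return false;

        try
        {
            value = JsonSerializer.Deserialize<T>(json);
            return value is not null;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

Returning false rather than throwing lets the caller decide whether to retry the model call, log the raw content, or fail the phase.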

7. The Memory Leak: Knowledge Vault Grows Unbounded

❌ Symptom

  • PostgreSQL database grows continuously
  • Vector search becomes slow
  • Memory usage increases over time
  • No cleanup of old cognitive trails

Root Cause: No Retention Policy

C#
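// ✅ GOOD: Background service to clean old entries
public class KnowledgeVaultCleanupService : BackgroundService
{
    // A DbContext is scoped, so resolve it from a fresh scope on each run
    // instead of injecting it into this long-lived service
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly ILogger<KnowledgeVaultCleanupService> _logger;
    private readonly TimeSpan _retentionPeriod = TimeSpan.FromDays(90);

    public KnowledgeVaultCleanupService(
        IServiceScopeFactory scopeFactory,
        ILogger<KnowledgeVaultCleanupService> logger)
    {
        _scopeFactory = scopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                var cutoffDate = DateTime.UtcNow - _retentionPeriod;

                int deleted;
                using (var scope = _scopeFactory.CreateScope())
                {
                    var db = scope.ServiceProvider.GetRequiredService<KnowledgeDbContext>();
                    deleted = await db.KnowledgeEntries
                        .Where(e => e.CreatedAt < cutoffDate)
                        .ExecuteDeleteAsync(stoppingToken);
                }

                if (deleted > 0)
                {
                    _logger.LogInformation("Cleaned up {Count} old knowledge entries", deleted);
                }

                // Run cleanup daily
                await Task.Delay(TimeSpan.FromDays(1), stoppingToken);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Shutdown cancellation is not an error; everything else is retried in an hour
                _logger.LogError(ex, "Error during knowledge vault cleanup");
                await Task.Delay(TimeSpan.FromHours(1), stoppingToken);
            }
        }
    }
}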

✅ Fix

  1. Implement retention policy (e.g., 90 days)
  2. Create background service to clean old entries
  3. Archive important trails before deletion
  4. Monitor database size and query performance

Debugging Tools & Techniques

1. OpenTelemetry Tracing

Use Aspire dashboard to trace agent calls:

  • View gRPC call latency
  • Identify slow agents
  • Track request flow through PMCR-O phases

2. Structured Logging

C#
// ✅ GOOD: Structured logging with context
_logger.LogInformation(
    "🧭 I am the Planner. I am analyzing the intent: {Intent}",
    request.Intent);

_logger.LogError(
    ex,
    "❌ Planner failed for intent: {Intent}. Error: {Error}",
    request.Intent,
    ex.Message);

3. Health Checks

Add health checks for each agent service:

C#
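// One shared client with a short timeout; avoid new HttpClient() per check.
// uri is the validated Ollama address from section 5, not a hardcoded URL.
var ollamaHealthClient = new HttpClient { BaseAddress = uri, Timeout = TimeSpan.FromSeconds(5) };

builder.Services.AddHealthChecks()
    .AddAsyncCheck("ollama", async cancellationToken =>
    {
        // Verify Ollama is reachable without blocking a thread on .Result
        try
        {
            using var response = await ollamaHealthClient.GetAsync("api/tags", cancellationToken);
            return response.IsSuccessStatusCode
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("Ollama unreachable");
        }
        catch (HttpRequestException ex)
        {
            return HealthCheckResult.Unhealthy("Ollama unreachable", ex);
        }
    })
    .AddCheck("postgres", () =>
    {
        // Verify PostgreSQL connection
        // ...
    });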

Prevention Checklist

✅ Before Deploying to Production

  • ✅ Configure HttpClient timeouts and resilience handlers
  • ✅ Add UseFunctionInvocation middleware
  • ✅ Pass cancellation tokens to all async operations
  • ✅ Implement loop detection for Reflector
  • ✅ Add null checks for agent responses
  • ✅ Use Aspire connection strings, not hardcoded URLs
  • ✅ Configure ResponseFormat.Json for structured output
  • ✅ Implement knowledge vault retention policy
  • ✅ Add health checks for all dependencies
  • ✅ Enable OpenTelemetry tracing

2026 AI-Assisted Debugging Tools

Modern debugging tools in 2026 leverage AI to automatically detect and fix PMCR-O loop issues. Here are the cutting-edge tools available:

1. Ollama-Powered Debug Agent

Use an AI agent to analyze your PMCR-O logs and suggest fixes:

C#
// AI-Assisted Debug Agent
public class AIDebugAgent
{
    // The prompt asks for snake_case keys (issue_type, root_cause, ...); requires .NET 8+
    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower
    };

    private readonly IChatClient _chatClient;

    public AIDebugAgent(IChatClient chatClient) => _chatClient = chatClient;

    public async Task<DebugRecommendation> AnalyzeLogsAsync(
        string logContent,
        CancellationToken cancellationToken = default)
    {
        var prompt = $@"I AM the PMCR-O Debug Agent.
I ANALYZE logs to identify loop failures.
I SUGGEST specific fixes with code examples.

Analyze these PMCR-O logs and identify issues:

{logContent}

Output JSON with:
- issue_type: [infinite_loop | deadlock | silent_failure | connection_error | json_parsing | memory_leak]
- root_cause: string
- fix_code: string (C# code example)
- prevention: string[]";

        var chatOptions = new ChatOptions
        {
            ResponseFormat = ChatResponseFormat.Json,
            Temperature = 0.1f // Low temperature for precise analysis
        };

        var response = await _chatClient.GetResponseAsync(
            new[] { new ChatMessage(ChatRole.User, prompt) },
            chatOptions,
            cancellationToken);

        return JsonSerializer.Deserialize<DebugRecommendation>(response.Text, JsonOptions)
            ?? throw new InvalidOperationException("Failed to parse debug recommendation");
    }
}

2. Real-Time Loop Detection with OpenTelemetry

Use OpenTelemetry traces to automatically detect infinite loops:

C#
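// Automatic loop detection via OpenTelemetry
// Requires: using System.Collections.Concurrent;
public class LoopDetectionMiddleware
{
    private const int MaxExecutionsPerTrace = 10; // Threshold for loop detection

    private readonly ILogger<LoopDetectionMiddleware> _logger;
    // Concurrent: traces start on many threads at once
    private readonly ConcurrentDictionary<string, int> _traceCounts = new();

    public LoopDetectionMiddleware(ILogger<LoopDetectionMiddleware> logger) => _logger = logger;

    public void OnTraceStart(string traceId, string operationName)
    {
        var key = $"{traceId}:{operationName}";
        var count = _traceCounts.AddOrUpdate(key, 1, (_, current) => current + 1);

        if (count > MaxExecutionsPerTrace)
        {
            _logger.LogWarning(
                "Potential infinite loop detected: {Operation} in trace {TraceId}",
                operationName, traceId);

            // Trigger alert or circuit breaker
            // InfiniteLoopException is not a framework type; define it in your project
            throw new InfiniteLoopException(
                $"Operation {operationName} executed {count} times in trace {traceId}");
        }
    }

    // Call when a trace completes so the dictionary does not grow without bound
    public void OnTraceEnd(string traceId, string operationName) =>
        _traceCounts.TryRemove($"{traceId}:{operationName}", out _);
}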

3. .NET 11 Preview: Enhanced Diagnostics

.NET 11 introduces improved diagnostics for async operations and gRPC:

  • Async Stack Traces: Full async call stacks showing where operations hang
  • gRPC Deadlock Detection: Automatic detection of blocking gRPC calls
  • Memory Profiling: Built-in memory leak detection for knowledge vaults
  • AI-Powered Suggestions: IDE integration that suggests fixes based on error patterns

Enable the enhanced async diagnostics in appsettings.json:

JSON
{
  "Diagnostics": {
    "AsyncStackTrace": true,
    "GrpcDeadlockDetection": true,
    "MemoryProfiling": {
      "Enabled": true,
      "Threshold": "100MB"
    }
  }
}

C#
// Automatic deadlock detection
// .NET 11 will throw GrpcDeadlockException if blocking detected
try
{
    var response = await plannerClient.ExecuteTaskAsync(request);
}
catch (GrpcDeadlockException ex)
{
    // .NET 11 provides detailed stack trace showing where deadlock occurred
    _logger.LogError(ex, "gRPC deadlock detected: {StackTrace}", ex.AsyncStackTrace);
}

4. Visual Studio 2026: AI Debug Assistant

Visual Studio 2026 includes an AI-powered debug assistant that:

  • Analyzes breakpoints and suggests where to add more
  • Identifies common PMCR-O patterns (infinite loops, deadlocks)
  • Generates fix suggestions with code examples
  • Learns from your debugging patterns to improve suggestions

🔧 2026 Debugging Workflow

  1. Enable AI Debug Agent: Let AI analyze logs automatically
  2. Use OpenTelemetry: Real-time loop detection in production
  3. Leverage .NET 11 Diagnostics: Built-in async/gRPC debugging
  4. IDE Integration: Visual Studio 2026 AI assistant for interactive debugging

Conclusion

PMCR-O loops fail in predictable patterns. Most issues stem from:

  1. Missing timeouts → Infinite loops
  2. Blocking async calls → Deadlocks
  3. Missing null checks → Silent failures
  4. No loop detection → Infinite reflection
  5. Hardcoded URLs → Connection errors
  6. Missing JSON format → Parsing errors
  7. No retention policy → Memory leaks

Follow the patterns in this guide, and your PMCR-O system will be production-ready.


Shawn Delaine Bellazan

About Shawn Delaine Bellazan

Resilient Architect & PMCR-O Framework Creator

Shawn is the creator of the PMCR-O framework, a self-referential AI architecture that embodies the strange loop it describes. With 15+ years in enterprise software development, Shawn specializes in building resilient systems at the intersection of philosophy and technology. His work focuses on autonomous AI agents that evolve through vulnerability and expression.