QueryMT Agent - Middleware System

Middleware in QueryMT Agent provides a pluggable way to extend and modify agent behavior. The middleware system processes requests, responses, tool calls, and tool results through a configurable chain of handlers.

Architecture

The middleware system uses a driver-based architecture where each middleware implements the MiddlewareDriver trait. Multiple drivers are chained together in a CompositeDriver that processes each agent interaction.

flowchart LR
    A([User Request])
    subgraph CD["CompositeDriver (Middleware Chain)"]
        M1["Limits<br/>Middleware"] --> M2["Context<br/>Middleware"] --> M3["AgentMode<br/>Middleware"]
    end
    B([Agent Processing])

    A --> CD --> B
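
The chaining idea can be sketched in a few lines. This is a minimal illustration, not the library's CompositeDriver: `Request`, `Driver`, `Chain`, `Pass`, and `Deny` are hypothetical stand-ins, and the assumption that the chain stops at the first error is ours.

```rust
// Stand-in types for illustration; the real Request and MiddlewareDriver
// in querymt_agent carry more than this.
#[derive(Debug, Default)]
pub struct Request {
    /// Names of the drivers that have seen this request, in order.
    pub seen_by: Vec<String>,
}

pub trait Driver: Send + Sync {
    fn name(&self) -> &str;
    fn process_request(&self, request: &mut Request) -> Result<(), String>;
}

/// Hypothetical composite: runs drivers in insertion order and
/// stops at the first error (an assumption of this sketch).
#[derive(Default)]
pub struct Chain {
    drivers: Vec<Box<dyn Driver>>,
}

impl Chain {
    pub fn with(mut self, driver: impl Driver + 'static) -> Self {
        self.drivers.push(Box::new(driver));
        self
    }

    pub fn process_request(&self, request: &mut Request) -> Result<(), String> {
        for driver in &self.drivers {
            driver
                .process_request(request)
                .map_err(|e| format!("{}: {}", driver.name(), e))?;
        }
        Ok(())
    }
}

/// Driver that records its name on the request and lets it through.
pub struct Pass(pub &'static str);

impl Driver for Pass {
    fn name(&self) -> &str {
        self.0
    }
    fn process_request(&self, request: &mut Request) -> Result<(), String> {
        request.seen_by.push(self.0.to_string());
        Ok(())
    }
}

/// Driver that rejects every request.
pub struct Deny(pub &'static str);

impl Driver for Deny {
    fn name(&self) -> &str {
        self.0
    }
    fn process_request(&self, _request: &mut Request) -> Result<(), String> {
        Err("limit exceeded".to_string())
    }
}
```

With three `Pass` drivers the request is seen by all of them in insertion order; replacing the second with `Deny` means the third never runs.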

Middleware Driver Trait

All middleware must implement the MiddlewareDriver trait:

pub trait MiddlewareDriver: Send + Sync {
    /// Unique name for this middleware
    fn name(&self) -> &str;

    /// Process incoming request (before agent processes)
    fn process_request(&self, request: &mut Request) -> Result<()>;

    /// Process outgoing response (after agent processes)
    fn process_response(&self, response: &mut Response) -> Result<()>;

    /// Process tool call (before tool execution)
    fn process_tool_call(&self, tool_call: &mut ToolCall) -> Result<()>;

    /// Process tool result (after tool execution)
    fn process_tool_result(&self, result: &mut ToolResult) -> Result<()>;
}

Built-in Middleware

LimitsMiddleware

Enforces execution limits on agent operations.

Configuration:

[[middleware]]
type = "limits"
max_steps = 200
max_turns = 50

Features:

- max_steps: Maximum tool calls per session
- max_turns: Maximum conversation turns
- price_limit: Maximum cost (if provider supports)

Behavior:

- Tracks step and turn counts
- Rejects requests when limits exceeded
- Provides clear error messages
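
The counting behavior can be sketched with atomic counters. This is an illustrative sketch only: the `Limits` type, its method names, and the error text are hypothetical, and the real LimitsMiddleware may track state differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical limit tracker; not the library's LimitsMiddleware.
pub struct Limits {
    max_steps: usize,
    max_turns: usize,
    steps: AtomicUsize,
    turns: AtomicUsize,
}

impl Limits {
    pub fn new(max_steps: usize, max_turns: usize) -> Self {
        Self {
            max_steps,
            max_turns,
            steps: AtomicUsize::new(0),
            turns: AtomicUsize::new(0),
        }
    }

    /// Counts one tool call; fails once max_steps calls have been recorded.
    pub fn record_step(&self) -> Result<(), String> {
        Self::bump(&self.steps, self.max_steps, "max_steps")
    }

    /// Counts one conversation turn; fails once max_turns turns have been recorded.
    pub fn record_turn(&self) -> Result<(), String> {
        Self::bump(&self.turns, self.max_turns, "max_turns")
    }

    fn bump(counter: &AtomicUsize, max: usize, label: &str) -> Result<(), String> {
        // fetch_update increments only while the count is below the limit,
        // so the counter never overshoots even with concurrent callers.
        counter
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| {
                if n < max {
                    Some(n + 1)
                } else {
                    None
                }
            })
            .map(|_| ())
            .map_err(|n| format!("{} limit reached ({}/{})", label, n, max))
    }
}
```

Putting the limit name and both numbers in the message is one way to meet the "clear error messages" goal.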

ContextMiddleware

Manages conversation context and token usage.

Configuration:

[[middleware]]
type = "context"
warn_at_percent = 80
compact_at_percent = 90
fallback_max_tokens = 128000

Features:

- warn_at_percent: Trigger warning at this % of context limit
- compact_at_percent: Trigger compaction at this %
- fallback_max_tokens: Fallback context limit

Behavior:

- Monitors token usage
- Triggers compaction when approaching limits
- Emits warnings for monitoring
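
To make the percentages concrete: with the configuration above and a 128000-token limit, a warning would fire at 102400 tokens and compaction at 115200. The sketch below works this out in integer arithmetic. `ContextThresholds` and `ContextAction` are hypothetical names, and treating the thresholds as inclusive is an assumption.

```rust
/// What the middleware would do at a given usage level (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum ContextAction {
    Continue,
    Warn,
    Compact,
}

/// Mirrors the three config keys; max_tokens plays the role of the
/// context limit (or fallback_max_tokens when the limit is unknown).
pub struct ContextThresholds {
    pub warn_at_percent: u64,
    pub compact_at_percent: u64,
    pub max_tokens: u64,
}

impl ContextThresholds {
    pub fn action_for(&self, used_tokens: u64) -> ContextAction {
        // Compare used * 100 against percent * max to avoid floating point.
        let used_scaled = used_tokens.saturating_mul(100);
        if used_scaled >= self.compact_at_percent.saturating_mul(self.max_tokens) {
            ContextAction::Compact
        } else if used_scaled >= self.warn_at_percent.saturating_mul(self.max_tokens) {
            ContextAction::Warn
        } else {
            ContextAction::Continue
        }
    }
}
```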

AgentModeMiddleware

Enforces mode-specific restrictions (build/plan/review).

Configuration:

[[middleware]]
type = "agent_mode"
default = "build"
reminder = """<system-reminder>
You are in plan mode. Read-only access.
</system-reminder>"""
review_reminder = """<system-reminder>
You are in review mode. Provide feedback only.
</system-reminder>"""

Features:

- default: Default mode on session start
- reminder: System message for plan mode
- review_reminder: System message for review mode

Behavior:

- Injects mode-specific reminders
- Restricts tool access based on mode
- Allows runtime mode switching
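
A mode check could look roughly like the following. The `AgentMode` enum, the `MUTATING_TOOLS` list, and both functions are hypothetical; which tools each mode actually blocks is not specified here, so the list is an assumption chosen for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentMode {
    Build,
    Plan,
    Review,
}

// Assumed list of tools that change state; "write_file" is a made-up name.
const MUTATING_TOOLS: &[&str] = &["shell", "write_file"];

/// Build mode allows everything; plan and review are read-only in this sketch.
pub fn tool_allowed(mode: AgentMode, tool: &str) -> bool {
    match mode {
        AgentMode::Build => true,
        AgentMode::Plan | AgentMode::Review => !MUTATING_TOOLS.iter().any(|t| *t == tool),
    }
}

/// Reminder text to inject for the current mode, if any.
pub fn reminder_for(mode: AgentMode) -> Option<&'static str> {
    match mode {
        AgentMode::Build => None,
        AgentMode::Plan => Some("You are in plan mode. Read-only access."),
        AgentMode::Review => Some("You are in review mode. Provide feedback only."),
    }
}
```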

DedupCheckMiddleware

Detects duplicate or similar code patterns.

Configuration:

[[middleware]]
type = "dedup_check"
threshold = 0.85
min_lines = 10

Features:

- threshold: Similarity threshold (0.0 - 1.0)
- min_lines: Minimum lines to consider

Behavior:

- Analyzes code before writing
- Warns about similar existing code
- Helps avoid code duplication
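
How the two settings interact can be shown with a simple stand-in metric. The similarity measure the middleware actually uses is not documented here; this sketch substitutes Jaccard similarity over trimmed, non-empty lines purely for illustration, and all function names are hypothetical.

```rust
use std::collections::HashSet;

/// Set of trimmed, non-empty lines in a snippet.
fn line_set(code: &str) -> HashSet<&str> {
    code.lines().map(str::trim).filter(|l| !l.is_empty()).collect()
}

/// Jaccard similarity: shared lines divided by all distinct lines (0.0 to 1.0).
pub fn similarity(a: &str, b: &str) -> f64 {
    let (set_a, set_b) = (line_set(a), line_set(b));
    let union = set_a.union(&set_b).count();
    if union == 0 {
        return 0.0;
    }
    set_a.intersection(&set_b).count() as f64 / union as f64
}

/// A candidate is flagged only if it is long enough and similar enough.
pub fn is_duplicate(candidate: &str, existing: &str, threshold: f64, min_lines: usize) -> bool {
    let line_count = candidate.lines().filter(|l| !l.trim().is_empty()).count();
    line_count >= min_lines && similarity(candidate, existing) >= threshold
}
```

The min_lines guard keeps short snippets from being flagged even when they match exactly.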

ContextFactory Middleware

Legacy context management middleware.

Configuration:

[[middleware]]
type = "context"
max_tokens = 128000
compact_on_overflow = true

Creating Custom Middleware

Basic Middleware

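use querymt_agent::middleware::{MiddlewareDriver, Result};
use querymt_agent::middleware::state::{ToolCall, ToolResult};
// Request and Response must also be in scope; their module path is not shown
// here, so import them from wherever your version of querymt_agent exports them.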

pub struct LoggingMiddleware {
    name: String,
}

impl LoggingMiddleware {
    pub fn new() -> Self {
        Self {
            name: "logging".to_string(),
        }
    }
}

impl MiddlewareDriver for LoggingMiddleware {
    fn name(&self) -> &str {
        &self.name
    }

    fn process_request(&self, request: &mut Request) -> Result<()> {
        log::info!("Request: {:?}", request);
        Ok(())
    }

    fn process_response(&self, response: &mut Response) -> Result<()> {
        log::info!("Response: {:?}", response);
        Ok(())
    }

    fn process_tool_call(&self, tool_call: &mut ToolCall) -> Result<()> {
        log::info!("Tool call: {}({})", 
            tool_call.function.name, 
            tool_call.function.arguments);
        Ok(())
    }

    fn process_tool_result(&self, result: &mut ToolResult) -> Result<()> {
        log::info!("Tool result: {} -> {} bytes", 
            result.tool_name, 
            result.result.len());
        Ok(())
    }
}

Middleware with State

use querymt_agent::middleware::{MiddlewareDriver, MiddlewareError, Result};
use std::sync::{Arc, Mutex};

pub struct RateLimitMiddleware {
    name: String,
    requests: Arc<Mutex<Vec<u64>>>,
    max_per_second: usize,
}

impl RateLimitMiddleware {
    pub fn new(max_per_second: usize) -> Self {
        Self {
            name: "rate_limit".to_string(),
            requests: Arc::new(Mutex::new(Vec::new())),
            max_per_second,
        }
    }

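    /// Drops timestamps older than one second. Timestamps have whole-second
    /// resolution, so the window is approximate.
    fn cleanup_old_requests(&self) {
        let now = std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .as_secs();

        let mut requests = self.requests.lock().unwrap();
        // saturating_sub avoids an underflow panic if the system clock steps backward
        requests.retain(|&t| now.saturating_sub(t) < 1);
    }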
}

impl MiddlewareDriver for RateLimitMiddleware {
    fn name(&self) -> &str {
        &self.name
    }

    fn process_request(&self, request: &mut Request) -> Result<()> {
        self.cleanup_old_requests();

        let mut requests = self.requests.lock().unwrap();
        if requests.len() >= self.max_per_second {
            return Err(MiddlewareError::rate_limit_exceeded(
                self.name().to_string()
            ));
        }
        requests.push(std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .as_secs());

        Ok(())
    }

    // Other methods...
}
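
The example above stores whole-second timestamps, so its one-second window is coarse. A variant using the monotonic clock gives sub-second precision and does the cleanup and the check under a single lock. This is a sketch, not part of the library: `SlidingWindow` and its methods are hypothetical names, and the `try_acquire_at` entry point exists only so the behavior can be exercised with fixed instants.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window limiter based on the monotonic clock.
pub struct SlidingWindow {
    window: Duration,
    max_requests: usize,
    stamps: Mutex<VecDeque<Instant>>,
}

impl SlidingWindow {
    pub fn new(max_requests: usize, window: Duration) -> Self {
        Self {
            window,
            max_requests,
            stamps: Mutex::new(VecDeque::new()),
        }
    }

    /// Records a request at `now` and returns true if it fits in the window.
    pub fn try_acquire_at(&self, now: Instant) -> bool {
        // Recover the data even if another thread panicked while holding the lock.
        let mut stamps = self.stamps.lock().unwrap_or_else(|e| e.into_inner());

        // Evict entries that have aged out of the window.
        while let Some(oldest) = stamps.front().copied() {
            if now.saturating_duration_since(oldest) >= self.window {
                stamps.pop_front();
            } else {
                break;
            }
        }

        if stamps.len() >= self.max_requests {
            return false;
        }
        stamps.push_back(now);
        true
    }

    /// Convenience wrapper that uses the current time.
    pub fn try_acquire(&self) -> bool {
        self.try_acquire_at(Instant::now())
    }
}
```

Inside `process_request`, a `false` return would map to the rate-limit error shown earlier.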

Middleware with Configuration

use querymt_agent::middleware::{MiddlewareDriver, Result};
use serde::Deserialize;

#[derive(Debug, Clone, Deserialize)]
pub struct MyMiddlewareConfig {
    pub enabled: bool,
    pub option1: String,
    pub option2: Option<usize>,
}

pub struct MyMiddleware {
    config: MyMiddlewareConfig,
}

impl MyMiddleware {
    pub fn new(config: MyMiddlewareConfig) -> Self {
        Self { config }
    }
}

impl MiddlewareDriver for MyMiddleware {
    fn name(&self) -> &str {
        "my_middleware"
    }

    fn process_request(&self, request: &mut Request) -> Result<()> {
        if !self.config.enabled {
            return Ok(());
        }

        // Custom logic here
        Ok(())
    }

    // Other methods...
}

Registering Middleware

Via Config File

[[middleware]]
type = "my_middleware"
enabled = true
option1 = "value1"
option2 = 42

Programmatically

use querymt_agent::prelude::*;
use querymt_agent::middleware::MiddlewareDriver;

let agent = Agent::single()
    .provider("anthropic", "claude-sonnet-4-5-20250929")
    .cwd(".")
    .tools(["read_tool", "shell"])
    .middleware(LoggingMiddleware::new())
    .middleware(RateLimitMiddleware::new(10))
    .build()
    .await?;

Middleware Factory Pattern

For config-based middleware creation, implement the MiddlewareFactory trait:

use querymt_agent::middleware::{MiddlewareFactory, MiddlewareDriver, Result};
use querymt_agent::agent::agent_config::AgentConfig;
use serde_json::Value;
use std::sync::Arc;

pub struct MyMiddlewareFactory;

impl MiddlewareFactory for MyMiddlewareFactory {
    fn type_name(&self) -> &'static str {
        "my_middleware"
    }

    fn create(
        &self,
        config: &Value,
        _agent_config: &AgentConfig,
    ) -> Result<Arc<dyn MiddlewareDriver>> {
        let config: MyMiddlewareConfig = serde_json::from_value(config.clone())
            .map_err(|e| anyhow::anyhow!("Invalid config: {}", e))?;

        Ok(Arc::new(MyMiddleware::new(config)))
    }
}

// Register the factory
use querymt_agent::middleware::MIDDLEWARE_REGISTRY;
MIDDLEWARE_REGISTRY.register(Arc::new(MyMiddlewareFactory));
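
Conceptually, a factory registry maps a config `type` string to the factory that builds that middleware. The sketch below shows the lookup with standard-library types only. `Registry`, `Factory`, `Driver`, and the string-map options are hypothetical simplifications, not the actual MIDDLEWARE_REGISTRY implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for MiddlewareDriver and MiddlewareFactory.
pub trait Driver: Send + Sync {
    fn name(&self) -> &str;
}

pub trait Factory: Send + Sync {
    fn type_name(&self) -> &'static str;
    fn create(&self, options: &HashMap<String, String>) -> Result<Arc<dyn Driver>, String>;
}

/// Hypothetical registry keyed by the config `type` value.
#[derive(Default)]
pub struct Registry {
    factories: HashMap<&'static str, Arc<dyn Factory>>,
}

impl Registry {
    pub fn register(&mut self, factory: Arc<dyn Factory>) {
        self.factories.insert(factory.type_name(), factory);
    }

    /// Looks up the factory for `type_name` and asks it to build a driver.
    pub fn build(
        &self,
        type_name: &str,
        options: &HashMap<String, String>,
    ) -> Result<Arc<dyn Driver>, String> {
        let factory = self
            .factories
            .get(type_name)
            .ok_or_else(|| format!("unknown middleware type: {}", type_name))?;
        factory.create(options)
    }
}

pub struct ConfiguredDriver {
    pub option1: String,
}

impl Driver for ConfiguredDriver {
    fn name(&self) -> &str {
        "my_middleware"
    }
}

pub struct ConfiguredFactory;

impl Factory for ConfiguredFactory {
    fn type_name(&self) -> &'static str {
        "my_middleware"
    }

    fn create(&self, options: &HashMap<String, String>) -> Result<Arc<dyn Driver>, String> {
        // Reject configs that lack the required key instead of guessing a default.
        let option1 = options
            .get("option1")
            .ok_or_else(|| "missing option1".to_string())?
            .clone();
        Ok(Arc::new(ConfiguredDriver { option1 }))
    }
}
```

An unregistered type name and an invalid config both surface as errors, which matches the first two checks listed under Troubleshooting.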

Middleware Execution Order

Requests pass through middleware in the order it was added to the chain; responses pass back through in reverse order:

flowchart TD
    A([Request]) --> M1
    M1["Middleware 1 (first added)"] --> M2
    M2[Middleware 2] --> M3
    M3["Middleware 3 (last added)"] --> AP
    AP([Agent Processing]) --> M3R
    M3R["Middleware 3 (response)"] --> M2R
    M2R["Middleware 2 (response)"] --> M1R
    M1R["Middleware 1 (response)"] --> Z([Done])
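
The ordering in the diagram can be traced with a small stand-in. `Step` and `round_trip` are hypothetical names used only to show the first-to-last request pass and the last-to-first response pass.

```rust
/// Stand-in for one middleware in the chain.
pub struct Step(pub &'static str);

/// Builds the visit order for one request/response round trip:
/// requests go first-to-last, responses last-to-first.
pub fn round_trip(chain: &[Step]) -> Vec<String> {
    let mut trace = Vec::new();
    for step in chain {
        trace.push(format!("{} request", step.0));
    }
    trace.push("agent".to_string());
    for step in chain.iter().rev() {
        trace.push(format!("{} response", step.0));
    }
    trace
}
```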

Common Patterns

Request Validation

pub struct PermissionMiddleware {
    // ...
}

impl MiddlewareDriver for PermissionMiddleware {
    fn process_tool_call(&self, tool_call: &mut ToolCall) -> Result<()> {
        if self.requires_permission(&tool_call.function.name) {
            // Request permission from user
            let granted = self.request_permission()?;
            if !granted {
                return Err(MiddlewareError::permission_denied(
                    tool_call.function.name.clone()
                ));
            }
        }
        Ok(())
    }
}

Response Modification

pub struct SanitizationMiddleware {
    // ...
}

impl MiddlewareDriver for SanitizationMiddleware {
    fn process_response(&self, response: &mut Response) -> Result<()> {
        // Remove sensitive information
        response.content = self.sanitize(&response.content);
        Ok(())
    }
}

Tool Call Interception

pub struct ToolLoggingMiddleware {
    // ...
}

impl MiddlewareDriver for ToolLoggingMiddleware {
    fn process_tool_call(&self, tool_call: &mut ToolCall) -> Result<()> {
        // Log before execution
        self.log_tool_call(tool_call);

        // Modify arguments if needed
        if tool_call.function.name == "shell" {
            // Add safety prefix
            tool_call.function.arguments = self.sanitize_shell_args(
                &tool_call.function.arguments
            );
        }

        Ok(())
    }
}

Error Handling

Middleware errors are categorized:

pub enum MiddlewareError {
    InternalError(String),
    ValidationError(String),
    PermissionDenied(String),
    RateLimitExceeded(String),
    ConfigError(String),
}

A middleware can respond to a problem in one of three ways:

- Block execution: Return an error to stop processing
- Modify behavior: Return Ok(()) with modified data
- Log only: Log the problem and continue
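
The three outcomes can be sketched as three small hooks. The enum below copies two of the variants listed above, and `ToolCall` is a simplified stand-in; the function names and the specific checks are hypothetical.

```rust
// Two of the variants listed above; ToolCall is a simplified stand-in.
#[derive(Debug, PartialEq)]
pub enum MiddlewareError {
    ValidationError(String),
    PermissionDenied(String),
}

#[derive(Debug)]
pub struct ToolCall {
    pub name: String,
    pub arguments: String,
}

/// Block execution: returning Err stops processing.
pub fn deny_shell(call: &mut ToolCall) -> Result<(), MiddlewareError> {
    if call.name == "shell" {
        return Err(MiddlewareError::PermissionDenied(call.name.clone()));
    }
    Ok(())
}

/// Modify behavior: fix the data in place and return Ok.
pub fn trim_arguments(call: &mut ToolCall) -> Result<(), MiddlewareError> {
    call.arguments = call.arguments.trim().to_string();
    Ok(())
}

fn validate(call: &ToolCall) -> Result<(), MiddlewareError> {
    if call.arguments.is_empty() {
        return Err(MiddlewareError::ValidationError("empty arguments".to_string()));
    }
    Ok(())
}

/// Log only: report the problem but let processing continue.
pub fn warn_on_empty(call: &mut ToolCall) -> Result<(), MiddlewareError> {
    if let Err(e) = validate(call) {
        eprintln!("middleware warning: {:?}", e);
    }
    Ok(())
}
```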

Best Practices

Keep middleware focused: Each middleware should do one thing well
Be idempotent: Middleware should produce the same result on repeated runs
Handle errors gracefully: Don't crash the agent on middleware errors
Log appropriately: Use logging for debugging, not for every call
Consider performance: Avoid expensive operations in hot paths
Document behavior: Clearly document what each middleware does

Troubleshooting

Middleware Not Running

Check that the middleware type is registered in MIDDLEWARE_REGISTRY
Verify that the config entry has the correct type field
Check for config parsing errors in logs

Middleware Order Issues

Middleware executes in registration order
Use CompositeDriver to control order explicitly
Consider using middleware presets for common patterns

Performance Issues

Profile middleware execution time
Avoid blocking operations in middleware
Use async where possible
Cache expensive computations
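
For the profiling step, one lightweight option is to wrap each hook call in a timer and collect samples per middleware name. `Timings` is a hypothetical helper, not something the library is known to provide.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical collector of per-middleware timing samples.
pub struct Timings {
    samples: Mutex<Vec<(String, Duration)>>,
}

impl Timings {
    pub fn new() -> Self {
        Self {
            samples: Mutex::new(Vec::new()),
        }
    }

    /// Runs `f`, records how long it took under `label`, and returns its result.
    pub fn time<T>(&self, label: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        let elapsed = start.elapsed();
        self.samples
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .push((label.to_string(), elapsed));
        out
    }

    /// Number of samples recorded for `label`.
    pub fn count(&self, label: &str) -> usize {
        self.samples
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .iter()
            .filter(|(l, _)| l.as_str() == label)
            .count()
    }

    /// Total time recorded for `label`.
    pub fn total(&self, label: &str) -> Duration {
        self.samples
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .iter()
            .filter(|(l, _)| l.as_str() == label)
            .map(|(_, d)| *d)
            .sum()
    }
}
```

Comparing `total` across labels shows which middleware dominates the time spent in the chain.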

Examples

See examples/ for middleware usage:

qmtcode.rs - Uses agent_mode, limits, context middleware
Custom middleware examples in tests

Related Documentation

Configuration Guide - Configuring middleware
Agent Modes - Mode-specific behavior
API Reference - Middleware types
