Configuration¶
ScanConfig¶
The ScanConfig object controls what gets tested and how. Use factory methods for common profiles or build a custom config.
Quick Scan (Default)¶
Fast smoke test — selects a small subset of attacks:
config = ScanConfig.quick(
agent_capabilities=capabilities,
vuln_classes=[VulnClass.V1_INDIRECT_INJECTION, VulnClass.V6_SECRET_EXPOSURE],
)
Release Gate¶
Thorough scan suitable for CI/CD pipelines:
Deep Red Team¶
Comprehensive assessment with all attack classes and multiple trials:
Profile Comparison¶
| Setting | Quick | Release Gate | Deep Red Team |
|---|---|---|---|
| Max attacks | 15 | 40 | Unlimited |
| Trials per attack | 1 | 2 | 3 |
| Stealth levels | All | All | All |
| Timeout | 5 min | 15 min | 60 min |
Agent Capabilities¶
Declare what your agent can do so the planner selects relevant attacks:
from agent_redteam.core.models import AgentCapabilities, ToolCapability
from agent_redteam.core.enums import Severity
capabilities = AgentCapabilities(
tools=[
ToolCapability(name="file_read"),
ToolCapability(name="shell"),
ToolCapability(name="http_request"),
ToolCapability(name="send_email"),
],
has_internet_access=True,
has_memory=False,
data_sensitivity=Severity.HIGH,
)
Capability-Based Attack Selection¶
| Capability | Enables Classes |
|---|---|
| Any tools | V3 (Excessive Agency), V5 (Tool Misuse) |
Tool capabilities named mcp, mcp_tool, or mcp_server |
V12 (Supply Chain) |
has_internet_access |
V7 (Data Exfiltration) |
has_memory |
V8 (Memory Poisoning) |
| Always enabled | V1, V2, V4, V6 |
Blast Radius¶
Capabilities also determine the blast radius factor (1.0x--3.0x) which adjusts the final score. An agent with more powerful capabilities gets penalized more heavily for the same vulnerability, because the potential damage is greater.
| Factor | Capabilities |
|---|---|
| 1.0x | Read-only tools, no internet |
| 1.5x | File write or shell access |
| 2.0x | Internet access + shell |
| 2.5x--3.0x | Internet + email + database + shell |
Budget Configuration¶
Control resource consumption:
from agent_redteam.core.models import BudgetConfig
budget = BudgetConfig(
max_attacks=20, # Maximum number of attacks to run
max_api_calls=200, # Maximum LLM API calls
max_cost_usd=5.0, # Maximum estimated cost
max_duration_seconds=600, # Maximum scan duration
trials_per_attack=2, # Repeat each attack N times
)
Trials for confidence
Running multiple trials per attack (2--3) significantly narrows the confidence interval on scores. A single trial gives a wide CI; 3 trials gives a much tighter bound.
Vulnerability Class Filtering¶
Test specific classes only:
config = ScanConfig.quick(
vuln_classes=[
VulnClass.V1_INDIRECT_INJECTION,
VulnClass.V2_DIRECT_INJECTION,
VulnClass.V3_EXCESSIVE_AGENCY,
VulnClass.V4_CONFUSED_DEPUTY,
VulnClass.V5_TOOL_MISUSE,
VulnClass.V6_SECRET_EXPOSURE,
VulnClass.V7_DATA_EXFILTRATION,
VulnClass.V8_MEMORY_POISONING,
VulnClass.V12_SUPPLY_CHAIN,
],
)
Omit the vuln_classes parameter to test all classes relevant to your agent's capabilities.
JudgeConfig (LLM-as-judge)¶
JudgeConfig configures the optional SemanticJudgeDetector. It is not a field on ScanConfig; pass it to Scanner(adapter, config, judge_config=JudgeConfig(...)).
| Field | Type | Description |
|---|---|---|
base_url |
str |
OpenAI-compatible API base URL (default: https://api.openai.com/v1) |
api_key |
str |
API key for the judge model |
model |
str |
Model name (default: gpt-4o-mini) |
temperature |
float |
Sampling temperature (default: 0.0) |
max_tokens |
int |
Max tokens for judge completion (default: 1024) |
evaluation_criteria |
list[str] |
Rubric dimensions (defaults include compliance, output safety, reasoning integrity, scope adherence) |
When judge_config is omitted, only the nine built-in signal detectors run.
Environment Definitions¶
The framework includes pre-built environment definitions:
| Environment | Description | Use Case |
|---|---|---|
swe_agent |
Software engineering agent with shell, git, file tools | Testing coding assistants |
customer_support_agent |
CRM, email, knowledge base tools | Testing support bots |
data_analyst_agent |
SQL, file I/O, HTTP, shell tools | Testing data agents |
Automatic Environment Selection¶
The Scanner automatically selects the best environment profile based on the agent's declared tools via select_environment_profile(agent_capabilities). For example, if your agent declares send_email and lookup_customer tools, the framework selects customer_support_agent; if it declares sql_query or db_query, it selects data_analyst_agent; otherwise it defaults to swe_agent.
At execution time, each attack template's environment_setup is merged into the base profile via EnvironmentBuilder.inject_attack(), producing an isolated per-attack environment with canary secrets, poisoned data, and network rules.