Security
Prysm AI includes a built-in security layer that scans every LLM request in real time before forwarding it to the provider. The security engine detects prompt injection attacks, identifies and redacts PII, enforces content policies, and produces a composite threat score (0–100) for each request.
Security scanning runs automatically on all proxied requests. No SDK changes required — your existing integration is already protected.
How It Works
When a request arrives, the security middleware runs three detection engines in parallel:
| Engine | What It Detects | Action |
|---|---|---|
| Injection Detector | Prompt injection attacks (20+ patterns across 7 categories) | Flag or block |
| PII Detector | Emails, phone numbers, SSNs, credit cards, API keys, IPs | Mask, hash, or block |
| Content Policy | Hate speech, violence, sexual content, self-harm, illegal activities | Flag or block |
Results are combined into a composite threat score (0–100): clean (0–19), low (20–39), medium (40–69), or high (70–100). When blocking is enabled, high requests are rejected with a 403 before reaching the LLM.
Prompt Injection Detection
20+ attack patterns organized into 7 categories:
| Category | Example Patterns | Severity |
|---|---|---|
| Role Manipulation | "ignore previous instructions", "you are now DAN" | High (8–9) |
| Delimiter Injection | "---END SYSTEM---", "[INST]", markdown code fences | Medium (6–7) |
| Context Confusion | "the real instructions are", "admin override" | High (7–8) |
| Encoding Tricks | Base64 encoded instructions, hex-encoded payloads | Medium (6–7) |
| Extraction Attempts | "repeat your system prompt", "show your instructions" | High (7–8) |
| Jailbreak Phrases | "DAN mode", "developer mode", "no restrictions" | Critical (9–10) |
| Multi-language Attacks | Language-switching evasion, mixed-script injection | Medium (5–6) |
PII Detection & Redaction
8 types of personally identifiable information detected:
| Data Type | Detection Method |
|---|---|
| Email Addresses | RFC 5322 regex |
| Phone Numbers | International format regex |
| Social Security Numbers | US SSN format |
| Credit Card Numbers | Luhn algorithm + format |
| API Keys | Provider prefix patterns |
| IP Addresses | IPv4 and IPv6 regex |
| Private Keys | PEM header detection |
| Dates of Birth | Date format patterns |
Redaction Modes
| Mode | Behavior | Example |
|---|---|---|
mask | Replace with asterisks | user@example.com → ****@*******.*** |
hash | Replace with SHA-256 hash | user@example.com → [SHA256:a1b2c3...] |
block | Reject the entire request | Returns 403 with PII detected message |
log | Log but don't modify | Request passes through, PII flagged in trace |
Content Policies
5 built-in policy categories plus custom keyword lists. Each policy can be set to flag (log only) or block (reject request).
Threat Scoring
The composite threat score combines all detection results with configurable weights. View scores in the Security Dashboard or via the X-Prysm-Scan-Result response header.