Security

Prysm AI includes a built-in security layer that scans every LLM request in real time before forwarding it to the provider. The security engine detects prompt injection attacks, identifies and redacts PII, enforces content policies, and produces a composite threat score (0–100) for each request.

Security scanning runs automatically on all proxied requests. No SDK changes required — your existing integration is already protected.

How It Works

When a request arrives, the security middleware runs three detection engines in parallel:

Engine	What It Detects	Action
Injection Detector	Prompt injection attacks (20+ patterns across 7 categories)	Flag or block
PII Detector	Emails, phone numbers, SSNs, credit cards, API keys, IPs	Mask, hash, or block
Content Policy	Hate speech, violence, sexual content, self-harm, illegal activities	Flag or block

Results are combined into a composite threat score (0–100): clean (0–19), low (20–39), medium (40–69), or high (70–100). When blocking is enabled, high requests are rejected with a 403 before reaching the LLM.

Prompt Injection Detection

20+ attack patterns organized into 7 categories:

Category	Example Patterns	Severity
Role Manipulation	"ignore previous instructions", "you are now DAN"	High (8–9)
Delimiter Injection	"---END SYSTEM---", "[INST]", markdown code fences	Medium (6–7)
Context Confusion	"the real instructions are", "admin override"	High (7–8)
Encoding Tricks	Base64 encoded instructions, hex-encoded payloads	Medium (6–7)
Extraction Attempts	"repeat your system prompt", "show your instructions"	High (7–8)
Jailbreak Phrases	"DAN mode", "developer mode", "no restrictions"	Critical (9–10)
Multi-language Attacks	Language-switching evasion, mixed-script injection	Medium (5–6)

PII Detection & Redaction

8 types of personally identifiable information detected:

Data Type	Detection Method
Email Addresses	RFC 5322 regex
Phone Numbers	International format regex
Social Security Numbers	US SSN format
Credit Card Numbers	Luhn algorithm + format
API Keys	Provider prefix patterns
IP Addresses	IPv4 and IPv6 regex
Private Keys	PEM header detection
Dates of Birth	Date format patterns

Redaction Modes

Mode	Behavior	Example
`mask`	Replace with asterisks	`user@example.com` → `**@***.*`
`hash`	Replace with SHA-256 hash	`user@example.com` → `[SHA256:a1b2c3...]`
`block`	Reject the entire request	Returns `403` with PII detected message
`log`	Log but don't modify	Request passes through, PII flagged in trace

Content Policies

5 built-in policy categories plus custom keyword lists. Each policy can be set to flag (log only) or block (reject request).

Threat Scoring

The composite threat score combines all detection results with configurable weights. View scores in the Security Dashboard or via the X-Prysm-Scan-Result response header.

← Cost Tracking Explainability →