Security

Prysm AI includes a built-in security layer that scans every LLM request in real time before forwarding it to the provider. The security engine detects prompt injection attacks, identifies and redacts PII, enforces content policies, and produces a composite threat score (0–100) for each request.

Security scanning runs automatically on all proxied requests. No SDK changes are required; your existing integration is already protected.

How It Works

When a request arrives, the security middleware runs three detection engines in parallel:

| Engine | What It Detects | Action |
| --- | --- | --- |
| Injection Detector | Prompt injection attacks (20+ patterns across 7 categories) | Flag or block |
| PII Detector | Emails, phone numbers, SSNs, credit cards, API keys, IPs | Mask, hash, or block |
| Content Policy | Hate speech, violence, sexual content, self-harm, illegal activities | Flag or block |

Results are combined into a composite threat score (0–100), which maps to one of four levels: clean (0–19), low (20–39), medium (40–69), or high (70–100). When blocking is enabled, requests that score high are rejected with a 403 before reaching the LLM.
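The score bands and the blocking rule above can be sketched in a few lines. This is an illustration only: the function names `threat_level` and `response_status` are hypothetical, and the 200 status simply stands in for "forwarded to the provider".

```python
def threat_level(score: int) -> str:
    """Map a composite threat score (0-100) to the level names used in this page."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    if score >= 20:
        return "low"
    return "clean"


def response_status(score: int, blocking_enabled: bool) -> int:
    """Return 403 for high-scoring requests when blocking is on.

    200 is a placeholder meaning the request would be forwarded to the LLM.
    """
    if blocking_enabled and threat_level(score) == "high":
        return 403
    return 200
```

With blocking disabled, a high-scoring request is still forwarded; only the level is recorded.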

Prompt Injection Detection

20+ attack patterns organized into 7 categories:

| Category | Example Patterns | Severity |
| --- | --- | --- |
| Role Manipulation | "ignore previous instructions", "you are now DAN" | High (8–9) |
| Delimiter Injection | "---END SYSTEM---", "[INST]", markdown code fences | Medium (6–7) |
| Context Confusion | "the real instructions are", "admin override" | High (7–8) |
| Encoding Tricks | Base64 encoded instructions, hex-encoded payloads | Medium (6–7) |
| Extraction Attempts | "repeat your system prompt", "show your instructions" | High (7–8) |
| Jailbreak Phrases | "DAN mode", "developer mode", "no restrictions" | Critical (9–10) |
| Multi-language Attacks | Language-switching evasion, mixed-script injection | Medium (5–6) |
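To make the pattern-matching idea concrete, here is a minimal sketch of a regex-based detector covering a few of the categories above. It is not Prysm's actual rule set: the `PATTERNS` list, the regexes, and the single severity chosen within each listed range are assumptions made for illustration.

```python
import re

# Hypothetical subset of rules: (category, compiled pattern, severity).
# Phrases come from the table above; severities are picked from the listed ranges.
PATTERNS = [
    ("role_manipulation", re.compile(r"ignore (all )?previous instructions", re.I), 9),
    ("delimiter_injection", re.compile(r"-{3}\s*END SYSTEM\s*-{3}|\[INST\]", re.I), 6),
    ("context_confusion", re.compile(r"the real instructions are|admin override", re.I), 8),
    ("extraction_attempt", re.compile(r"(repeat|show) your (system prompt|instructions)", re.I), 7),
    ("jailbreak_phrase", re.compile(r"\bDAN mode\b|\bdeveloper mode\b|\bno restrictions\b", re.I), 10),
]


def scan_for_injection(prompt: str) -> list[tuple[str, int]]:
    """Return (category, severity) for every rule that matches the prompt."""
    return [(cat, sev) for cat, pat, sev in PATTERNS if pat.search(prompt)]


def max_severity(prompt: str) -> int:
    """Highest severity among matched rules, or 0 when nothing matches."""
    return max((sev for _, sev in scan_for_injection(prompt)), default=0)
```

Phrase matching like this will not catch the encoding or multi-language categories, which would need decoding and script analysis before the patterns are applied.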

PII Detection & Redaction

8 types of personally identifiable information detected:

| Data Type | Detection Method |
| --- | --- |
| Email Addresses | RFC 5322 regex |
| Phone Numbers | International format regex |
| Social Security Numbers | US SSN format |
| Credit Card Numbers | Luhn algorithm + format |
| API Keys | Provider prefix patterns |
| IP Addresses | IPv4 and IPv6 regex |
| Private Keys | PEM header detection |
| Dates of Birth | Date format patterns |
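The "Luhn algorithm + format" method for credit cards can be illustrated as follows. The Luhn checksum itself is standard; the candidate regex `CARD_CANDIDATE` and the 13-digit minimum are assumptions for this sketch, not Prysm's exact format rules.

```python
import re

# Assumed format rule: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum, ignoring any non-digit separators."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return substrings that look like card numbers and pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

Pairing the checksum with a format check cuts down on false positives from arbitrary long digit strings such as order numbers.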

Redaction Modes

| Mode | Behavior | Example |
| --- | --- | --- |
| mask | Replace with asterisks | `user@example.com` → `****@*******.***` |
| hash | Replace with SHA-256 hash | `user@example.com` → `[SHA256:a1b2c3...]` |
| block | Reject the entire request | Returns 403 with PII detected message |
| log | Log but don't modify | Request passes through, PII flagged in trace |
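The four modes can be sketched for a single PII type (email addresses). Several details here are assumptions rather than documented behavior: the simplified `EMAIL` regex (not a full RFC 5322 pattern), the `PIIBlocked` exception standing in for the 403 response, and the choice to emit the full hex digest where the table's example is abbreviated.

```python
import hashlib
import re

# Simplified email pattern for illustration; not a full RFC 5322 regex.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PIIBlocked(Exception):
    """Hypothetical signal that the proxy should answer with a 403."""


def mask(value: str) -> str:
    """Replace letters and digits with asterisks, keeping '@' and '.' in place."""
    return re.sub(r"[A-Za-z0-9]", "*", value)


def sha256_tag(value: str) -> str:
    """Replace a value with a tagged SHA-256 hex digest (full digest assumed)."""
    return "[SHA256:" + hashlib.sha256(value.encode("utf-8")).hexdigest() + "]"


def redact_emails(text: str, mode: str) -> str:
    if mode not in {"mask", "hash", "block", "log"}:
        raise ValueError(f"unknown redaction mode: {mode}")
    if mode == "log" or not EMAIL.search(text):
        return text  # log mode leaves the request unmodified
    if mode == "block":
        raise PIIBlocked("PII detected")
    replace = mask if mode == "mask" else sha256_tag
    return EMAIL.sub(lambda m: replace(m.group()), text)
```

Hashing keeps values comparable across requests (the same email always yields the same digest) without exposing the original, whereas masking discards that link entirely.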

Content Policies

5 built-in policy categories (hate speech, violence, sexual content, self-harm, and illegal activities) plus custom keyword lists. Each policy can be set to flag (log only) or block (reject request).
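As a rough illustration of how a custom keyword list with a flag or block action might be evaluated, consider the sketch below. The `KeywordPolicy` class, the `evaluate` function, and the "allow" result are hypothetical names, not part of Prysm's configuration format.

```python
import re
from dataclasses import dataclass


@dataclass
class KeywordPolicy:
    """Hypothetical custom policy: a named keyword list and an action."""
    name: str
    keywords: list[str]
    action: str  # "flag" (log only) or "block" (reject request)

    def matches(self, text: str) -> bool:
        # Case-insensitive whole-phrase match for each keyword.
        return any(
            re.search(rf"\b{re.escape(k)}\b", text, re.IGNORECASE)
            for k in self.keywords
        )


def evaluate(text: str, policies: list[KeywordPolicy]) -> tuple[str, list[str]]:
    """Return the strictest decision and the names of all violated policies."""
    violated = [p for p in policies if p.matches(text)]
    if any(p.action == "block" for p in violated):
        decision = "block"
    elif violated:
        decision = "flag"
    else:
        decision = "allow"
    return decision, [p.name for p in violated]
```

In this sketch a single blocking policy outranks any number of flag-only policies, so the strictest matching action decides the outcome.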

Threat Scoring

The composite threat score combines all detection results with configurable weights. View scores in the Security Dashboard or via the X-Prysm-Scan-Result response header.
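The exact scoring formula is not spelled out here, so the following is only one plausible reading of "configurable weights": a weighted average of per-engine scores. The per-engine 0–100 sub-scores, the engine keys, and the `DEFAULT_WEIGHTS` values are all assumptions made for this sketch.

```python
# Hypothetical weights; the real defaults are not documented in this section.
DEFAULT_WEIGHTS = {"injection": 5, "pii": 3, "content_policy": 2}


def composite_score(engine_scores: dict[str, float],
                    weights: dict[str, float] = DEFAULT_WEIGHTS) -> int:
    """Weighted average of per-engine scores (each assumed 0-100), clamped to 0-100."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Engines missing from engine_scores contribute a score of 0.
    weighted = sum(w * engine_scores.get(name, 0) for name, w in weights.items())
    return max(0, min(100, round(weighted / total_weight)))
```

For example, sub-scores of 90 (injection), 40 (PII), and 0 (content policy) give (5·90 + 3·40 + 2·0) / 10 = 57 under these weights, which falls in the medium band.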