A forensic-grade red-teaming engine that does not guess. It proves structural alignment fractures through deterministic rational arithmetic.
Using a probabilistic Large Language Model to evaluate another probabilistic model creates an infinite loop of stochastic noise. MUTANTE isolates the evaluation into an immutable Deterministic Core, leaving the probabilistic reading exclusively for pragmatic metadata enrichment.
Inject plaintext prompts and observe how structural degradation vectors bypass surface-level safety filters by transforming the semantic representation of the payload.
This payload is transmitted to the target LLM. The structural obfuscation forces the model to process tokens outside its standard alignment tuning distribution.
Adjust the semiotic detection layers to observe how MUTANTE calculates the Jailbreak Confidence Score (JCS). No floating-point drift. No opaque neural networks.
Values > 0.5 trigger an immediate veto (BLOCKED).
Measures presence of procedural task completion.
Measures adoption of hypothetical or educational personas.