```bash
export KASSANDRA_SALT="$(python3 -c 'import secrets; print(secrets.token_hex(16))')"
```

| Variable | Description | Status |
|---|---|---|
| VIGIA_HMAC_KEY | HMAC key for forensic chain-of-custody (hex ≥32 bytes) | GitHub Secret ✓ |
| KASSANDRA_SALT | Salt for session nonce — Kassandra Protocol | GitHub Secret ✓ |
| ANTHROPIC_API_KEY | Required for Claude Code / LLM narrative mode | local / Claude Code |
| VIGIA_LLM_BACKEND | anthropic or ollama | optional |
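
The table above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of VIGÍA itself: `check_env` is a hypothetical helper, "hex ≥32 bytes" is read here as "decodes to at least 32 bytes", and requiring `ANTHROPIC_API_KEY` only when the backend is `anthropic` is an assumption.

```python
import os

ALLOWED_BACKENDS = {"anthropic", "ollama"}  # values listed in the table above


def check_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []

    # Assumption: "hex >=32 bytes" means the decoded key is at least 32 bytes long.
    try:
        key_bytes = bytes.fromhex(env.get("VIGIA_HMAC_KEY", ""))
    except ValueError:
        problems.append("VIGIA_HMAC_KEY is not valid hex")
    else:
        if len(key_bytes) < 32:
            problems.append("VIGIA_HMAC_KEY must decode to at least 32 bytes")

    if not env.get("KASSANDRA_SALT"):
        problems.append("KASSANDRA_SALT is not set")

    backend = env.get("VIGIA_LLM_BACKEND")  # optional
    if backend is not None and backend not in ALLOWED_BACKENDS:
        problems.append("VIGIA_LLM_BACKEND must be 'anthropic' or 'ollama'")

    # Assumption: the API key is only needed for the anthropic backend.
    if backend == "anthropic" and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is required for the anthropic backend")

    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print("config problem:", problem)
```

An empty list means the configuration passed every check in the sketch.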
data/cases/ contains 19 confirmed REAL cases: VIGIA-REAL-001–010 (10) + SRL-2018 series (9: DMZ-FTP, HUNT-MEMORY, MAIL-MEMORY, RD01/03/04/05/06-MEMORY, WKSTN04-MEMORY). VIGIA-REAL-VANKO and additional cases in data/cases/converted/.
Community corrections from @rjonhaas applied to REAL-001 and REAL-007. REAL-007 (Nitroba) fails fallback — documented in KNOWN_LIMITATIONS §L-008.
Replace 001 with any case number, always 3 digits: 001, 002, ..., 010.

reason_with_llm called · EBS v1 Level 2: cryptographically valid · R6_DEVIL_ADVOCATE: OK · 4 findings CONFIRMED (DKOM masquerade, C2, banking web injection, malfind)

94147b51c639cd0c… · H2 bundle_hash: 125f7f06af5a4f56… · H3 HMAC: 6addf5b7d99a11d9… · H4 EBS verify: PASS Level 2 · VERDICT: MALICE · SCORE: 0.998042

| Case | Scenario | Expected | Fallback | Claude Code |
|---|---|---|---|---|
| VIGIA-REAL-001–006, 008–010 | APT, exfil, credential theft, C2, ransomware pre-stage | MALICE | ✓ 9/9 | ✓ 100% |
| VIGIA-REAL-005 | Suspicious access pattern | SUSPICION | ✓ | ✓ 100% |
| VIGIA-REAL-007 | Nitroba — single artifact type (L-008) | MALICE | ✗ SUSPICION | ✓ 100% |
| VIGIA-REAL-SRL-DMZ-FTP | SRL 2018 — DMZ FTP exfil, self-correction demo | MALICE | ✓ | ✓ 100% |
| VIGIA-REAL-SRL-*-MEMORY (×8) | SRL 2018 — Volatility3 multi-host | MALICE | ✓ all | ✓ 100% |
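
The sample output above lists a `bundle_hash` and an HMAC for each sealed bundle. This document does not specify how those values are built, so the sketch below only illustrates the general technique under stated assumptions: SHA-256 over a canonical JSON encoding, and HMAC-SHA256 over that digest. The function names are hypothetical, and it should not be read as what `forensics/verify_ebs_v1.py` actually does.

```python
import hashlib
import hmac
import json


def bundle_hash(bundle):
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(bundle, key):
    """Return (digest, tag): the bundle hash and an HMAC-SHA256 tag over it."""
    digest = bundle_hash(bundle)
    tag = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return digest, tag


def verify(bundle, key, expected_digest, expected_tag):
    """Recompute both values and compare them in constant time."""
    digest, tag = seal(bundle, key)
    return (hmac.compare_digest(digest, expected_digest)
            and hmac.compare_digest(tag, expected_tag))


# Illustrative data only; a real key would come from VIGIA_HMAC_KEY.
key = bytes.fromhex("ab" * 32)
bundle = {"case_id": "VIGIA-REAL-001", "verdict": "MALICE"}
digest, tag = seal(bundle, key)
print(verify(bundle, key, digest, tag))
print(verify({**bundle, "verdict": "NOISE"}, key, digest, tag))
```

Sorting keys before hashing makes the digest independent of dictionary ordering, so any change to the bundle content, and only such a change, alters the hash and the tag.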
cases/ (case_001–005) + 4 extended in data/cases/ (case_006–009).
Demonstrates: EFFECT_BEFORE_CAUSE hard gate, false flag, provenance collapse, log fabrication, multi-source convergence.
| Case | Pattern | Expected | Fallback | Claude Code |
|---|---|---|---|---|
| case_001_temporal | EFFECT_BEFORE_CAUSE hard gate | MALICE | ✓ 95% | ✓ 100% |
| case_002_log_fabrication | Statistical uniformity — fabricated logs | SUSPICION | ✓ 67% | ✓ 100% |
| case_003_false_flag | FALSE_FLAG_PATTERN fracture | MALICE | ✓ 84% | ✓ 100% |
| case_004_provenance_break | Chain of custody collapse → inadmissible | NOISE | ✓ 97% | ✓ 100% |
| case_005_multi_source | Multi-source convergence (3+ types) | MALICE | ✓ 95% | ✓ 100% |
| case_006–007 | False flag demo, log tampering demo | MALICE | ✓ | ✓ 100% |
| case_008 | Multi-source financial fraud | SUSPICION | ✓ 37% | ✓ 100% |
| case_009 (es) | Insider — off-hours + exfil | MALICE | ✓ 73% | ✓ 100% |
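
The EFFECT_BEFORE_CAUSE gate exercised by case_001_temporal rests on a simple idea: an effect cannot be timestamped earlier than its cause. The sketch below shows that check in isolation. The event names, the timestamps, and the `effect_before_cause` helper are all hypothetical; the actual gate in VIGÍA may use a different data model.

```python
from datetime import datetime


def effect_before_cause(events, causal_pairs):
    """Return the (cause, effect) pairs whose effect precedes its cause.

    events: dict mapping event name -> ISO 8601 timestamp string.
    causal_pairs: iterable of (cause_name, effect_name) tuples.
    All timestamps should be uniformly naive or uniformly timezone-aware.
    """
    violations = []
    for cause, effect in causal_pairs:
        t_cause = datetime.fromisoformat(events[cause])
        t_effect = datetime.fromisoformat(events[effect])
        if t_effect < t_cause:
            violations.append((cause, effect))
    return violations


# Hypothetical timeline: the file copy is logged five minutes before the login
# that supposedly enabled it.
events = {
    "login": "2024-01-01T10:00:00",
    "file_copy": "2024-01-01T09:55:00",
    "logout": "2024-01-01T10:30:00",
}
pairs = [("login", "file_copy"), ("login", "logout")]
print(effect_before_cause(events, pairs))
```

A non-empty result is what a hard gate would act on, independently of any weighted score.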
data/cases/benign/. All have expected_verdict: NOISE.
Designed to verify VIGÍA does not falsely classify legitimate administrative activity as malicious.
16/16 pass rate in fallback and LLM-assisted modes.
Includes FP-CULTURAL-CLEAN — a false positive prevention case (Russian-speaking user, clean machine)
that verifies VIGÍA does not flag users based on language or origin.
| Case | Scenario | Expected | Fallback | Claude Code |
|---|---|---|---|---|
| VIGIA-BEN-001–015 | Legitimate admin activity (authorized pentests, scheduled maintenance, DevOps pipelines…) | NOISE | ✓ 15/15 (100%) | ✓ 100% |
| FP-CULTURAL-CLEAN | Russian-speaking user, clean machine — verifies VIGÍA does not flag by language or origin | NOISE | ✓ | ✓ 100% |
10 legacy v0 (VIGIA_BREAK_001–010) + 6 EBS v1 (VIGIA-BREAK-011–016) in data/cases/.
Tests the boundaries of VIGÍA's reasoning: directional aggregation, false conservatism, prompt injection, over-perfect patterns, biometric imposture.
| Case | Epistemological test | Expected | Fallback | Claude Code |
|---|---|---|---|---|
| VIGIA_BREAK_001–010 | Legacy v0: directional aggregation, prompt poison, overperfect pattern… | UNKN/ABST | conservative by design (L-007) | ✓ 10/10 |
| VIGIA-BREAK-011 | 20 weak artifacts pointing same target — directional aggregation | SUSPICION | ✗ NOISE (L-015) | ✓ 100% |
| VIGIA-BREAK-012 | Authorized pentest — no false positive | NOISE | ✗ SUSPICION (L-016) | ✓ 100% |
| VIGIA-BREAK-013 | Ambiguous infrastructure scan | SUSPICION | ✓ | ✓ 100% |
| VIGIA-BREAK-014 | No false overreach — ceiling at SUSPICION | SUSPICION | ✗ MALICE (L-017) | ✓ 100% |
| VIGIA-BREAK-015 | False Conservatism — biometric impostor (patch P7) | MALICE | ✓ with P7 | ✓ 100% |
| VIGIA-BREAK-016 | Clear MALICE baseline | MALICE | ✓ 95% | ✓ 100% |
data/cases/: FP-001–003 (false positive prevention), FN-001–003 (false negative detection), AMB-001–002 (irreducible ambiguity — ABSTAIN is the correct answer).
| Suite | Description | Expected | Fallback | Claude Code |
|---|---|---|---|---|
| VIGIA-FP-001–002 | Authorized activity that looks suspicious | NOISE | ✓ 2/2 | ✓ 100% |
| VIGIA-FP-003 | Borderline authorized — ABSTAIN is acceptable | NOISE | ✗ ABSTAIN | ✓ 100% |
| VIGIA-FN-001–003 | Clean-surface attacks — no obvious IoC | MALICE | ✗ SUSP/NOISE (L-018) | ✓ 100% |
| VIGIA-AMB-001–002 | Irreducible ambiguity — ABSTAIN is correct | ABSTAIN | ✗ NOISE (L-012) | ✓ 100% |
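
The ✓ and ✗ marks in the Fallback column come down to comparing each case's expected verdict with the verdict actually produced. A minimal sketch of that tally follows, using four fallback rows copied from the tables in this document. The `tally` helper is hypothetical and is not the project's test runner.

```python
def tally(results):
    """Compare expected and actual verdicts.

    results: iterable of (case_id, expected, actual) tuples.
    Returns (passed, total, failures), where failures lists mismatching rows.
    """
    passed = 0
    total = 0
    failures = []
    for case_id, expected, actual in results:
        total += 1
        if expected == actual:
            passed += 1
        else:
            failures.append((case_id, expected, actual))
    return passed, total, failures


# Fallback-mode rows taken from the tables above.
rows = [
    ("VIGIA-REAL-007", "MALICE", "SUSPICION"),
    ("VIGIA-BREAK-013", "SUSPICION", "SUSPICION"),
    ("VIGIA-BREAK-016", "MALICE", "MALICE"),
    ("VIGIA-FP-003", "NOISE", "ABSTAIN"),
]
passed, total, failures = tally(rows)
print(f"{passed}/{total} passed")
for case_id, expected, actual in failures:
    print(f"  {case_id}: expected {expected}, got {actual}")
```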
data/cases/consolidated_canonical/. EBS v1 schema, curated against SIFT-compatible DFIR scenarios. Most reliable benchmark for Daubert-admissible accuracy claims.

_index.json is auto-skipped.

VIGÍA operates in three distinct modes. The primary evaluated mode is the agent without a language model backend.
VIGÍA Agent without LLM (primary mode): The autonomous agent resolves every case without any language model. It produces complete ForensicBundles with chain of custody, Peircean narrative, z-scores, and deterministic Fraction arithmetic. On BREAK adversarial stress-test cases, the agent returns a definitive verdict (SUSPICION or the appropriate level) rather than an abstention. Results are documented in KNOWN_LIMITATIONS.md.
Python scorer only (no agent): The deterministic scoring pipeline runs in isolation, without the agent reasoning layer. Over the canonical corpus of 52 structurally diverse cases — spanning insider threat, memory forensics, log fabrication, false flags, multi-source fraud, and adversarial steganography — the scorer achieves 100% correct verdicts. The full case set is available at data/cases/vigia_cases_canonical_v2.json for independent review. On BREAK cases, the scorer returns UNKNOWN — expected behavior in this mode without the agent reasoning layer.
Agent + LLM (Claude via MCP or Ollama offline): With a language model backend, Claude or Ollama operates exclusively on the narrative layer over already-sealed ForensicBundles. It cannot modify verdicts or scores. This mode provides an additional advantage — enriched Peircean narrative and disambiguation of structurally ambiguous cases — but is not the primary evaluated mode.
These numbers are not inflated. They reflect results on a specific, diverse, documented corpus. All modes are documented in KNOWN_LIMITATIONS.md.
Language coverage: Cases were developed and validated in Spanish and English. Performance in other languages has not been formally validated and cannot be guaranteed at this time.
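
The "deterministic Fraction arithmetic" mentioned for the primary mode can be illustrated with Python's standard `fractions` module. The sketch below is not VIGÍA's scoring formula, which this document does not give. The `weighted_score` helper and the weights are hypothetical, and serve only to show why exact rational arithmetic gives the same result regardless of evaluation order, where floating point may not.

```python
from fractions import Fraction


def weighted_score(pairs):
    """Exact weighted mean of (weight, value) pairs given as Fractions.

    Raises ZeroDivisionError if the weights sum to zero (or pairs is empty).
    """
    pairs = list(pairs)
    total_weight = sum(weight for weight, _ in pairs)
    return sum(weight * value for weight, value in pairs) / total_weight


# Hypothetical weights and indicator values.
pairs = [
    (Fraction(1, 2), Fraction(9, 10)),
    (Fraction(1, 3), Fraction(4, 5)),
    (Fraction(1, 6), Fraction(1)),
]

score = weighted_score(pairs)
same_score = weighted_score(reversed(pairs))
print(score, same_score, score == same_score)

# Floating-point addition is not associative, so summation order can change
# the last digits of a float score.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

With Fractions, two runs over the same evidence produce the same score down to the last digit, which is what makes a sealed score reproducible by an independent reviewer.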
.mcp.json (gitignored). Template: .mcp.json.example

--evidence and --case-id flags. REAL-006 is the recommended demo (high-confidence MALICE, clean chain).

```bash
# Run the agent on each REAL case, then verify the resulting bundle.
for CASE in \
  VIGIA-REAL-001 VIGIA-REAL-002 VIGIA-REAL-003 VIGIA-REAL-004 VIGIA-REAL-005 \
  VIGIA-REAL-006 VIGIA-REAL-007 VIGIA-REAL-008 VIGIA-REAL-009 VIGIA-REAL-010 \
  VIGIA-REAL-NROMANOFF VIGIA-REAL-TDUNGAN VIGIA-REAL-NFURY VIGIA-REAL-ROCBA \
  VIGIA-REAL-SRL-ADMIN VIGIA-REAL-SRL-AV VIGIA-REAL-SRL-DC-MEMORY VIGIA-REAL-SRL-DMZ-FTP; do
  echo "=== $CASE ==="
  python3 vigia_agent.py \
    --evidence "data/cases/converted/${CASE}.json" \
    --case-id "$CASE" \
    --output "results/real/${CASE}_bundle.json"
  python3 forensics/verify_ebs_v1.py "results/real/${CASE}_bundle.json" --verbose
done
```
ForensicBundle + 4-hash verification. Results in results/real/.

Note: Replace ~/vigia-repo on the first line with the path to your local clone, e.g. ~/vigia-intent-analysis.

bash launch_vigia_mcp.sh). No other configuration required.

```bash
python vigia_agent.py --evidence data/cases/VIGIA-REAL-001.json --case-id VIGIA-REAL-001

python tests/run_all_cases.py --cases-dir data/cases/consolidated_canonical &&
python tests/vigia_ci_validate.py
```
R5_ECL_BINDING: WARN is expected, since Level 3 requires external chain anchoring (future feature). Does not affect verdict integrity.

R5_ECL_BINDING: WARN, for the same reason. Two independently committed bundles, two verifiable PASS results.

data/cases/converted/.

evidence_type against CAIE whitelist, acquisition_hash ≥64 hex chars, examiner_id presence (NIST SP 800-86 §4.3).

results/real/VIGIA-REAL-007_bundle_llm.json.

reason_with_llm_called: true, reason_with_llm_result: MALICE at 0.97, self_correction_applied: false (correction_applied=false from validate_and_correct_analysis). Amicus Curiae: results/real/VIGIA-REAL-008_amicus_curiae.md.
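
The three evidence checks listed above (evidence_type against a whitelist, acquisition_hash of at least 64 hex characters, examiner_id present) can be sketched as a small validator. This is an illustration under assumptions, not the project's code. The whitelist entries below are placeholders because the actual CAIE whitelist is not reproduced in this document, and the flat dictionary layout and the `validate_artifact` name are hypothetical.

```python
import re

# Placeholder values: the real CAIE whitelist is not given in this document.
ALLOWED_EVIDENCE_TYPES = {"memory_dump", "disk_image", "network_capture", "log_file"}

# At least 64 hexadecimal characters, e.g. a SHA-256 digest or longer.
HEX_64_PLUS = re.compile(r"[0-9a-fA-F]{64,}")


def validate_artifact(artifact):
    """Return a list of validation errors for one evidence artifact (dict)."""
    errors = []

    if artifact.get("evidence_type") not in ALLOWED_EVIDENCE_TYPES:
        errors.append("evidence_type is not in the whitelist")

    acquisition_hash = artifact.get("acquisition_hash")
    if not isinstance(acquisition_hash, str) or not HEX_64_PLUS.fullmatch(acquisition_hash):
        errors.append("acquisition_hash must be at least 64 hex characters")

    if not str(artifact.get("examiner_id") or "").strip():
        errors.append("examiner_id is missing")

    return errors


# Illustrative artifacts only.
good = {
    "evidence_type": "memory_dump",
    "acquisition_hash": "a" * 64,
    "examiner_id": "EX-01",
}
bad = {"evidence_type": "screenshot", "acquisition_hash": "abc123", "examiner_id": " "}
print(validate_artifact(good))
print(validate_artifact(bad))
```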