◈ VIGÍA Commands
SANS FIND EVIL Hackathon 2026 · SIFT Integration Candidate · Deadline: June 15
163 passed / 6 xfailed · 55/55 EBS v1 · 52/52 canonical · 34/34 LLM-assisted
Environment setup
Activate venv (always first)
cd ~/vigia-repo && source .venv/bin/activate
Full setup from clean clone
bash setup_dev.sh
Creates the venv, installs dependencies, sets up fixtures in /tmp/vigia_test_evidence, and generates secrets.
Install dependencies
pip install -r requirements.txt
Initialize NLP pattern database
PYTHONPATH=$(pwd) python3 vigia/tools/init_patterns_db.py
Environment variables
VIGIA_HMAC_KEY and KASSANDRA_SALT are in GitHub Secrets. For full chain-of-custody locally, export them manually before running.
Generate local secrets
export VIGIA_HMAC_KEY="$(python3 -c 'import secrets; print(secrets.token_hex(32))')"
export KASSANDRA_SALT="$(python3 -c 'import secrets; print(secrets.token_hex(16))')"
Variable | Description | Status
VIGIA_HMAC_KEY | HMAC key for forensic chain-of-custody (hex, ≥32 bytes) | GitHub Secret ✓
KASSANDRA_SALT | Salt for session nonce (Kassandra Protocol) | GitHub Secret ✓
ANTHROPIC_API_KEY | Required for Claude Code / LLM narrative mode | local / Claude Code
VIGIA_LLM_BACKEND | anthropic or ollama | optional
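Before a local run it can help to confirm that both secrets are present and well-formed. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, the 32-byte minimum comes from the table above, and the 16-byte minimum for the salt is an assumption that mirrors the token_hex(16) call used to generate it.

```python
import os
import string


def is_valid_hex_secret(value: str, min_bytes: int) -> bool:
    """Return True if value is a hex string encoding at least min_bytes bytes."""
    if not value or len(value) % 2 != 0:
        return False
    if any(ch not in string.hexdigits for ch in value):
        return False
    return len(value) // 2 >= min_bytes


def check_environment(env=os.environ) -> dict:
    """Map each secret name to whether it is present and long enough."""
    # Assumed minimum sizes: 32 bytes for the HMAC key, 16 bytes for the salt.
    requirements = {"VIGIA_HMAC_KEY": 32, "KASSANDRA_SALT": 16}
    return {
        name: is_valid_hex_secret(env.get(name, ""), min_bytes)
        for name, min_bytes in requirements.items()
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'missing or too short'}")
```

Run it in the same shell where the export lines were executed; a variable exported in another terminal will show up as missing.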
Pytest — main test suite (163 passed, 6 xfailed)
Full suite (~4 min on ThinkPad T420 · ~37s on RTX 3090)
python -m pytest tests/ -v --tb=short
Expected: 163 passed, 6 xfailed
With shell script (includes env vars — requires manual path configuration)
python3 -m pytest tests/ -v --tb=short
Run full case suite with agent (136 cases)
python3 run_all_agent.py --timeout 90 2>&1 | tee results/agent_batch_run.log
Runs all 136 cases through vigia_agent.py (fallback mode). 134/136 pass. The 2 failures (VIGIA-AMB-001, VIGIA-AMB-002) are documented L-012 limitations (ABSTAIN vs NOISE semantic boundary in fallback mode).
Single file
python -m pytest tests/e2e/test_integration_end_to_end.py -v
EBS v1 integration tests (55 tests)
Run 55 EBS v1 tests
PYTHONPATH=$(pwd) python3 tests/integration/test_ebs_v1_integration.py
Expected: 55/55 passed
Bit-for-bit determinism check (Daubert P0)
PYTHONPATH=$(pwd) python3 tests/check_determinism.py
Runs same case 3×, compares SHA-256 of outputs. Identical = determinism confirmed.
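The idea behind this check can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of tests/check_determinism.py: the canonical JSON encoding, the function names, and the toy pipeline are all hypothetical stand-ins.

```python
import hashlib
import json


def canonical_sha256(result: dict) -> str:
    """Hash a result dict via a canonical JSON encoding (sorted keys, fixed separators)."""
    encoded = json.dumps(result, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def is_deterministic(run_case, case: dict, repeats: int = 3) -> bool:
    """Run the same case several times and check that every output hash matches."""
    digests = {canonical_sha256(run_case(case)) for _ in range(repeats)}
    return len(digests) == 1


def toy_pipeline(case: dict) -> dict:
    """Hypothetical stand-in for the real pipeline: a pure function of its input."""
    return {"case_id": case["case_id"], "artifact_count": len(case["artifacts"])}


if __name__ == "__main__":
    sample = {"case_id": "DEMO", "artifacts": ["a", "b"]}
    print("deterministic:", is_deterministic(toy_pipeline, sample))
```

Any timestamp, random nonce, or unordered collection that leaks into the hashed output would make the three digests differ, which is the failure this kind of check is meant to surface.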
Show 4-hash sealed bundle
PYTHONPATH=$(pwd) python3 show_4_hashes.py data/cases/VIGIA-REAL-001.json
H1 graph_hash · H2 bundle_hash · H3 HMAC chain · H4 EBS verify — takes a case JSON directly.
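To make the H3 idea concrete, here is a generic sketch of an HMAC chain, in which each tag covers the previous tag plus the current entry, so altering one entry invalidates every later tag. It is not VIGÍA's actual bundle format: the GENESIS seed, the "|" separator, and the function names are assumptions for illustration.

```python
import hashlib
import hmac

GENESIS = b"GENESIS"  # assumed seed for the first link in the chain


def chain_tags(key: bytes, entries: list[bytes]) -> list[str]:
    """Compute one HMAC-SHA256 tag per entry, each bound to the previous tag."""
    prev = GENESIS
    tags = []
    for entry in entries:
        tag = hmac.new(key, prev + b"|" + entry, hashlib.sha256).hexdigest()
        tags.append(tag)
        prev = tag.encode("ascii")
    return tags


def verify_chain(key: bytes, entries: list[bytes], tags: list[str]) -> bool:
    """Recompute the chain and compare tags in constant time."""
    expected = chain_tags(key, entries)
    return len(expected) == len(tags) and all(
        hmac.compare_digest(a, b) for a, b in zip(expected, tags)
    )


if __name__ == "__main__":
    demo_key = b"k" * 32  # demo key only; use VIGIA_HMAC_KEY material in practice
    log = [b"acquire", b"score", b"seal"]
    print(verify_chain(demo_key, log, chain_tags(demo_key, log)))
```

Without the key, an attacker who edits an entry cannot recompute valid tags, which is the property a plain hash chain lacks.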
Adversarial fuzzing
Adversarial suite
python run_adversarial_tests.py
Stress tests
python run_stress_tests.py
REAL cases — actual DFIR benchmarks
data/cases/ contains 19 confirmed REAL cases: VIGIA-REAL-001–010 (10) + SRL-2018 series (9: DMZ-FTP, HUNT-MEMORY, MAIL-MEMORY, RD01/03/04/05/06-MEMORY, WKSTN04-MEMORY). VIGIA-REAL-VANKO and additional cases in data/cases/converted/. Community corrections from @rjonhaas applied to REAL-001 and REAL-007. REAL-007 (Nitroba) fails fallback — documented in KNOWN_LIMITATIONS §L-008.
Run REAL-001–010
python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-REAL-0
Fallback: 9/10 (REAL-007 fails) · Claude Code: 10/10
Run SRL-2018 series
python tests/run_all_cases.py --cases-dir data/cases --filter SRL
9 hosts — Volatility3 memory, DMZ FTP, mail server. All expected MALICE.
Single REAL case
python tests/run_vigia_case.py data/cases/VIGIA-REAL-001.json
REAL-001: Greg Schardt / Mr. Evil — war-driving credential theft. Expected: MALICE 93%
Replace 001 with any case number — always 3 digits: 001, 002, ..., 010
REAL-008 with LLM: Volatility Cridex Banking Trojan (verify sealed bundle)
python3 forensics/verify_ebs_v1.py results/real/VIGIA-REAL-008_bundle.json --verbose
MALICE 93% · posterior=0.998 · reason_with_llm called · EBS v1 Level 2 — Cryptographically valid · R6_DEVIL_ADVOCATE: OK · 4 findings CONFIRMED (DKOM masquerade, C2, banking web injection, malfind)
REAL-008 with LLM: 4-hash forensic integrity
python3 show_4_hashes.py data/cases/converted/VIGIA-REAL-008.json
H1 graph_hash: 94147b51c639cd0c… · H2 bundle_hash: 125f7f06af5a4f56… · H3 HMAC: 6addf5b7d99a11d9… · H4 EBS verify: PASS Level 2 · VERDICT: MALICE · SCORE: 0.998042
Case | Scenario | Expected | Fallback | Claude Code
VIGIA-REAL-001–006, 008–010 | APT, exfil, credential theft, C2, ransomware pre-stage | MALICE | ✓ 9/9 | ✓ 100%
VIGIA-REAL-005 | Suspicious access pattern | SUSPICION | | ✓ 100%
VIGIA-REAL-007 | Nitroba: single artifact type (L-008) | MALICE | ✗ SUSPICION | ✓ 100%
VIGIA-REAL-SRL-DMZ-FTP | SRL 2018: DMZ FTP exfil, self-correction demo | MALICE | | ✓ 100%
VIGIA-REAL-SRL-*-MEMORY (×8) | SRL 2018: Volatility3 multi-host | MALICE | ✓ all | ✓ 100%
DEMO cases — core capability showcase
5 core cases in cases/ (case_001–005) + 4 extended in data/cases/ (case_006–009). Demonstrates: EFFECT_BEFORE_CAUSE hard gate, false flag, provenance collapse, log fabrication, multi-source convergence.
Run all cases (full harness)
cd ~/vigia-repo && python3 run_all_agent.py --timeout 90
Expected: 5/5 PASS on core demo cases
Temporal case — EFFECT_BEFORE_CAUSE hard gate
python tests/run_vigia_case.py data/cases/case_001_temporal.json
Expected: MALICE · hard gate triggered — file written before the process that created it existed
False flag — covert attribution
python tests/run_vigia_case.py data/cases/case_003_false_flag.json
Expected: MALICE 84% — FALSE_FLAG_PATTERN fracture
Provenance break — inadmissible under Daubert
python tests/run_vigia_case.py data/cases/case_004_provenance_break.json
Expected: NOISE 97% — 47/70 VirusTotal detections but no chain of custody → inadmissible
Extended demos (006–009)
python tests/run_all_cases.py --cases-dir data/cases --filter case_00
Case | Pattern | Expected | Fallback | Claude Code
case_001_temporal | EFFECT_BEFORE_CAUSE hard gate | MALICE | ✓ 95% | ✓ 100%
case_002_log_fabrication | Statistical uniformity: fabricated logs | SUSPICION | ✓ 67% | ✓ 100%
case_003_false_flag | FALSE_FLAG_PATTERN fracture | MALICE | ✓ 84% | ✓ 100%
case_004_provenance_break | Chain of custody collapse → inadmissible | NOISE | ✓ 97% | ✓ 100%
case_005_multi_source | Multi-source convergence (3+ types) | MALICE | ✓ 95% | ✓ 100%
case_006–007 | False flag demo, log tampering demo | MALICE | | ✓ 100%
case_008 | Multi-source financial fraud | SUSPICION | ✓ 37% | ✓ 100%
case_009 (es) | Insider: off-hours + exfil | MALICE | ✓ 73% | ✓ 100%
BENIGN cases — false positive prevention
16 cases in data/cases/benign/. All have expected_verdict: NOISE. Designed to verify VIGÍA does not falsely classify legitimate administrative activity as malicious. 16/16 pass rate in fallback and LLM-assisted modes. Includes FP-CULTURAL-CLEAN — a false positive prevention case (Russian-speaking user, clean machine) that verifies VIGÍA does not flag users based on language or origin.
Run all cases — full harness (Domain A · 118 cases)
cd ~/vigia-repo && python3 run_all_agent.py --timeout 90
Expected: 134/136 PASS · 2 FAIL (VIGIA-AMB-001, VIGIA-AMB-002 — Domain B, L-012) · Domain A: 118/118 (100%)
Run all 16 benign cases
python3 run_all_agent.py --dir data/cases/benign --timeout 90
Expected: 16/16 PASS — includes FP-CULTURAL-CLEAN (false positive prevention case, expected NOISE)
BEN-006 — scheduled sudo maintenance
python tests/run_vigia_case.py data/cases/benign/VIGIA-BEN-006.json
12 sudo commands during CTO-approved maintenance window. Expected: NOISE
Case | Scenario | Expected | Fallback | Claude Code
VIGIA-BEN-001–015 | Legitimate admin activity (authorized pentests, scheduled maintenance, DevOps pipelines…) | NOISE | ✓ 15/15 (100%) | ✓ 100%
FP-CULTURAL-CLEAN | Russian-speaking user, clean machine: verifies VIGÍA does not flag by language or origin | NOISE | | ✓ 100%
BREAK cases — epistemological boundary suite (16)
16 cases: 10 legacy schema v0 (VIGIA_BREAK_001–010) + 6 EBS v1 (VIGIA-BREAK-011–016) in data/cases/. Tests the boundaries of VIGÍA's reasoning: directional aggregation, false conservatism, prompt injection, over-perfect patterns, biometric imposture.
VIGIA_BREAK_001–010 use legacy schema v0. In fallback the scorer emits UNKNOWN/ABSTAIN — this is Daubert-compliant conservative behaviour documented in KNOWN_LIMITATIONS §L-007, not a crash. LLM-assisted mode resolves all 10 correctly.
BREAK-001–010 legacy (schema v0)
python tests/run_all_cases.py --cases-dir data/cases/converted --filter VIGIA_BREAK
Fallback: 0/10 — conservative by design (L-007) · Claude Code: 10/10
BREAK-011–016 (EBS v1)
python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-BREAK
Fallback: 2/6 (BREAK-013, BREAK-016 pass) · Claude Code: 6/6 (100%)
BREAK-001–010 legacy runner (schema v0)
bash run_break_tests.sh
Runs the 10 legacy v0 cases — UNKNOWN/ABSTAIN expected in fallback, 10/10 with Claude Code
BREAK-015 — biometric impostor (False Conservatism)
python tests/run_vigia_case.py data/cases/VIGIA-BREAK-015.json
Expected: MALICE — BIOMETRIC_IMPOSTURE + IDENTITY_BIFURCATION + SPATIAL_IDENTITY_COLLAPSE (patch P7)
Case | Epistemological test | Expected | Fallback | Claude Code
VIGIA_BREAK_001–010 | Legacy v0: directional aggregation, prompt poison, overperfect pattern… | UNKNOWN/ABSTAIN | conservative by design (L-007) | ✓ 10/10
VIGIA-BREAK-011 | 20 weak artifacts pointing at the same target: directional aggregation | SUSPICION | ✗ NOISE (L-015) | ✓ 100%
VIGIA-BREAK-012 | Authorized pentest: no false positive | NOISE | ✗ SUSPICION (L-016) | ✓ 100%
VIGIA-BREAK-013 | Ambiguous infrastructure scan | SUSPICION | | ✓ 100%
VIGIA-BREAK-014 | No false overreach: ceiling at SUSPICION | SUSPICION | ✗ MALICE (L-017) | ✓ 100%
VIGIA-BREAK-015 | False Conservatism: biometric impostor (patch P7) | MALICE | ✓ with P7 | ✓ 100%
VIGIA-BREAK-016 | Clear MALICE baseline | MALICE | ✓ 95% | ✓ 100%
Precision boundary cases — FP / FN / AMB (8)
8 cases in data/cases/: FP-001–003 (false positive prevention), FN-001–003 (false negative detection), AMB-001–002 (irreducible ambiguity — ABSTAIN is the correct answer).
FP suite
python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-FP
Fallback: 2/3 (FP-003 emits ABSTAIN) · Claude Code: 3/3
FN suite
python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-FN
Fallback: 0/3 (clean-surface attacks, L-018) · Claude Code: 3/3
AMB suite — irreducible ambiguity
python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-AMB
Fallback: 0/2 (NOISE instead of ABSTAIN, L-012) · Claude Code: 2/2
Suite | Description | Expected | Fallback | Claude Code
VIGIA-FP-001–002 | Authorized activity that looks suspicious | NOISE | ✓ 2/2 | ✓ 100%
VIGIA-FP-003 | Borderline authorized: ABSTAIN is acceptable | NOISE | ✗ ABSTAIN | ✓ 100%
VIGIA-FN-001–003 | Clean-surface attacks: no obvious IoC | MALICE | ✗ SUSP/NOISE (L-018) | ✓ 100%
VIGIA-AMB-001–002 | Irreducible ambiguity: ABSTAIN is correct | ABSTAIN | ✗ NOISE (L-012) | ✓ 100%
FN-001–003 failing in fallback is documented by design (L-018). The scorer evaluates evidence strength, not behavioral compatibility with adversarial patterns. Intent Amplifier Layer is on the roadmap. LLM-assisted mode detects all 3.
Canonical cases — 52-case benchmark
52 cases in data/cases/consolidated_canonical/. EBS v1 schema, curated against SIFT-compatible DFIR scenarios. Most reliable benchmark for Daubert-admissible accuracy claims.
Run all 52
python tests/run_all_cases.py --cases-dir data/cases/consolidated_canonical
Expected: 52/52 (100%) — the _index.json is auto-skipped
CAN-027 — maximum confidence
python tests/run_vigia_case.py data/cases/consolidated_canonical/VIGIA-CAN-027.json
score=0.99 · conf=95% — ceiling performance demo
Accuracy — Methodology and Results

VIGÍA operates in three distinct modes. The primary evaluated mode is the agent without a language model backend.

VIGÍA Agent without LLM (primary mode): The autonomous agent resolves all cases without any language model. It produces complete ForensicBundles with chain of custody, Peircean narrative, z-scores, and deterministic Fraction arithmetic. On BREAK adversarial stress-test cases, the agent produces a definitive verdict (SUSPICION or the appropriate level) rather than an abstention. Results are documented in KNOWN_LIMITATIONS.md.

Python scorer only (no agent): The deterministic scoring pipeline runs in isolation, without the agent reasoning layer. Over the canonical corpus of 52 structurally diverse cases — spanning insider threat, memory forensics, log fabrication, false flags, multi-source fraud, and adversarial steganography — the scorer achieves 100% correct verdicts. The full case set is available at data/cases/vigia_cases_canonical_v2.json for independent review. On BREAK cases, the scorer returns UNKNOWN — expected behavior in this mode without the agent reasoning layer.

Agent + LLM (Claude via MCP or Ollama offline): With a language model backend, Claude or Ollama operates exclusively on the narrative layer over already-sealed ForensicBundles. It cannot modify verdicts or scores. This mode provides an additional advantage — enriched Peircean narrative and disambiguation of structurally ambiguous cases — but is not the primary evaluated mode.

These numbers are not inflated. They reflect results on a specific, diverse, documented corpus. All modes are documented in KNOWN_LIMITATIONS.md.

Language coverage: Cases were developed and validated in Spanish and English. Performance in other languages has not been formally validated and cannot be guaranteed at this time.

Claude Code + MCP — 100% on all cases
Claude Code + MCP server resolves all cases including REAL-007 (Nitroba), all FN series, all BREAK series, and both AMB cases. The LLM participates only in narrative generation after bundle sealing — never in scoring. Daubert admissibility is preserved.
Start MCP server
bash launch_vigia_mcp.sh
Launches in correct venv with correct working directory. Config from .mcp.json (gitignored). Template: .mcp.json.example
Autonomous agent — correct syntax
python vigia_agent.py --evidence data/cases/VIGIA-REAL-006.json --case-id VIGIA-REAL-006
Uses --evidence and --case-id flags. REAL-006 is the recommended demo (high-confidence MALICE, clean chain).
Verify MCP configuration
cat .mcp.json.example
Batch agent — all 18 real cases in one run
cd ~/vigia-repo

for CASE in VIGIA-REAL-001 VIGIA-REAL-002 VIGIA-REAL-003 VIGIA-REAL-004 VIGIA-REAL-005 VIGIA-REAL-006 VIGIA-REAL-007 VIGIA-REAL-008 VIGIA-REAL-009 VIGIA-REAL-010 VIGIA-REAL-NROMANOFF VIGIA-REAL-TDUNGAN VIGIA-REAL-NFURY VIGIA-REAL-ROCBA VIGIA-REAL-SRL-ADMIN VIGIA-REAL-SRL-AV VIGIA-REAL-SRL-DC-MEMORY VIGIA-REAL-SRL-DMZ-FTP; do
  echo "=== $CASE ==="
  python3 vigia_agent.py \
    --evidence "data/cases/converted/${CASE}.json" \
    --case-id "$CASE" \
    --output "results/real/${CASE}_bundle.json"
  python3 forensics/verify_ebs_v1.py "results/real/${CASE}_bundle.json" --verbose
done
Runs all 18 real forensic cases sequentially. Each case produces a sealed ForensicBundle + 4-hash verification. Results in results/real/.
Note: Replace ~/vigia-repo on the first line with the path to your local clone, e.g. ~/vigia-intent-analysis.
Judge prompts — copy-paste into Claude Code
Paste directly into Claude Code with the VIGÍA MCP server running (bash launch_vigia_mcp.sh). No other configuration required.
Prompt 1 — MALICE verdict + sealed ForensicBundle
Analyze the forensic case at data/cases/VIGIA-REAL-006.json using VIGÍA. Run the full pipeline:
1. Load and validate the artifacts
2. Execute CAIE (Cross-Artifact Incongruence Engine): identify all fracture types
3. Compute the intent score with the Peircean abductive chain:
   - Firstness: raw anomalies observed
   - Secondness: what the artifacts confirm against each other
   - Thirdness: the forensic inference (what this pattern means)
4. Seal the ForensicBundle and display all four hashes (SHA-256, MD5, SHA-1, HMAC)
5. Report the verdict, confidence %, MITRE ATT&CK TTPs, and recommended PICERL response phase
Expected output: MALICE verdict ≥90% confidence.
Prompt 2 — Self-correction demo (SRL-DMZ-FTP)
Analyze data/cases/VIGIA-REAL-SRL-DMZ-FTP.json using VIGÍA. This case contains DFIR tooling artifacts (Mnemosyne.sys, F-Response) that an initial pass may score as malicious intent. VIGÍA should self-correct:
1. Run initial artifact scoring and show the first-pass verdict
2. Identify the DFIR tool signatures in the artifact metadata
3. Downgrade those specific artifacts from INTENT to SUSPICION level
4. Re-run the pipeline with corrected trust levels
5. Show the before/after verdict comparison with scores
This demonstrates VIGÍA's ability to distinguish forensic tooling from attack artifacts, a critical capability for SIFT integration.
Prompt 3 — ABSTAIN on irreducible ambiguity (AMB-001)
Analyze data/cases/VIGIA-AMB-001.json using VIGÍA. This is an irreducible ambiguity case. The evidence is structurally consistent with both a legitimate insider access pattern and a low-and-slow exfiltration attempt. VIGÍA should:
1. Map both competing hypotheses (Firstness)
2. Show why neither hypothesis can be excluded by available evidence (Secondness)
3. Apply Eco's Razor: select the most parsimonious interpretation
4. Emit ABSTAIN if evidence cannot discriminate between hypotheses
5. Explain why ABSTAIN is the epistemically honest verdict, not NOISE
Note: in fallback mode VIGÍA emits NOISE (documented limitation L-012). LLM-assisted mode should correctly emit ABSTAIN with documented reasoning chain.
Prompt 4 — False negative detection, clean-surface attack (FN-001)
Analyze data/cases/VIGIA-FN-001.json using VIGÍA. This is a false negative stress test: the attack leaves no obvious IoC but has a consistent behavioral pattern across multiple artifact types. In fallback mode VIGÍA emits SUSPICION. In LLM-assisted mode:
1. Analyze the cross-artifact behavioral pattern that the scorer alone misses
2. Explain why individual artifact scores are low but the convergent pattern is significant
3. Apply Peircean Thirdness: what does the PATTERN mean at the intentionality level?
4. Elevate the verdict to MALICE with documented abductive justification
5. Identify which limitation (L-018 in KNOWN_LIMITATIONS.md) this case tests
Expected output: MALICE verdict with a clear reasoning chain explaining the clean-surface attack.
Prompt 5 — Full benchmark + Daubert report
Run the VIGÍA canonical benchmark and produce a Daubert-admissible accuracy report.
1. Execute: python tests/run_all_cases.py --cases-dir data/cases/consolidated_canonical
2. Show the full results table (case_id, verdict, expected, pass/fail, confidence score)
3. Calculate precision, recall, F1-score, and mean delta (VIGÍA score vs naive baseline)
4. Run: python tests/validate_dataset.py and display ROC AUC, Brier score, ECE, K-fold AUC
5. Summarize the results in terms appropriate for a court-appointed forensic expert
Reference: KNOWN_LIMITATIONS.md documents all known failure modes with limitation IDs for Daubert transparency. Scorer-only mode achieves 100% on the canonical 52-case corpus; the BREAK adversarial suite returns UNKNOWN/ABSTAIN by design, not failure. See the Accuracy section for full methodology.
Prompt 6: Full investigation with strict audit trail (VANKO / NROMANOFF pattern)
Conduct a full VIGÍA forensic investigation on data/cases/converted/VIGIA-REAL-VANKO.json
Follow ALL protocols in CLAUDE.md including:
1. Six SANS phases (Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned)
2. Strict tool_execution_log schema for every MCP tool call:
   - seq, event_id (uuid4), timestamp with microseconds, mode: "claude_code"
   - tool, target, result_summary (max 120 chars)
   - input_hash (SHA-256 of sanitized arguments)
   - prev_hash (GENESIS for seq=1, SHA-256 of previous result_summary for subsequent)
3. ContradictionDetector and Refutation Gate events go in self_correction_events array (NOT in tool_execution_log)
4. Refutation Gate Documentation for any SUSPICION finding that was a candidate for INTENT/MALICE
Save sealed bundle to results/srl2018/VIGIA-REAL-VANKO_bundle.json and Amicus Curiae to results/srl2018/VIGIA-REAL-VANKO_amicus_curiae.md
Prompt 7: Verify sealed bundle (standalone, no MCP required)
Verify the sealed ForensicBundle for the VANKO case:
python3 forensics/verify_ebs_v1.py results/srl2018/VIGIA-REAL-VANKO_bundle.json --verbose
This runs 6 checks (R1–R6) using stdlib only, with no VIGÍA installation required:
- R1: Evidence graph hash (H1)
- R2: Policy spec hash (H2)
- R3: Bundle integrity hash (H3)
- R4: Engine attestation hash (H4)
- R5: ECL binding
- R6: devil_advocate populated for all MALICE/INTENT findings (Daubert requirement)
Expected output: Level 2 — Cryptographically valid. R6_DEVIL_ADVOCATE: OK.
To verify the NROMANOFF case: replace VANKO with NROMANOFF in the path above.
Prompt 8: BREAK suite, fallback vs LLM-assisted comparison (001–016)
Run the full BREAK epistemological boundary suite and demonstrate the difference between fallback and LLM-assisted modes.
Step 1: Fallback baseline (run in terminal first):
python tests/run_all_cases.py --cases-dir data/cases/converted --filter VIGIA_BREAK
python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-BREAK
Step 2: Now use VIGÍA via MCP to analyze each failing case. For each BREAK case from 001 to 016, analyze the case file and explain:
1. Why the fallback scorer emits UNKNOWN or the wrong verdict
2. What higher-order reasoning (Peircean Thirdness) resolves the ambiguity
3. The correct verdict with documented justification
Key cases to highlight:
- BREAK-001 (Silent Inconsistency): 20 weak signals pointing at the same target
- BREAK-006 (Perfect Attack): overperfect statistical uniformity = synthetic logs
- BREAK-010 (Overperfect): exactly 10-second intervals, impossible naturally
- BREAK-011 (Directional Aggregation): directional convergence the scorer misses (L-015)
- BREAK-015 (False Conservatism): biometric impostor, overwhelming evidence ignored
Expected: all 16 cases resolve to correct verdict in LLM-assisted mode (100%).
Ollama — local LLM mode
With hermes3 (recommended)
export VIGIA_LLM_BACKEND="ollama"
python vigia_agent.py --evidence data/cases/VIGIA-REAL-001.json --case-id VIGIA-REAL-001
Available models
ollama list # hermes3:8b · deepseek-r1:8b · gemma3:27b
Daubert validation
Full Daubert validation report
python tests/validate_dataset.py
ROC AUC=0.9684 · Brier=0.0866 · ECE=0.0825 · K-fold=0.9646±0.0183 · Verdict: DEFENDIBLE
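For readers unfamiliar with two of these metrics, the sketch below shows how a Brier score and an expected calibration error (ECE) are commonly computed for binary outcomes. It is not the code in tests/validate_dataset.py: the equal-width 10-bin scheme and the function names are assumptions, and plain floats are used here because this is a reporting metric, not part of the scoring pipeline.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between average predicted probability and observed
    positive rate, over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - positive_rate)
    return ece


if __name__ == "__main__":
    demo_probs = [0.9, 0.8, 0.2, 0.1]   # illustrative values, not VIGÍA results
    demo_labels = [1, 1, 0, 0]
    print(brier_score(demo_probs, demo_labels))
    print(expected_calibration_error(demo_probs, demo_labels))
```

Lower is better for both: a Brier score of 0 means every prediction was certain and correct, and an ECE of 0 means predicted probabilities match observed frequencies within each bin.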
CI validation (10/12 expected)
python tests/vigia_ci_validate.py --verbose
2 checks fail without HMAC_KEY/KASSANDRA_SALT in env — expected in dev
Verify stdlib purity of verify_ebs_v1
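python3 -c "import ast, sys; t = ast.parse(open('forensics/verify_ebs_v1.py').read()); m = {a.name.split('.')[0] for n in ast.walk(t) if isinstance(n, ast.Import) for a in n.names} | {n.module.split('.')[0] if n.level == 0 else '<relative>' for n in ast.walk(t) if isinstance(n, ast.ImportFrom)}; bad = sorted(m - sys.stdlib_module_names); print('Top-level modules:', len(m), '- stdlib only OK' if not bad else f'- NON-STDLIB: {bad}')"
Requires Python 3.10+ (sys.stdlib_module_names). Lists any imported top-level module that is not part of the standard library, including relative imports.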
Git workflow
Current state
git status && git log --oneline -5
Submission tag — before June 15
git tag -a v1.0-SANS-hackathon-2026 -m "SANS FIND EVIL 2026 submission" && git push origin --tags
Recommended by Rob T. Lee for submission transparency.
Check syntax before commit
find vigia/ -name "*.py" | xargs -I{} python3 -m py_compile {} && echo "All OK"
CI — full local run
Tests + canonical cases + CI validate
python -m pytest tests/ -v --tb=short &&
python tests/run_all_cases.py --cases-dir data/cases/consolidated_canonical &&
python tests/vigia_ci_validate.py
Debug
Check VIGÍA imports
python3 -c "import vigia; print(vigia.__file__)"
Detect float contamination (Daubert P0)
grep -rn "round\|float\|\.0[^1-9]" vigia/core/ | grep -v "Decimal\|Fraction\|#"
Float contamination breaks determinism. All scoring uses Fraction/Decimal.
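A short illustration of why floats are excluded: float addition is rounded at every step, so the result can depend on operand order, while Fraction arithmetic is exact. The weighted_score helper is hypothetical and only shows the pattern; it is not VIGÍA's scorer.

```python
from fractions import Fraction

# Float arithmetic: rounding at each step makes the result order-dependent.
float_sum = 0.1 + 0.2
forward = sum([0.1, 0.2, 0.3])
backward = sum([0.3, 0.2, 0.1])

# Fraction arithmetic: exact rationals, so order and platform do not matter.
exact_sum = Fraction(1, 10) + Fraction(2, 10)


def weighted_score(weights, scores):
    """Exact weighted mean of Fraction scores (hypothetical helper)."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


if __name__ == "__main__":
    print("float 0.1 + 0.2 == 0.3:", float_sum == 0.3)
    print("float sum order-independent:", forward == backward)
    print("Fraction 1/10 + 2/10 == 3/10:", exact_sum == Fraction(3, 10))
    print(weighted_score([Fraction(1), Fraction(3)], [Fraction(1, 2), Fraction(1)]))
```

Build Fractions from integers or strings (for example Fraction("0.1")), not from float literals: Fraction(0.1) faithfully captures the float's binary rounding error.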
List valid CAIE evidence types
python3 -c "from vigia.tools.caie import EVIDENCE_PROFILES; print(sorted(EVIDENCE_PROFILES.keys()))"
Find silent exceptions
grep -rn "except Exception" vigia/ | grep -v "#"
Silent exceptions swallowing TypeErrors → false negatives. See KNOWN_LIMITATIONS.
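The grep above lists every broad handler, including ones that log or re-raise. A narrower check is possible with the ast module; the sketch below flags only broad handlers whose body is nothing but pass. The function name is hypothetical and this script is not part of the repository.

```python
import ast


def find_silent_handlers(source: str) -> list[int]:
    """Return line numbers of bare/broad except handlers whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        catches_broadly = node.type is None or (
            isinstance(node.type, ast.Name)
            and node.type.id in {"Exception", "BaseException"}
        )
        body_is_noop = all(isinstance(stmt, ast.Pass) for stmt in node.body)
        if catches_broadly and body_is_noop:
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    import pathlib
    import sys

    # Usage: python find_silent.py vigia/   (the script name is illustrative)
    root = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in sorted(root.rglob("*.py")):
        for lineno in find_silent_handlers(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: silent broad except")
```

Handlers that catch a specific exception type, or that do anything besides pass, are deliberately left out of the report.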
For Judges — Claim → Command
This section exists solely to make evaluation easier. You do not need to learn any commands. Every example below is a ready-to-run copy/paste shortcut that reproduces a specific result, benchmark, case, or validation claim presented elsewhere in this project. VIGÍA does not ask evaluators to trust reported results — every claim is reproducible locally. If you only want to inspect the architecture, published cases, web simulators, or benchmark reports, this section can be ignored entirely.
Claim: Domain A — 118/118 deterministic accuracy (no API key, no LLM)
python3 run_all_agent.py --timeout 90
Runs all 136 cases (Domain A + B + C combined). Expected: 134/136 PASS 2 FAIL — the 2 failures are Domain B epistemic boundary cases (AMB-001/002, L-012). Domain A core metric: 118/118 PASS — 100%.
Claim: 163 unit tests pass, 6 xfailed (documented regressions)
python3 -m pytest tests/ -v
Expected: 163 passed, 6 xfailed. The 6 xfailed are regression-preventing tests for documented limitations — not crashes.
Claim: bit-for-bit determinism — same input → same SHA-256 output
PYTHONPATH=$(pwd) python3 tests/check_determinism.py
Runs same case 3×, compares SHA-256 of outputs. Expected: three identical hashes. Zero floats in scoring pipeline (Fraction arithmetic — Daubert P0).
Claim: EBS v1 bundles are independently verifiable (stdlib only, zero VIGÍA deps) — Cridex banking trojan
python3 forensics/verify_ebs_v1.py results/real/VIGIA-REAL-008_bundle.json --verbose
Expected: PASS — Level 2 — Cryptographically valid · Checks: 8/9 OK. Memory forensics case, Claude Code investigation. R5_ECL_BINDING: WARN is expected — Level 3 requires external chain anchoring (future feature). Does not affect verdict integrity.
Claim: EBS v1 bundles are independently verifiable (stdlib only, zero VIGÍA deps) — SRL-DMZ-FTP
python3 forensics/verify_ebs_v1.py results/srl2018/VIGIA-REAL-SRL-DMZ-FTP_bundle.json --verbose
Expected: PASS — Level 2 — Cryptographically valid · Checks: 8/9 OK. Deterministic pipeline bundle. Same R5_ECL_BINDING: WARN — same reason. Two independently committed bundles, two verifiable PASS results.
Claim: four-hash forensic integrity on any case bundle
PYTHONPATH=$(pwd) python3 show_4_hashes.py data/cases/converted/VIGIA-REAL-008.json
H1 graph_hash · H2 bundle_hash · H3 HMAC audit chain · H4 EBS verify. Expected: all GREEN. Replace case path for any other case.
Claim: any published case can be reproduced end-to-end from the case JSON
python3 vigia_agent.py --evidence data/cases/converted/VIGIA-REAL-001.json --case-id VIGIA-REAL-001
Produces a sealed ForensicBundle with HMAC-signed audit trail. Replace path with any case JSON in data/cases/converted/.
Claim: case JSON schema validation (EBS v1 — Daubert chain-of-custody fields)
python3 validate_case.py data/cases/converted/VIGIA-REAL-001.json
Checks required fields, valid evidence_type against CAIE whitelist, acquisition_hash ≥64 hex chars, examiner_id presence (NIST SP 800-86 §4.3).
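The three field checks named above can be sketched as follows. This is not validate_case.py: the whitelist values are placeholders (the real list is EVIDENCE_PROFILES in vigia.tools.caie), and treating the fields as belonging to a single artifact dict is an assumption about the schema.

```python
import re

# Placeholder whitelist; the real one is EVIDENCE_PROFILES in vigia.tools.caie.
ALLOWED_EVIDENCE_TYPES = {"memory_dump", "network_capture", "system_log"}
HEX_64_PLUS = re.compile(r"[0-9a-fA-F]{64,}")


def validate_artifact(artifact: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if artifact.get("evidence_type") not in ALLOWED_EVIDENCE_TYPES:
        errors.append("evidence_type not in whitelist")
    if not HEX_64_PLUS.fullmatch(str(artifact.get("acquisition_hash", ""))):
        errors.append("acquisition_hash must be at least 64 hex characters")
    if not artifact.get("examiner_id"):
        errors.append("examiner_id is missing")
    return errors


if __name__ == "__main__":
    example = {
        "evidence_type": "memory_dump",
        "acquisition_hash": "ab" * 32,  # 64 hex characters, SHA-256 sized
        "examiner_id": "EX-001",        # illustrative value
    }
    print(validate_artifact(example) or "valid")
```

Returning every problem at once, instead of raising on the first one, lets an examiner fix a case file in a single pass.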
Claim: Domain C adversarial suite — 22/25 handled (3 documented limitations)
python3 run_adversarial_tests.py
Extended 25-case harness designed to break the system. Expected: Total: 25 · Passed: 22 · Failed: 3. Failures documented in KNOWN_LIMITATIONS.md.
Claim: self-correction gate — LLM returned MALICE 0.91, gate corrected to INTENT 0.74 (REAL-007)
python3 vigia_agent.py --evidence data/cases/converted/VIGIA-REAL-007.json --case-id VIGIA-REAL-007
Expected: final_verdict: INTENT · final_confidence: 0.74 · self_correction_applied: true. Sealed in bundle at results/real/VIGIA-REAL-007_bundle_llm.json.
Claim: REAL-008 with LLM: Cridex MALICE 93%, reason_with_llm called, EBS v1 Level 2
python3 forensics/verify_ebs_v1.py results/real/VIGIA-REAL-008_bundle.json --verbose
Expected: PASS — Level 2 — Cryptographically valid · R6_DEVIL_ADVOCATE: OK. Bundle includes reason_with_llm_called: true, reason_with_llm_result: MALICE at 0.97, self_correction_applied: false (correction_applied=false from validate_and_correct_analysis). Amicus Curiae: results/real/VIGIA-REAL-008_amicus_curiae.md.
Claim: Daubert admissibility metrics — ROC AUC, Brier, ECE
python3 tests/validate_dataset.py
Expected: ROC AUC=0.9684 · Brier=0.0866 · ECE=0.0825 · K-fold=0.9646±0.0183 · Verdict: DEFENDIBLE.
Claim: 55 EBS v1 integration tests pass
PYTHONPATH=$(pwd) python3 tests/integration/test_ebs_v1_integration.py
Expected: 55/55 passed. Covers cryptographic seal, hash chain, tamper detection, AbductionTrace.