VIGÍA — Command Reference

◈ VIGÍA Commands

SANS FIND EVIL Hackathon 2026 · SIFT Integration Candidate · Deadline: June 15

163 passed / 6 xfailed · 55/55 EBS v1 · 52/52 canonical · 34/34 LLM-assisted

Environment setup

Activate venv (always first)

cd ~/vigia-repo && source .venv/bin/activate

Full setup from clean clone

bash setup_dev.sh

Creates the venv, installs dependencies, creates fixtures in /tmp/vigia_test_evidence, and generates secrets.

Install dependencies

pip install -r requirements.txt

Initialize NLP pattern database

PYTHONPATH=$(pwd) python3 vigia/tools/init_patterns_db.py

Environment variables

VIGIA_HMAC_KEY and KASSANDRA_SALT are stored as GitHub Secrets. For a full chain of custody in local runs, export them manually before running.

Generate local secrets

export VIGIA_HMAC_KEY="$(python3 -c 'import secrets; print(secrets.token_hex(32))')"
export KASSANDRA_SALT="$(python3 -c 'import secrets; print(secrets.token_hex(16))')"

Variable	Description	Status
VIGIA_HMAC_KEY	HMAC key for forensic chain-of-custody (hex ≥32 bytes)	GitHub Secret ✓
KASSANDRA_SALT	Salt for session nonce — Kassandra Protocol	GitHub Secret ✓
ANTHROPIC_API_KEY	Required for Claude Code / LLM narrative mode	local / Claude Code
VIGIA_LLM_BACKEND	`anthropic` or `ollama`	optional
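
A minimal sketch of how a key in the format above could be validated and used, assuming HMAC-SHA256 over a canonical JSON record. The helper names (`load_hmac_key`, `sign_record`) and the record layout are illustrative assumptions, not VIGÍA's actual API.

```python
import hashlib
import hmac
import json
import os


def load_hmac_key(env_var: str = "VIGIA_HMAC_KEY") -> bytes:
    """Read a hex-encoded key from the environment and enforce >= 32 bytes."""
    raw = os.environ.get(env_var, "")
    try:
        key = bytes.fromhex(raw)
    except ValueError as exc:
        raise ValueError(f"{env_var} is not valid hex") from exc
    if len(key) < 32:
        raise ValueError(f"{env_var} must decode to at least 32 bytes")
    return key


def sign_record(key: bytes, record: dict) -> str:
    """HMAC-SHA256 over a canonical (sorted-key, compact) JSON encoding."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    os.environ.setdefault("VIGIA_HMAC_KEY", "ab" * 32)  # demo value only, never use in practice
    key = load_hmac_key()
    print(sign_record(key, {"case_id": "VIGIA-REAL-001", "seq": 1}))
```

Sorting keys before signing means two records with the same content produce the same signature regardless of insertion order, which is the property a chain-of-custody check needs.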

Pytest — main test suite (163 passed, 6 xfailed)

Full suite (~4 min on ThinkPad T420 · ~37s on RTX 3090)

python -m pytest tests/ -v --tb=short

Expected: 163 passed, 6 xfailed

With shell script (includes env vars — requires manual path configuration)

python3 -m pytest tests/ -v --tb=short

Run full case suite with agent (136 cases)

python3 run_all_agent.py --timeout 90 2>&1 | tee results/agent_batch_run.log

Runs all 136 cases through vigia_agent.py (fallback mode). 134/136 pass. The 2 failures (VIGIA-AMB-001, VIGIA-AMB-002) are documented L-012 limitations (ABSTAIN vs NOISE semantic boundary in fallback mode).

Single file

python -m pytest tests/e2e/test_integration_end_to_end.py -v

EBS v1 integration tests (55 tests)

Run 55 EBS v1 tests

PYTHONPATH=$(pwd) python3 tests/integration/test_ebs_v1_integration.py

Expected: 55/55 passed

Bit-for-bit determinism check (Daubert P0)

PYTHONPATH=$(pwd) python3 tests/check_determinism.py

Runs same case 3×, compares SHA-256 of outputs. Identical = determinism confirmed.
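
The idea behind such a check can be sketched as follows. `check_determinism` and the toy `score_case` pipeline are hypothetical stand-ins, not the contents of tests/check_determinism.py, and the sketch assumes outputs are serialised as canonical JSON before hashing.

```python
import hashlib
import json
from fractions import Fraction
from typing import Callable


def output_digest(output: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no whitespace)."""
    blob = json.dumps(output, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def check_determinism(pipeline: Callable[[dict], dict], case: dict, runs: int = 3) -> bool:
    """Run the pipeline `runs` times and require a single distinct digest."""
    digests = {output_digest(pipeline(case)) for _ in range(runs)}
    return len(digests) == 1


def score_case(case: dict) -> dict:
    """Toy pipeline: exact rational mean of artifact weights (no floats)."""
    weights = [Fraction(w) for w in case["weights"]]
    return {"case_id": case["case_id"], "score": sum(weights) / len(weights)}


if __name__ == "__main__":
    demo = {"case_id": "DEMO", "weights": ["1/2", "3/4", "9/10"]}
    print("deterministic:", check_determinism(score_case, demo))
```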

Show 4-hash sealed bundle

PYTHONPATH=$(pwd) python3 show_4_hashes.py data/cases/VIGIA-REAL-001.json

H1 graph_hash · H2 bundle_hash · H3 HMAC chain · H4 EBS verify — takes a case JSON directly.

Adversarial fuzzing

Adversarial suite

python run_adversarial_tests.py

Stress tests

python run_stress_tests.py

REAL cases — actual DFIR benchmarks

data/cases/ contains 19 confirmed REAL cases: VIGIA-REAL-001–010 (10) + SRL-2018 series (9: DMZ-FTP, HUNT-MEMORY, MAIL-MEMORY, RD01/03/04/05/06-MEMORY, WKSTN04-MEMORY). VIGIA-REAL-VANKO and additional cases are in data/cases/converted/. Community corrections from @rjonhaas have been applied to REAL-001 and REAL-007. REAL-007 (Nitroba) fails in fallback mode, as documented in KNOWN_LIMITATIONS §L-008.

Run REAL-001–010

python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-REAL-0

Fallback: 9/10 (REAL-007 fails) · Claude Code: 10/10

Run SRL-2018 series

python tests/run_all_cases.py --cases-dir data/cases --filter SRL

9 hosts — Volatility3 memory, DMZ FTP, mail server. All expected MALICE.

Single REAL case

python tests/run_vigia_case.py data/cases/VIGIA-REAL-001.json

REAL-001: Greg Schardt / Mr. Evil — war-driving credential theft. Expected: MALICE 93%
Replace 001 with any case number — always 3 digits: 001, 002, ..., 010

REAL-008 with LLM — Volatility Cridex Banking Trojan (verify sealed bundle)

python3 forensics/verify_ebs_v1.py results/real/VIGIA-REAL-008_bundle.json --verbose

MALICE 93% · posterior=0.998 · reason_with_llm called · EBS v1 Level 2 — Cryptographically valid · R6_DEVIL_ADVOCATE: OK · 4 findings CONFIRMED (DKOM masquerade, C2, banking web injection, malfind)

REAL-008 with LLM — 4-hash forensic integrity

python3 show_4_hashes.py data/cases/converted/VIGIA-REAL-008.json

H1 graph_hash: 94147b51c639cd0c… · H2 bundle_hash: 125f7f06af5a4f56… · H3 HMAC: 6addf5b7d99a11d9… · H4 EBS verify: PASS Level 2 · VERDICT: MALICE · SCORE: 0.998042

Case	Scenario	Expected	Fallback	Claude Code
VIGIA-REAL-001–006, 008–010	APT, exfil, credential theft, C2, ransomware pre-stage	MALICE	✓ 9/9	✓ 100%
VIGIA-REAL-005	Suspicious access pattern	SUSPICION	✓	✓ 100%
VIGIA-REAL-007	Nitroba — single artifact type (L-008)	MALICE	✗ SUSPICION	✓ 100%
VIGIA-REAL-SRL-DMZ-FTP	SRL 2018 — DMZ FTP exfil, self-correction demo	MALICE	✓	✓ 100%
VIGIA-REAL-SRL-*-MEMORY (×8)	SRL 2018 — Volatility3 multi-host	MALICE	✓ all	✓ 100%

DEMO cases — core capability showcase

5 core cases in cases/ (case_001–005) + 4 extended in data/cases/ (case_006–009). Demonstrates: EFFECT_BEFORE_CAUSE hard gate, false flag, provenance collapse, log fabrication, multi-source convergence.

Run all cases (full harness)

cd ~/vigia-repo && python3 run_all_agent.py --timeout 90

Expected: 5/5 PASS on core demo cases

Temporal case — EFFECT_BEFORE_CAUSE hard gate

python tests/run_vigia_case.py data/cases/case_001_temporal.json

Expected: MALICE · hard gate triggered — file written before the process that created it existed

False flag — covert attribution

python tests/run_vigia_case.py data/cases/case_003_false_flag.json

Expected: MALICE 84% — FALSE_FLAG_PATTERN fracture

Provenance break — inadmissible under Daubert

python tests/run_vigia_case.py data/cases/case_004_provenance_break.json

Expected: NOISE 97% — 47/70 VirusTotal detections but no chain of custody → inadmissible

Extended demos (006–009)

python tests/run_all_cases.py --cases-dir data/cases --filter case_00

Case	Pattern	Expected	Fallback	Claude Code
case_001_temporal	EFFECT_BEFORE_CAUSE hard gate	MALICE	✓ 95%	✓ 100%
case_002_log_fabrication	Statistical uniformity — fabricated logs	SUSPICION	✓ 67%	✓ 100%
case_003_false_flag	FALSE_FLAG_PATTERN fracture	MALICE	✓ 84%	✓ 100%
case_004_provenance_break	Chain of custody collapse → inadmissible	NOISE	✓ 97%	✓ 100%
case_005_multi_source	Multi-source convergence (3+ types)	MALICE	✓ 95%	✓ 100%
case_006–007	False flag demo, log tampering demo	MALICE	✓	✓ 100%
case_008	Multi-source financial fraud	SUSPICION	✓ 37%	✓ 100%
case_009 (es)	Insider — off-hours + exfil	MALICE	✓ 73%	✓ 100%

BENIGN cases — false positive prevention

16 cases in data/cases/benign/. All have expected_verdict: NOISE. Designed to verify VIGÍA does not falsely classify legitimate administrative activity as malicious. 16/16 pass rate in fallback and LLM-assisted modes. Includes FP-CULTURAL-CLEAN — a false positive prevention case (Russian-speaking user, clean machine) that verifies VIGÍA does not flag users based on language or origin.

Run all cases — full harness (Domain A · 118 cases)

cd ~/vigia-repo && python3 run_all_agent.py --timeout 90

Expected: 134/136 PASS · 2 FAIL (VIGIA-AMB-001, VIGIA-AMB-002 — Domain B, L-012) · Domain A: 118/118 (100%)

Run all 16 benign cases

python3 run_all_agent.py --dir data/cases/benign --timeout 90

Expected: 16/16 PASS — includes FP-CULTURAL-CLEAN (false positive prevention case, expected NOISE)

BEN-006 — scheduled sudo maintenance

python tests/run_vigia_case.py data/cases/benign/VIGIA-BEN-006.json

12 sudo commands during CTO-approved maintenance window. Expected: NOISE

Case	Scenario	Expected	Fallback	Claude Code
VIGIA-BEN-001–015	Legitimate admin activity (authorized pentests, scheduled maintenance, DevOps pipelines…)	NOISE	✓ 15/15 (100%)	✓ 100%
FP-CULTURAL-CLEAN	Russian-speaking user, clean machine — verifies VIGÍA does not flag by language or origin	NOISE	✓	✓ 100%

BREAK cases — epistemological boundary suite (16)

16 cases: 10 legacy schema v0 (VIGIA_BREAK_001–010) + 6 EBS v1 (VIGIA-BREAK-011–016) in data/cases/. Tests the boundaries of VIGÍA's reasoning: directional aggregation, false conservatism, prompt injection, over-perfect patterns, biometric imposture.

VIGIA_BREAK_001–010 use legacy schema v0. In fallback the scorer emits UNKNOWN/ABSTAIN — this is Daubert-compliant conservative behaviour documented in KNOWN_LIMITATIONS §L-007, not a crash. LLM-assisted mode resolves all 10 correctly.

BREAK-001–010 legacy (schema v0)

python tests/run_all_cases.py --cases-dir data/cases/converted --filter VIGIA_BREAK

Fallback: 0/10 — conservative by design (L-007) · Claude Code: 10/10

BREAK-011–016 (EBS v1)

python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-BREAK

Fallback: 2/6 (BREAK-013, BREAK-016 pass) · Claude Code: 6/6 (100%)

BREAK-001–010 legacy runner (schema v0)

bash run_break_tests.sh

Runs the 10 legacy v0 cases — UNKNOWN/ABSTAIN expected in fallback, 10/10 with Claude Code

BREAK-015 — biometric impostor (False Conservatism)

python tests/run_vigia_case.py data/cases/VIGIA-BREAK-015.json

Expected: MALICE — BIOMETRIC_IMPOSTURE + IDENTITY_BIFURCATION + SPATIAL_IDENTITY_COLLAPSE (patch P7)

Case	Epistemological test	Expected	Fallback	Claude Code
VIGIA_BREAK_001–010	Legacy v0: directional aggregation, prompt poison, overperfect pattern…	UNKNOWN/ABSTAIN	conservative by design (L-007)	✓ 10/10
VIGIA-BREAK-011	20 weak artifacts pointing same target — directional aggregation	SUSPICION	✗ NOISE (L-015)	✓ 100%
VIGIA-BREAK-012	Authorized pentest — no false positive	NOISE	✗ SUSPICION (L-016)	✓ 100%
VIGIA-BREAK-013	Ambiguous infrastructure scan	SUSPICION	✓	✓ 100%
VIGIA-BREAK-014	No false overreach — ceiling at SUSPICION	SUSPICION	✗ MALICE (L-017)	✓ 100%
VIGIA-BREAK-015	False Conservatism — biometric impostor (patch P7)	MALICE	✓ with P7	✓ 100%
VIGIA-BREAK-016	Clear MALICE baseline	MALICE	✓ 95%	✓ 100%

Precision boundary cases — FP / FN / AMB (8)

8 cases in data/cases/: FP-001–003 (false positive prevention), FN-001–003 (false negative detection), AMB-001–002 (irreducible ambiguity — ABSTAIN is the correct answer).

FP suite

python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-FP

Fallback: 2/3 (FP-003 emits ABSTAIN) · Claude Code: 3/3

FN suite

python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-FN

Fallback: 0/3 (clean-surface attacks, L-018) · Claude Code: 3/3

AMB suite — irreducible ambiguity

python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-AMB

Fallback: 0/2 (NOISE instead of ABSTAIN, L-012) · Claude Code: 2/2

Suite	Description	Expected	Fallback	Claude Code
VIGIA-FP-001–002	Authorized activity that looks suspicious	NOISE	✓ 2/2	✓ 100%
VIGIA-FP-003	Borderline authorized — ABSTAIN is acceptable	NOISE	✗ ABSTAIN	✓ 100%
VIGIA-FN-001–003	Clean-surface attacks — no obvious IoC	MALICE	✗ SUSPICION/NOISE (L-018)	✓ 100%
VIGIA-AMB-001–002	Irreducible ambiguity — ABSTAIN is correct	ABSTAIN	✗ NOISE (L-012)	✓ 100%

FN-001–003 failing in fallback is documented by design (L-018). The scorer evaluates evidence strength, not behavioral compatibility with adversarial patterns. Intent Amplifier Layer is on the roadmap. LLM-assisted mode detects all 3.

Canonical cases — 52-case benchmark

52 cases in data/cases/consolidated_canonical/. EBS v1 schema, curated against SIFT-compatible DFIR scenarios. Most reliable benchmark for Daubert-admissible accuracy claims.

Run all 52

python tests/run_all_cases.py --cases-dir data/cases/consolidated_canonical

Expected: 52/52 (100%) — the _index.json is auto-skipped
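
How the runner skips the index file is not shown here. One plausible approach, using a hypothetical `iter_case_files` helper, is to filter by file name while globbing the directory:

```python
from pathlib import Path
from typing import Iterator


def iter_case_files(cases_dir: str, skip_names: tuple = ("_index.json",)) -> Iterator[Path]:
    """Yield case JSON files in sorted order, skipping index/metadata files."""
    for path in sorted(Path(cases_dir).glob("*.json")):
        if path.name not in skip_names:
            yield path


if __name__ == "__main__":
    for case_path in iter_case_files("data/cases/consolidated_canonical"):
        print(case_path.name)
```

Sorting the paths keeps the run order stable across filesystems, which matters when results are compared between machines.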

CAN-027 — maximum confidence

python tests/run_vigia_case.py data/cases/consolidated_canonical/VIGIA-CAN-027.json

score=0.99 · conf=95% — ceiling performance demo

Accuracy — Methodology and Results

VIGÍA operates in three distinct modes. The primary evaluated mode is the agent without a language model backend.

VIGÍA Agent without LLM (primary mode): The autonomous agent resolves all cases fully without any language model. This is the primary evaluated mode. The agent produces complete ForensicBundles with chain of custody, Peircean narrative, z-scores, and deterministic Fraction arithmetic. On BREAK adversarial stress-test cases, the agent produces a definitive verdict — SUSPICION or the appropriate level — not an abstention. Results are documented in KNOWN_LIMITATIONS.md.

Python scorer only (no agent): The deterministic scoring pipeline runs in isolation, without the agent reasoning layer. Over the canonical corpus of 52 structurally diverse cases — spanning insider threat, memory forensics, log fabrication, false flags, multi-source fraud, and adversarial steganography — the scorer achieves 100% correct verdicts. The full case set is available at data/cases/vigia_cases_canonical_v2.json for independent review. On BREAK cases, the scorer returns UNKNOWN — expected behavior in this mode without the agent reasoning layer.

Agent + LLM (Claude via MCP or Ollama offline): With a language model backend, Claude or Ollama operates exclusively on the narrative layer over already-sealed ForensicBundles. It cannot modify verdicts or scores. This mode provides an additional advantage — enriched Peircean narrative and disambiguation of structurally ambiguous cases — but is not the primary evaluated mode.

These numbers are not inflated. They reflect results on a specific, diverse, documented corpus. All modes are documented in KNOWN_LIMITATIONS.md.

Language coverage: Cases were developed and validated in Spanish and English. Performance in other languages has not been formally validated and cannot be guaranteed at this time.

Claude Code + MCP — 100% on all cases

Claude Code + MCP server resolves all cases including REAL-007 (Nitroba), all FN series, all BREAK series, and both AMB cases. The LLM participates only in narrative generation after bundle sealing — never in scoring. Daubert admissibility is preserved.

Start MCP server

bash launch_vigia_mcp.sh

Launches in correct venv with correct working directory. Config from .mcp.json (gitignored). Template: .mcp.json.example

Autonomous agent — correct syntax

python vigia_agent.py --evidence data/cases/VIGIA-REAL-006.json --case-id VIGIA-REAL-006

Uses --evidence and --case-id flags. REAL-006 is the recommended demo (high-confidence MALICE, clean chain).

Verify MCP configuration

cat .mcp.json.example

Batch agent — all 18 real cases in one run

cd ~/vigia-repo

for CASE in VIGIA-REAL-001 VIGIA-REAL-002 VIGIA-REAL-003 VIGIA-REAL-004 VIGIA-REAL-005 VIGIA-REAL-006 VIGIA-REAL-007 VIGIA-REAL-008 VIGIA-REAL-009 VIGIA-REAL-010 VIGIA-REAL-NROMANOFF VIGIA-REAL-TDUNGAN VIGIA-REAL-NFURY VIGIA-REAL-ROCBA VIGIA-REAL-SRL-ADMIN VIGIA-REAL-SRL-AV VIGIA-REAL-SRL-DC-MEMORY VIGIA-REAL-SRL-DMZ-FTP; do
  echo "=== $CASE ==="
  python3 vigia_agent.py \
    --evidence "data/cases/converted/${CASE}.json" \
    --case-id "$CASE" \
    --output "results/real/${CASE}_bundle.json"
  python3 forensics/verify_ebs_v1.py "results/real/${CASE}_bundle.json" --verbose
done

Runs all 18 real forensic cases sequentially. Each case produces a sealed ForensicBundle + 4-hash verification. Results in results/real/.
Note: Replace ~/vigia-repo on the first line with the path to your local clone, e.g. ~/vigia-intent-analysis.

Judge prompts — copy-paste into Claude Code

Paste directly into Claude Code with the VIGÍA MCP server running (bash launch_vigia_mcp.sh). No other configuration required.

Prompt 1 — MALICE verdict + sealed ForensicBundle

Analyze the forensic case at data/cases/VIGIA-REAL-006.json using VIGÍA. Run the full pipeline:
1. Load and validate the artifacts
2. Execute CAIE (Cross-Artifact Incongruence Engine) — identify all fracture types
3. Compute the intent score with the Peircean abductive chain:
   - Firstness: raw anomalies observed
   - Secondness: what the artifacts confirm against each other
   - Thirdness: the forensic inference — what this pattern means
4. Seal the ForensicBundle and display all four hashes (SHA-256, MD5, SHA-1, HMAC)
5. Report the verdict, confidence %, MITRE ATT&CK TTPs, and recommended PICERL response phase
Expected output: MALICE verdict ≥90% confidence.

Prompt 2 — Self-correction demo (SRL-DMZ-FTP)

Analyze data/cases/VIGIA-REAL-SRL-DMZ-FTP.json using VIGÍA. This case contains DFIR tooling artifacts (Mnemosyne.sys, F-Response) that an initial pass may score as malicious intent. VIGÍA should self-correct:
1. Run initial artifact scoring — show the first-pass verdict
2. Identify the DFIR tool signatures in the artifact metadata
3. Downgrade those specific artifacts from INTENT to SUSPICION level
4. Re-run the pipeline with corrected trust levels
5. Show the before/after verdict comparison with scores
This demonstrates VIGÍA's ability to distinguish forensic tooling from attack artifacts — a critical capability for SIFT integration.

Prompt 3 — ABSTAIN on irreducible ambiguity (AMB-001)

Analyze data/cases/VIGIA-AMB-001.json using VIGÍA. This is an irreducible ambiguity case. The evidence is structurally consistent with both a legitimate insider access pattern and a low-and-slow exfiltration attempt. VIGÍA should:
1. Map both competing hypotheses (Firstness)
2. Show why neither hypothesis can be excluded by available evidence (Secondness)
3. Apply Eco's Razor — select the most parsimonious interpretation
4. Emit ABSTAIN if evidence cannot discriminate between hypotheses
5. Explain why ABSTAIN is the epistemically honest verdict, not NOISE
Note: in fallback mode VIGÍA emits NOISE (documented limitation L-012). LLM-assisted mode should correctly emit ABSTAIN with documented reasoning chain.

Prompt 4 — False negative detection, clean-surface attack (FN-001)

Analyze data/cases/VIGIA-FN-001.json using VIGÍA. This is a false negative stress test: the attack leaves no obvious IoC but has a consistent behavioral pattern across multiple artifact types. In fallback mode VIGÍA emits SUSPICION. In LLM-assisted mode:
1. Analyze the cross-artifact behavioral pattern that the scorer alone misses
2. Explain why individual artifact scores are low but the convergent pattern is significant
3. Apply Peircean Thirdness — what does the PATTERN mean at the intentionality level?
4. Elevate the verdict to MALICE with documented abductive justification
5. Identify which limitation (L-018 in KNOWN_LIMITATIONS.md) this case tests
Expected output: MALICE verdict with a clear reasoning chain explaining the clean-surface attack.

Prompt 5 — Full benchmark + Daubert report

Run the VIGÍA canonical benchmark and produce a Daubert-admissible accuracy report.
1. Execute: python tests/run_all_cases.py --cases-dir data/cases/consolidated_canonical
2. Show the full results table (case_id, verdict, expected, pass/fail, confidence score)
3. Calculate precision, recall, F1-score, and mean delta (VIGÍA score vs naive baseline)
4. Run: python tests/validate_dataset.py — display ROC AUC, Brier score, ECE, K-fold AUC
5. Summarize the results in terms appropriate for a court-appointed forensic expert
Reference: KNOWN_LIMITATIONS.md documents all known failure modes with limitation IDs for Daubert transparency. Scorer-only mode achieves 100% on the canonical 52-case corpus; the BREAK adversarial suite returns UNKNOWN/ABSTAIN by design, not failure. See the Accuracy section for full methodology.

Prompt 6 — Full investigation with strict audit trail (VANKO / NROMANOFF pattern)

Conduct a full VIGÍA forensic investigation on data/cases/converted/VIGIA-REAL-VANKO.json
Follow ALL protocols in CLAUDE.md including:
1. Six SANS phases (Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned)
2. Strict tool_execution_log schema for every MCP tool call:
   - seq, event_id (uuid4), timestamp with microseconds, mode: "claude_code"
   - tool, target, result_summary (max 120 chars)
   - input_hash (SHA-256 of sanitized arguments)
   - prev_hash (GENESIS for seq=1, SHA-256 of previous result_summary for subsequent)
3. ContradictionDetector and Refutation Gate events go in self_correction_events array (NOT in tool_execution_log)
4. Refutation Gate Documentation for any SUSPICION finding that was a candidate for INTENT/MALICE
Save sealed bundle to results/srl2018/VIGIA-REAL-VANKO_bundle.json and Amicus Curiae to results/srl2018/VIGIA-REAL-VANKO_amicus_curiae.md

Prompt 7 — Verify sealed bundle (standalone, no MCP required)

Verify the sealed ForensicBundle for the VANKO case:
python3 forensics/verify_ebs_v1.py results/srl2018/VIGIA-REAL-VANKO_bundle.json --verbose
This runs 6 checks (R1–R6) using stdlib only — no VIGÍA installation required:
- R1: Evidence graph hash (H1)
- R2: Policy spec hash (H2)
- R3: Bundle integrity hash (H3)
- R4: Engine attestation hash (H4)
- R5: ECL binding
- R6: devil_advocate populated for all MALICE/INTENT findings (Daubert requirement)
Expected output: Level 2 — Cryptographically valid. R6_DEVIL_ADVOCATE: OK.
To verify the NROMANOFF case: replace VANKO with NROMANOFF in the path above.

Prompt 8 — BREAK suite: fallback vs LLM-assisted comparison (001–016)

Run the full BREAK epistemological boundary suite and demonstrate the difference between fallback and LLM-assisted modes.
Step 1 — Fallback baseline (run in terminal first):
python tests/run_all_cases.py --cases-dir data/cases/converted --filter VIGIA_BREAK
python tests/run_all_cases.py --cases-dir data/cases --filter VIGIA-BREAK
Step 2 — Now use VIGÍA via MCP to analyze each failing case:
For each BREAK case from 001 to 016, analyze the case file and explain:
1. Why the fallback scorer emits UNKNOWN or the wrong verdict
2. What higher-order reasoning (Peircean Thirdness) resolves the ambiguity
3. The correct verdict with documented justification
Key cases to highlight:
- BREAK-001 (Silent Inconsistency): 20 weak signals pointing same target
- BREAK-006 (Perfect Attack): overperfect statistical uniformity = synthetic logs
- BREAK-010 (Overperfect): exactly 10-second intervals — impossible naturally
- BREAK-011 (Directional Aggregation): directional convergence the scorer misses (L-015)
- BREAK-015 (False Conservatism): biometric impostor — overwhelming evidence ignored
Expected: all 16 cases resolve to correct verdict in LLM-assisted mode (100%).
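
The tool_execution_log chaining rule from the strict-audit-trail prompt (prev_hash is GENESIS for seq=1, then the SHA-256 of the previous entry's result_summary) can be sketched as below. Field names follow the prompt; `append_entry`, `verify_chain`, and the JSON canonicalisation used for input_hash are assumptions made for illustration.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def append_entry(log: list, tool: str, target: str, args: dict, result_summary: str) -> dict:
    """Append one entry whose prev_hash commits to the previous result_summary."""
    entry = {
        "seq": len(log) + 1,
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="microseconds"),
        "mode": "claude_code",
        "tool": tool,
        "target": target,
        "result_summary": result_summary[:120],  # the prompt caps this at 120 chars
        "input_hash": _sha256(json.dumps(args, sort_keys=True)),  # assumed canonical form
        "prev_hash": "GENESIS" if not log else _sha256(log[-1]["result_summary"]),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every prev_hash and check that seq is contiguous from 1."""
    for i, entry in enumerate(log):
        expected = "GENESIS" if i == 0 else _sha256(log[i - 1]["result_summary"])
        if entry["seq"] != i + 1 or entry["prev_hash"] != expected:
            return False
    return True
```

Because each entry commits to the previous summary, editing any earlier result_summary invalidates the prev_hash of the entry that follows it.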

Ollama — local LLM mode

With hermes3 (recommended)

export VIGIA_LLM_BACKEND="ollama"
python vigia_agent.py --evidence data/cases/VIGIA-REAL-001.json --case-id VIGIA-REAL-001

Available models

ollama list # hermes3:8b · deepseek-r1:8b · gemma3:27b

Daubert validation

Full Daubert validation report

python tests/validate_dataset.py

ROC AUC=0.9684 · Brier=0.0866 · ECE=0.0825 · K-fold=0.9646±0.0183 · Verdict: DEFENDIBLE
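
For orientation, the Brier score and expected calibration error (ECE) quoted above are standard calibration metrics. The sketch below uses their textbook definitions for binary outcomes with equal-width bins. Whether tests/validate_dataset.py uses the same binning is an assumption, and the function names are illustrative.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted mean |observed rate - mean predicted probability| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(observed - mean_prob)
    return ece
```

Lower is better for both metrics: a Brier score of 0 means every prediction was certain and correct, and an ECE of 0 means predicted probabilities match observed frequencies in every bin.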

CI validation (10/12 expected)

python tests/vigia_ci_validate.py --verbose

2 checks fail without HMAC_KEY/KASSANDRA_SALT in env — expected in dev

Verify stdlib purity of verify_ebs_v1

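python3 -c "import ast,sys; t=ast.parse(open('forensics/verify_ebs_v1.py').read()); m={a.name.split('.')[0] for n in ast.walk(t) if isinstance(n,ast.Import) for a in n.names}|{(n.module or '').split('.')[0] for n in ast.walk(t) if isinstance(n,ast.ImportFrom)}; bad=sorted(m-sys.stdlib_module_names); print('Non-stdlib imports:', bad or 'none')"

Prints every imported top-level module that is not listed in sys.stdlib_module_names (requires Python 3.10+). Expected: none.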

Git workflow

Current state

git status && git log --oneline -5

Submission tag — before June 15

git tag -a v1.0-SANS-hackathon-2026 -m "SANS FIND EVIL 2026 submission" && git push origin --tags

Recommended by Rob T. Lee for submission transparency.

Check syntax before commit

find vigia/ -name "*.py" | xargs -I{} python3 -m py_compile {} && echo "All OK"

CI — full local run

Tests + canonical cases + CI validate

python -m pytest tests/ -v --tb=short &&
python tests/run_all_cases.py --cases-dir data/cases/consolidated_canonical &&
python tests/vigia_ci_validate.py

Debug

Check VIGÍA imports

python3 -c "import vigia; print(vigia.__file__)"

Detect float contamination (Daubert P0)

Float contamination breaks determinism. All scoring uses Fraction/Decimal.
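
No command is listed for this check. As a rough illustration of why floats are excluded and how a scan might look, the sketch below compares float and Fraction arithmetic and walks a module's AST for float literals and float() calls. `find_float_usage` is a hypothetical helper, not part of VIGÍA.

```python
import ast
from fractions import Fraction


def find_float_usage(source: str) -> list[tuple[int, str]]:
    """Return (line, description) for float literals and float() calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, float):
            hits.append((node.lineno, f"float literal {node.value!r}"))
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id == "float"):
            hits.append((node.lineno, "float() call"))
    return sorted(hits)


if __name__ == "__main__":
    # Binary floats accumulate rounding error; Fractions stay exact.
    print(0.1 + 0.2 == 0.3)
    print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
    sample = "score = 0.5 * weight\nratio = float(a) / b\nexact = Fraction(1, 2)\n"
    for line, what in find_float_usage(sample):
        print(line, what)
```

An AST scan avoids the false positives a plain text search would produce on version strings or comments that happen to contain a decimal point.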

List valid CAIE evidence types

python3 -c "from vigia.tools.caie import EVIDENCE_PROFILES; print(sorted(EVIDENCE_PROFILES.keys()))"

Find silent exceptions

grep -rn "except Exception" vigia/ | grep -v "#"

Silent exception handlers that swallow TypeErrors can produce false negatives. See KNOWN_LIMITATIONS.
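
The grep above matches every `except Exception` line, including handlers that log or re-raise. A narrower check, sketched here with a hypothetical `find_silent_handlers` helper, flags only broad handlers whose body is nothing but `pass`, `continue`, or `...`.

```python
import ast


def _is_broad(handler: ast.ExceptHandler) -> bool:
    """True for bare `except:` and for `except Exception` / `except BaseException`."""
    if handler.type is None:
        return True
    return (isinstance(handler.type, ast.Name)
            and handler.type.id in {"Exception", "BaseException"})


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass`, `continue`, or a bare `...` expression."""
    if isinstance(stmt, (ast.Pass, ast.Continue)):
        return True
    return (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant)
            and stmt.value.value is Ellipsis)


def find_silent_handlers(source: str) -> list[int]:
    """Line numbers of broad except handlers that swallow the error."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ExceptHandler) and _is_broad(node)
        and all(_is_noop(s) for s in node.body)
    )
```

To apply it to a file, read the source text and pass it in, for example `find_silent_handlers(open(path).read())`.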

For Judges — Claim → Command

This section exists solely to make evaluation easier. You do not need to learn any commands. Every example below is a ready-to-run copy/paste shortcut that reproduces a specific result, benchmark, case, or validation claim presented elsewhere in this project. VIGÍA does not ask evaluators to trust reported results — every claim is reproducible locally. If you only want to inspect the architecture, published cases, web simulators, or benchmark reports, this section can be ignored entirely.

Claim: Domain A — 118/118 deterministic accuracy (no API key, no LLM)

python3 run_all_agent.py --timeout 90

Runs all 136 cases (Domain A + B + C combined). Expected: 134/136 PASS, 2 FAIL. The 2 failures are Domain B epistemic boundary cases (AMB-001/002, L-012). Domain A core metric: 118/118 PASS (100%).

Claim: 163 unit tests pass, 6 xfailed (documented regressions)

python3 -m pytest tests/ -v

Expected: 163 passed, 6 xfailed. The 6 xfailed are regression-preventing tests for documented limitations — not crashes.

Claim: bit-for-bit determinism — same input → same SHA-256 output

PYTHONPATH=$(pwd) python3 tests/check_determinism.py

Runs same case 3×, compares SHA-256 of outputs. Expected: three identical hashes. Zero floats in scoring pipeline (Fraction arithmetic — Daubert P0).

Claim: EBS v1 bundles are independently verifiable (stdlib only, zero VIGÍA deps) — Cridex banking trojan

python3 forensics/verify_ebs_v1.py results/real/VIGIA-REAL-008_bundle.json --verbose

Expected: PASS — Level 2 — Cryptographically valid · Checks: 8/9 OK. Memory forensics case, Claude Code investigation. R5_ECL_BINDING: WARN is expected — Level 3 requires external chain anchoring (future feature). Does not affect verdict integrity.

Claim: EBS v1 bundles are independently verifiable (stdlib only, zero VIGÍA deps) — SRL-DMZ-FTP

python3 forensics/verify_ebs_v1.py results/srl2018/VIGIA-REAL-SRL-DMZ-FTP_bundle.json --verbose

Expected: PASS — Level 2 — Cryptographically valid · Checks: 8/9 OK. Deterministic pipeline bundle. Same R5_ECL_BINDING: WARN — same reason. Two independently committed bundles, two verifiable PASS results.

Claim: four-hash forensic integrity on any case bundle

PYTHONPATH=$(pwd) python3 show_4_hashes.py data/cases/converted/VIGIA-REAL-008.json

H1 graph_hash · H2 bundle_hash · H3 HMAC audit chain · H4 EBS verify. Expected: all GREEN. Replace case path for any other case.

Claim: any published case can be reproduced end-to-end from the case JSON

python3 vigia_agent.py --evidence data/cases/converted/VIGIA-REAL-001.json --case-id VIGIA-REAL-001

Produces a sealed ForensicBundle with HMAC-signed audit trail. Replace path with any case JSON in data/cases/converted/.

Claim: case JSON schema validation (EBS v1 — Daubert chain-of-custody fields)

python3 validate_case.py data/cases/converted/VIGIA-REAL-001.json

Checks required fields, valid evidence_type against CAIE whitelist, acquisition_hash ≥64 hex chars, examiner_id presence (NIST SP 800-86 §4.3).
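
The checks listed above could look roughly like this. The required-field set, the whitelist contents, and the flat case layout are placeholders chosen for the example; the real validate_case.py and the CAIE whitelist (EVIDENCE_PROFILES) may differ.

```python
import re

# Placeholder values: the real whitelist lives in vigia.tools.caie.EVIDENCE_PROFILES.
ALLOWED_EVIDENCE_TYPES = {"memory_dump", "network_capture", "auth_log"}
REQUIRED_FIELDS = ("case_id", "examiner_id", "artifacts")  # assumed field names
HEX_64_PLUS = re.compile(r"[0-9a-fA-F]{64,}")


def validate_case(case: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not case.get(name)]
    for i, artifact in enumerate(case.get("artifacts") or []):
        etype = artifact.get("evidence_type")
        if etype not in ALLOWED_EVIDENCE_TYPES:
            errors.append(f"artifact {i}: unknown evidence_type {etype!r}")
        if not HEX_64_PLUS.fullmatch(str(artifact.get("acquisition_hash", ""))):
            errors.append(f"artifact {i}: acquisition_hash must be >= 64 hex chars")
    return errors
```

Returning all problems at once, instead of raising on the first, lets a case author fix a file in a single pass.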

Claim: Domain C adversarial suite — 22/25 handled (3 documented limitations)

python3 run_adversarial_tests.py

Extended 25-case harness designed to break the system. Expected: Total: 25 · Passed: 22 · Failed: 3. Failures documented in KNOWN_LIMITATIONS.md.

Claim: self-correction gate — LLM returned MALICE 0.91, gate corrected to INTENT 0.74 (REAL-007)

python3 vigia_agent.py --evidence data/cases/converted/VIGIA-REAL-007.json --case-id VIGIA-REAL-007

Expected: final_verdict: INTENT · final_confidence: 0.74 · self_correction_applied: true. Sealed in bundle at results/real/VIGIA-REAL-007_bundle_llm.json.

Claim: REAL-008 with LLM — Cridex MALICE 93%, reason_with_llm called, EBS v1 Level 2

python3 forensics/verify_ebs_v1.py results/real/VIGIA-REAL-008_bundle.json --verbose

Expected: PASS — Level 2 — Cryptographically valid · R6_DEVIL_ADVOCATE: OK. Bundle includes reason_with_llm_called: true, reason_with_llm_result: MALICE at 0.97, self_correction_applied: false (correction_applied=false from validate_and_correct_analysis). Amicus Curiae: results/real/VIGIA-REAL-008_amicus_curiae.md.

Claim: Daubert admissibility metrics — ROC AUC, Brier, ECE

python3 tests/validate_dataset.py

Expected: ROC AUC=0.9684 · Brier=0.0866 · ECE=0.0825 · K-fold=0.9646±0.0183 · Verdict: DEFENDIBLE.

Claim: 55 EBS v1 integration tests pass

PYTHONPATH=$(pwd) python3 tests/integration/test_ebs_v1_integration.py

Expected: 55/55 passed. Covers cryptographic seal, hash chain, tamper detection, AbductionTrace.