three-questions	Three Questions for Agentic Autonomy	Universal intake sequence. Human capability boundary first, autonomy blockers se	autonomy automation intake workflow blocker access subtractive-security methodology capability-boundary assessment
delegation-vs-inline	Delegation vs Inline Execution	Delegated parallel execution across 3 nodes completed identical tasks in 48% of 	delegation parallel execution multi-agent context-window sequential fresh-context speedup wall-clock
interaction-mode-variance	Interaction Mode Variance	Passenger mode consumes 41x more tokens than governor mode. 88 sessions, single 	passenger governor tokens cost human-attention mode 41x ratio original
authorization-gap	Delegated Agent Authorization Gap	Agents fail at auth boundaries, not capability boundaries. The blocker is infras	authorization agent failure auth OAuth MFA browser-redirect automation-blocker infrastructure headless
governance-binding	Governance Binding	Governance binds at N=30. 81% overall, reflex-dependent, delivery-path sensitive	governance binding adapter reflex compliance weights delivery-path system-prompt SmolLM alignment
interaction-mode-rerun	Interaction Mode Rerun	52.7x passenger/governor token ratio under pre-committed rubric. N=501 qualifyin	passenger governor tokens cost replication methodology pre-committed rubric 52x validation
http2-vs-http1	HTTP/2 vs HTTP/1.1	HTTP/2 multiplexing delivers 2.1x throughput for local inference concurrency. 70	HTTP/2 HTTP/1.1 multiplexing llama-server local-inference concurrency throughput protocol Apple-Silicon
1bit-quantization	1-Bit Quantization	1-bit quantization breaks the 8GB deployment ceiling. Bonsai 8B runs on 8GB M2 M	1-bit quantization 8GB 8B deployment local-inference ceiling RAM Q4 Q8
1bit-hardware-tiers	1-Bit Across Hardware Tiers	1-bit wins across 4 hardware tiers because the bottleneck differs per tier. 8GB 	1-bit Q4 Q8 quantization-comparison hardware-tiers memory-bandwidth compute bottleneck 8GB 64GB scaling
throughput-ceiling	Throughput Ceiling	Aggregate inference throughput plateaus at a hardware-determined ceiling. Adding	throughput ceiling concurrency saturation memory-bandwidth scaling plateau optimal-concurrency Apple-Silicon
review-vs-verification	Review vs Verification	Model code review and functional verification have asymmetric yield. Cheaper mod	code-review verification crash effort-inversion cheaper-model C functional-testing three-gate
lookdown-routing	Lookdown Routing	grep beats inference for known-answer retrieval. A TSV lookup returns the correc	routing grep TSV lookup known-answer fast-path skip-inference deterministic retrieval
manifest-vs-bm25	Manifest vs BM25	Manifest-routed retrieval beats BM25 for small-corpus QA. The routing table know	retrieval BM25 RAG small-corpus curated routing document-selection human-curated
governance-refusal	Governance Refusal	In-vivo reproduction of finding 05 during scorer calibration. The model refused 	governance refusal production real-world unprompted adapter validation emergent alignment
reflex-binding	Reflex Binding	Abstract reflexes transfer only with lineage, not instruction. 6 models, 2 arms,	binding lineage instruction fine-tune weights LoRA system-prompt trained prompted disposition alignment
effort-dependent-binding	Effort-Dependent Binding	Governance binding varies by effort tier. 4 frontier models, 4-6 effort levels, 	effort tier extended-thinking compliance governance alignment non-monotonic compute-budget reasoning
handler-substrate-selection	Handler Substrate Selection	Three-gate methodology for picking which model runs handler.sh. 240 trials, 3 mo	handler model-selection tool-call function-calling dispatch gate nano confabulation small-model
native-tools	Native Tools vs Primitives	9 Claude Code features tested against Unix primitives. 6 resolved: 2 primitive wins, 1 justified, 1 marginally justified, 1 invalidated, 1 not justified	native-tools primitives Claude-Code effort MCP loop skills review simplify wrapper Unix
variance-lab-methodology	Variance Lab Methodology	How variance-lab measures local LLM reliability. 5 task classes, 3 tiers, 300 Anthropic baseline inferences, governance binding rubric from 12 probes at N=100	methodology variance-lab harness baselines task-class tier governance-rubric probe binding reproducibility
throughput-decay	Multi-Pass Throughput Decay	Local inference throughput drops 33-49% after sustained sequential inference. 4 models, 5 passes, 100 inferences on M5 Max. Likely thermal throttling	throughput thermal throttling decay multi-pass inference local Apple-Silicon memory-bandwidth sequential sustained
moe-governance	OLMoE Governance Fidelity	OLMoE-v7b governed 96% vs bare 71%. Behavioral 100% vs 51%. c-list-files regresses 48pp under governance. 2400 trials	MoE mixture-of-experts governance binding OLMoE behavioral compliance expert-routing fidelity sparse
adapter-amplification	Adapter Amplification	69 LoRA examples lift SmolLM2 1.7B from 56% bare to 97.5% adapted (with prompt). Weights alone reach 65%. f-confab 16% to 100%	adapter LoRA amplification governance binding SmolLM2 weights training contrastive behavioral small-model
cross-model-adapter-sweep	Cross-Model Adapter Sweep	Same 69 training examples across 25 models from 135M to 7B. OLMoE 100%. Qwen 0.5B 95%. Per-architecture LoRA	adapter sweep cross-model LoRA governance binding architecture family transfer 25-model behavioral