Research harnesses can teach a project only when the lesson becomes reviewed memory. Use this UAI-1 / v1.0 guide to separate runtime experiments from durable `.uai` records, evidence ledgers, claim ledgers, reproducibility notes, and Project Handoff updates.
Reader contract
This page explains how agentic research harnesses, memory-mediated self-learning, `.uai` handoff memory, evidence records, and benchmark practice fit together. It does not make UAIX.org the runtime.
What this page is
This is a practical map for teams that run research harnesses elsewhere and need the reviewed result to survive in AI Memory, Project Handoff, release notes, validator evidence, or cold-memory archives.
What this page is not
- It is not a hosted runtime, model-training service, agent scheduler, certification process, SDK, CLI, or official adapter.
- It is not a replacement for MCP, A2A, OpenAPI, JSON Schema, agent runtimes, or human review.
- It does not approve automatic repository writes, automatic import, background sync, credential validation, or hidden memory promotion.
Key terms
| Term | Meaning in current v1.0 guidance |
|---|---|
| Agentic harness | Runtime/control layer around models, tools, routing, retries, approvals, traces, runtime memory, and escalation. |
| AI Memory | Compact, portable, file-based durable context loaded before acting. |
| Project Handoff | Project-oriented AI Memory that preserves state, constraints, evidence, and next-step context. |
| Memory Firewall | Quarantine-first policy for imported memory, source digests, and external context. |
| Epistemic garbage collection | Review-gated pruning, archiving, merging, retiring, and promotion of semantic state. |
| No-op dominance | Stop safely when evidence is insufficient, cost exceeds benefit, or a change would widen support claims. |
Harness runtime vs portable memory
The harness executes. UAIX preserves the durable record that should survive the run.
| Layer | Runtime owns | UAIX records |
|---|---|---|
| Execution | Model calls, tools, retries, task state, and approvals. | Reviewed intent, result summary, task status, and non-claim boundary. |
| Runtime memory | Scratchpads, conversation state, vector stores, and checkpointers. | Accepted facts, source references, archive pointers, and promotion status. |
| Observability | Private traces, spans, dashboards, and eval runs. | Redacted identifiers, reproducibility notes, test evidence, and validator links. |
| Governance | Local policy, approval queues, and deployment controls. | Support boundaries, no-op decisions, review status, and owner notes. |
What self-learning means here
Under current UAI-1 / v1.0 guidance, self-learning means memory-mediated adaptation. Model-weight updates are discussed only where a dated research appendix explicitly covers them with cited research. Retrieval, reflection, iterative refinement, reusable skills, checkpointers, research wikis, and tiered memory are separate mechanisms.
Self-learning taxonomy
| Mechanism | What changes | Durable UAIX record |
|---|---|---|
| Retrieval memory | Context selected for the run. | Source pointer and relevance note. |
| Episodic memory | Reviewed event summary. | Event record with source, time, actor, and disposition. |
| Reflection memory | Lesson, critique, or recommendation. | Claim/evidence ledger entry. |
| Skill accumulation | Reusable procedure. | Procedure reference, not runtime ownership. |
| Checkpointer state | Runtime continuation state. | Archive pointer and caution note. |
| Tiered memory | Hot, warm, portable, or cold placement. | Context Budget and Memory Firewall status. |
| Weight update | Model parameters. | Research-only; outside current UAIX runtime support. |
Evidence and claim ledger
Every claim promoted from a harness should name its source path, source authority, hash, UTC timestamp, actor, review status, disposition, promotion target, uncertainty, and conflict notes.
Evidence before claim: a report recommendation is a source lead. It becomes current UAIX guidance only after public copy, machine records, tests, roadmap state, and support-boundary language agree.
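The ledger fields listed above can be pictured as one record per promoted claim. The following is a minimal Python sketch, not a UAIX schema or official tooling: the names `ClaimRecord`, `make_claim_record`, and `missing_fields`, the choice of SHA-256 for the hash, and the ISO 8601 timestamp format are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class ClaimRecord:
    """Hypothetical ledger entry; field names mirror the list in the text."""
    claim: str
    source_path: str
    source_authority: str
    sha256: str            # assumption: SHA-256 is used as the source hash
    recorded_utc: str      # assumption: ISO 8601 string in UTC
    actor: str
    review_status: str
    disposition: str
    promotion_target: str
    uncertainty: str
    conflict_notes: str


def make_claim_record(claim: str, source_path: str, source_bytes: bytes,
                      **fields: str) -> ClaimRecord:
    """Build a record, deriving the hash and UTC timestamp from the inputs."""
    return ClaimRecord(
        claim=claim,
        source_path=source_path,
        sha256=hashlib.sha256(source_bytes).hexdigest(),
        recorded_utc=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        **fields,
    )


def missing_fields(record: ClaimRecord) -> list[str]:
    """Return the names of fields that are blank, in declaration order."""
    return [name for name, value in asdict(record).items()
            if not str(value).strip()]
```

A reviewer could call `missing_fields` before accepting an entry and treat any non-empty result as a reason to hold the claim back rather than promote it.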
Benchmark and reproducibility matrix
| Benchmark family | Failure exposed | Required artifacts | UAIX output |
|---|---|---|---|
| Long conversation memory | Stale or contradictory memory. | Transcript digest and claim ledger. | Reviewed memory packet. |
| Agent tool-use tasks | Unsupported execution claim. | Tool trace summary and redaction note. | Evidence packet. |
| Software-agent tasks | Style drift or forgotten tests. | Repo state, checks, and handoff note. | Project Handoff update. |
| Web-interaction tasks | Indexed fetch vs live HTTP gap. | Route record and fallback. | Capability evidence. |
| Long-context reasoning | Ghost logic. | Context budget report. | Archive and promotion plan. |
| Human review tasks | Unsupported public claim. | Review checklist. | Claim ledger. |
Memory taxonomy
Hot memory is current operating truth. Warm or retrieved memory supports the current task. Portable reviewed memory is durable `.uai` or handoff material. Cold memory is raw source material, logs, old research, superseded decisions, and bulky evidence kept for traceability.
Promotion gate: cold memory becomes active only after review, redaction, provenance check, conflict check, and a named promotion target.
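The promotion gate reads naturally as a checklist in which every condition must hold. The sketch below is one hedged way to express it in Python; `PromotionCandidate`, `unmet_gate_conditions`, and `may_promote` are hypothetical names, and the boolean flags stand in for whatever review evidence a team actually records.

```python
from dataclasses import dataclass


@dataclass
class PromotionCandidate:
    """Hypothetical cold-memory item awaiting promotion."""
    reviewed: bool = False
    redacted: bool = False
    provenance_checked: bool = False
    conflict_checked: bool = False
    promotion_target: str = ""   # e.g. a named handoff file or record


def unmet_gate_conditions(candidate: PromotionCandidate) -> list[str]:
    """List every gate condition from the text that is not yet satisfied."""
    unmet = []
    if not candidate.reviewed:
        unmet.append("review")
    if not candidate.redacted:
        unmet.append("redaction")
    if not candidate.provenance_checked:
        unmet.append("provenance check")
    if not candidate.conflict_checked:
        unmet.append("conflict check")
    if not candidate.promotion_target.strip():
        unmet.append("named promotion target")
    return unmet


def may_promote(candidate: PromotionCandidate) -> bool:
    """Promotion is allowed only when no gate condition is outstanding."""
    return not unmet_gate_conditions(candidate)
```

Returning the list of unmet conditions, rather than a bare boolean, keeps the reason for a refusal visible to the reviewer.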
Epistemic garbage collection
Epistemic garbage collection is review-gated management of semantic noise, stale decisions, duplicate records, unsupported claims, obsolete plans, and contradictory context.
- Retire stale facts.
- Merge duplicate records.
- Archive raw traces with provenance.
- Freeze stable constraints.
- Reactivate archived facts only with new evidence.
- No-op when support would widen without proof.
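The actions above can be sketched as a function that only proposes a disposition and leaves the decision to a reviewer. This is an illustrative Python sketch under stated assumptions: the `Disposition` enum, the `propose_disposition` name, its flag parameters, and the precedence between rules are hypothetical and not taken from UAI-1. The one deliberate design choice is that the function falls back to no-op whenever evidence is missing.

```python
from enum import Enum


class Disposition(Enum):
    RETIRE = "retire"
    MERGE = "merge"
    ARCHIVE = "archive"
    FREEZE = "freeze"
    REACTIVATE = "reactivate"
    NO_OP = "no-op"


def propose_disposition(*, stale: bool = False, duplicate: bool = False,
                        raw_trace: bool = False, has_provenance: bool = False,
                        stable_constraint: bool = False, archived: bool = False,
                        new_evidence: bool = False, widens_support: bool = False,
                        has_proof: bool = False) -> Disposition:
    """Suggest a disposition for human review; never applies it.

    Rule order is an assumption made for this sketch.
    """
    # No-op dominance: never widen support without proof.
    if widens_support and not has_proof:
        return Disposition.NO_OP
    # Archived facts come back only with new evidence.
    if archived:
        return Disposition.REACTIVATE if new_evidence else Disposition.NO_OP
    if duplicate:
        return Disposition.MERGE
    if stale:
        return Disposition.RETIRE
    # Raw traces are archived only when provenance can travel with them.
    if raw_trace:
        return Disposition.ARCHIVE if has_provenance else Disposition.NO_OP
    if stable_constraint:
        return Disposition.FREEZE
    return Disposition.NO_OP
```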
Static completeness sweep
A completeness sweep is a static, non-executing audit of docs, `.uai` memory, manifests, release history, source indexes, tests, validation coverage, route consistency, support-boundary copy, accessibility metadata, SEO metadata, and JSON-LD where applicable.
It must not execute arbitrary code, install packages, hit private endpoints, validate credentials, trigger POST actions, roll back code, certify claims, auto-fix active anchors, or auto-promote cold memory.
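To make the static, non-executing constraint concrete, here is a minimal Python sketch of a sweep that only reads local files and reports findings. The function name `static_sweep`, the shape of the `required` mapping, and the example paths and phrases are assumptions for illustration; a real sweep would cover many more of the surfaces listed above.

```python
from pathlib import Path


def static_sweep(root: Path, required: dict[str, list[str]]) -> list[str]:
    """Report missing files and missing phrases under `root`.

    Read-only by construction: it opens files as text and nothing else.
    It does not import, execute, fetch, write, or fix anything it finds.
    """
    findings = []
    for rel_path, phrases in sorted(required.items()):
        path = root / rel_path
        if not path.is_file():
            findings.append(f"{rel_path}: missing")
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for phrase in phrases:
            if phrase not in text:
                findings.append(f"{rel_path}: missing phrase {phrase!r}")
    return findings
```

For example, `static_sweep(Path("."), {"README.md": ["not a runtime"]})` would return an empty list when the file exists and contains that support-boundary phrase. The findings go to a reviewer; the sweep itself changes nothing.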
State hygiene
- WordPress caches and transients need owner, TTL, invalidation, cleanup path, duplicate-cron detection, idempotent cleanup, and route/catalog/discovery smoke tests.
- Browser Wizard drafts need timestamps, stale warnings, restore/discard/clear controls, listener teardown, bounded previews, and local-only privacy copy.
- .NET Bridge code should prefer scoped or transient lifetimes, justify singleton state, enable scope validation in development and test, dispose streams/timers/scopes, and test request/job isolation.
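The hygiene items above name three different stacks, but the owner, TTL, and idempotent-cleanup requirements share one shape. The sketch below shows that shape in Python purely for illustration; it is not WordPress, browser, or .NET code, and `CacheEntry` and `purge_stale` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical cached item that always carries an owner and a TTL."""
    key: str
    owner: str
    stored_utc: datetime
    ttl: timedelta

    def is_stale(self, now: datetime) -> bool:
        # An entry is stale once its full TTL has elapsed.
        return now - self.stored_utc >= self.ttl


def purge_stale(entries: dict[str, CacheEntry], now: datetime) -> list[str]:
    """Remove stale entries in place and return the removed keys, sorted.

    Idempotent: a second call with the same `now` removes nothing.
    """
    stale = sorted(key for key, entry in entries.items() if entry.is_stale(now))
    for key in stale:
        entries.pop(key, None)
    return stale
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the cleanup path easy to test for repeat runs.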
Use beside current UAIX records
- Agentic Harness Strategies And UAI: Broader runtime-versus-evidence strategy.
- UAI-1: The current public exchange and evidence contract.
- AI Memory: Compact accepted context for humans and agents.
- Project Handoff: Durable project memory after a run.
- AI Memory Package Wizard: Generate local memory packages and starter files.
- Validator: Validate candidate UAI-1 packets before evidence claims.
- Conformance Pack: Carry reusable release evidence without implying certification.
- Standards Fit: Decide whether UAI-1, MCP, A2A, observability, or a harness owns a layer.
- Reports: Use dated reports as source leads, not automatic current support.
Limitations
Memory does not guarantee correctness. Long context is not durable truth. Retrieval can surface stale facts. Reflection can reinforce errors. Checkpointers preserve bad state as well as good state. Cross-model review can still miss defects. UAIX records support review and transfer; they do not execute or certify.