Research Harnesses for Self-Learning Agents

Research harnesses can teach a project only when the lesson becomes reviewed memory. Use this UAI-1 / v1.0 guide to separate runtime experiments from durable `.uai` records, evidence ledgers, claim ledgers, reproducibility notes, and Project Handoff updates.

Reader contract

This page explains how agentic research harnesses, memory-mediated self-learning, `.uai` handoff memory, evidence records, and benchmark practice fit together. It does not make UAIX.org the runtime.

What this page is

This is a practical map for teams that run research harnesses elsewhere and need the reviewed result to survive in AI Memory, Project Handoff, release notes, validator evidence, or cold-memory archives.

What this page is not

It is not a hosted runtime, model-training service, agent scheduler, certification process, SDK, CLI, or official adapter.
It is not a replacement for MCP, A2A, OpenAPI, JSON Schema, agent runtimes, or human review.
It does not approve automatic repository writes, automatic import, background sync, credential validation, or hidden memory promotion.

Key terms

Term	Meaning in current v1.0 guidance
Agentic harness	Runtime/control layer around models, tools, routing, retries, approvals, traces, runtime memory, and escalation.
AI Memory	Compact, portable, file-based durable context loaded before acting.
Project Handoff	Project-oriented AI Memory that preserves state, constraints, evidence, and next-step context.
Memory Firewall	Quarantine-first policy for imported memory, source digests, and external context.
Epistemic garbage collection	Review-gated pruning, archiving, merging, retiring, and promotion of semantic state.
No-op dominance	Stop safely when evidence is insufficient, cost exceeds benefit, or a change would widen support claims.
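
The no-op dominance rule above can be read as a small decision function. The following is a minimal sketch, not a UAIX API: `ProposedChange`, its field names, and the numeric cost/benefit estimates are hypothetical and exist only to make the three stop conditions concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedChange:
    """Hypothetical shape for a change a harness wants to promote."""
    evidence_sufficient: bool    # a reviewer judged the evidence adequate
    estimated_cost: float        # illustrative units, not a UAIX field
    estimated_benefit: float     # same illustrative units as the cost
    widens_support_claims: bool  # would the change claim more than is proven?


def decide(change: ProposedChange) -> str:
    """Return "no-op" if any stop condition from the Key terms table holds."""
    if not change.evidence_sufficient:
        return "no-op"
    if change.estimated_cost > change.estimated_benefit:
        return "no-op"
    if change.widens_support_claims:
        return "no-op"
    # Passing the checks only makes the change eligible for human review.
    return "proceed-to-review"
```

Note that the non-stop outcome is "proceed-to-review", not "apply": in this sketch the function never approves a change on its own.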

Harness runtime vs portable memory

The harness executes. UAIX preserves the durable record that should survive the run.

Layer	Runtime owns	UAIX records
Execution	Model calls, tools, retries, task state, and approvals.	Reviewed intent, result summary, task status, and non-claim boundary.
Runtime memory	Scratchpads, conversation state, vector stores, and checkpointers.	Accepted facts, source references, archive pointers, and promotion status.
Observability	Private traces, spans, dashboards, and eval runs.	Redacted identifiers, reproducibility notes, test evidence, and validator links.
Governance	Local policy, approval queues, and deployment controls.	Support boundaries, no-op decisions, review status, and owner notes.

What self-learning means here

Under current UAI-1 / v1.0 guidance, self-learning means memory-mediated adaptation unless a dated research appendix explicitly discusses model-weight updates from cited research. Retrieval, reflection, iterative refinement, reusable skills, checkpointers, research wikis, and tiered memory are separate mechanisms.

Self-learning taxonomy

Mechanism	What changes	Durable UAIX record
Retrieval memory	Context selected for the run.	Source pointer and relevance note.
Episodic memory	Reviewed event summary.	Event record with source, time, actor, and disposition.
Reflection memory	Lesson, critique, or recommendation.	Claim/evidence ledger entry.
Skill accumulation	Reusable procedure.	Procedure reference, not runtime ownership.
Checkpointer state	Runtime continuation state.	Archive pointer and caution note.
Tiered memory	Hot, warm, portable, or cold placement.	Context Budget and Memory Firewall status.
Weight update	Model parameters.	Research-only outside current UAIX runtime support.

Evidence and claim ledger

Every claim promoted from a harness should name source path, source authority, hash, UTC timestamp, actor, review status, disposition, promotion target, uncertainty, and conflict notes.

Evidence before claim: a report recommendation is a source lead. It becomes current UAIX guidance only after public copy, machine records, tests, roadmap state, and support-boundary language agree.
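
One way to picture a ledger entry is as a record with one field per item named above. This is an illustrative sketch only: the class name, field names, string-typed values, and the choice of SHA-256 for the hash are assumptions, not a published UAIX schema.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClaimEntry:
    """Hypothetical ledger row; one field per item the guidance lists."""
    source_path: str
    source_authority: str
    sha256: str            # assumption: the guidance only says "hash"
    timestamp_utc: str
    actor: str
    review_status: str
    disposition: str
    promotion_target: str
    uncertainty: str
    conflict_notes: str


def make_entry(source_path: str, source_bytes: bytes, **fields: str) -> ClaimEntry:
    """Build an entry, deriving the hash and UTC timestamp from the source."""
    return ClaimEntry(
        source_path=source_path,
        sha256=hashlib.sha256(source_bytes).hexdigest(),
        timestamp_utc=datetime.now(timezone.utc).isoformat(),
        **fields,
    )


def missing_fields(entry: ClaimEntry) -> list[str]:
    """List fields left blank, so a reviewer can block promotion until filled."""
    return [name for name, value in asdict(entry).items() if not value.strip()]
```

In this sketch a blank field is treated as missing, so an entry with no known conflicts would need an explicit value such as "none" rather than an empty string.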

Benchmark and reproducibility matrix

Benchmark family	Failure exposed	Required artifacts	UAIX output
Long conversation memory	Stale or contradictory memory.	Transcript digest and claim ledger.	Reviewed memory packet.
Agent tool-use tasks	Unsupported execution claim.	Tool trace summary and redaction note.	Evidence packet.
Software-agent tasks	Style drift or forgotten tests.	Repo state, checks, and handoff note.	Project Handoff update.
Web-interaction tasks	Indexed fetch vs live HTTP gap.	Route record and fallback.	Capability evidence.
Long-context reasoning	Ghost logic.	Context budget report.	Archive and promotion plan.
Human review tasks	Unsupported public claim.	Review checklist.	Claim ledger.

Memory taxonomy

Hot memory is current operating truth. Warm or retrieved memory supports the current task. Portable reviewed memory is durable `.uai` or handoff material. Cold memory is raw source material, logs, old research, superseded decisions, and bulky evidence kept for traceability.

Promotion gate: cold memory becomes active only after review, redaction, provenance check, conflict check, and a named promotion target.
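
The promotion gate can be sketched as a checklist in which every item must pass. The `ColdRecord` type and its field names below are hypothetical; the five checks mirror the ones named in the gate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColdRecord:
    """Hypothetical cold-memory item awaiting a promotion decision."""
    reviewed: bool
    redacted: bool
    provenance_checked: bool
    conflict_checked: bool
    promotion_target: Optional[str]  # e.g. "Project Handoff"; None if unnamed


def failed_gates(record: ColdRecord) -> list[str]:
    """Return the names of the gate checks this record has not passed."""
    has_target = bool(record.promotion_target and record.promotion_target.strip())
    checks = {
        "review": record.reviewed,
        "redaction": record.redacted,
        "provenance check": record.provenance_checked,
        "conflict check": record.conflict_checked,
        "named promotion target": has_target,
    }
    return [name for name, passed in checks.items() if not passed]


def may_promote(record: ColdRecord) -> bool:
    """A record becomes eligible only when no gate check has failed."""
    return not failed_gates(record)
```

Returning the list of failed checks, instead of a bare boolean, gives the reviewer something to act on when promotion is refused.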

Epistemic garbage collection

Epistemic garbage collection is review-gated management of semantic noise, stale decisions, duplicate records, unsupported claims, obsolete plans, and contradictory context.

Retire stale facts.
Merge duplicate records.
Archive raw traces with provenance.
Freeze stable constraints.
Reactivate archived facts only with new evidence.
No-op when support would widen without proof.
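
The review gate on these actions can be sketched as follows. The `Action` and `Proposal` names and their fields are hypothetical; the sketch only shows that an unreviewed proposal, or a reactivation without new evidence, resolves to a no-op.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETIRE = "retire"
    MERGE = "merge"
    ARCHIVE = "archive"
    FREEZE = "freeze"
    REACTIVATE = "reactivate"
    NO_OP = "no-op"


@dataclass(frozen=True)
class Proposal:
    """Hypothetical cleanup proposal for one semantic record."""
    record_id: str
    action: Action
    reviewer_approved: bool = False
    has_new_evidence: bool = False


def resolve(proposal: Proposal) -> Action:
    """Downgrade to NO_OP unless reviewed; reactivation also needs new evidence."""
    if not proposal.reviewer_approved:
        return Action.NO_OP
    if proposal.action is Action.REACTIVATE and not proposal.has_new_evidence:
        return Action.NO_OP
    return proposal.action
```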

Static completeness sweep

A completeness sweep is a static, non-executing audit of docs, `.uai` memory, manifests, release history, source indexes, tests, validation coverage, route consistency, support-boundary copy, accessibility metadata, SEO metadata, and JSON-LD where applicable.

It must not execute arbitrary code, install packages, hit private endpoints, validate credentials, trigger POST actions, roll back code, certify claims, auto-fix active anchors, or auto-promote cold memory.
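
A sweep that respects these limits can be as simple as reading files as text and reporting what is absent. The sketch below is an assumption-laden illustration: the `sweep` function, the file names, and the phrase checks are hypothetical, and a real audit would cover far more than phrase presence.

```python
from pathlib import Path


def sweep(root: Path, required_phrases: dict[str, list[str]]) -> list[str]:
    """Return findings for missing files or phrases.

    Files are opened as text only. Nothing is imported, executed,
    fetched over the network, or modified.
    """
    findings: list[str] = []
    for rel_path, phrases in sorted(required_phrases.items()):
        target = root / rel_path
        if not target.is_file():
            findings.append(f"missing file: {rel_path}")
            continue
        text = target.read_text(encoding="utf-8", errors="replace")
        for phrase in phrases:
            if phrase not in text:
                findings.append(f"{rel_path}: missing phrase {phrase!r}")
    return findings
```

The function returns findings and leaves every fix to a reviewer, which keeps it on the reporting side of the "auto-fix" and "auto-promote" prohibitions.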

State hygiene

WordPress caches and transients need owner, TTL, invalidation, cleanup path, duplicate-cron detection, idempotent cleanup, and route/catalog/discovery smoke tests.
Browser Wizard drafts need timestamps, stale warnings, restore/discard/clear controls, listener teardown, bounded previews, and local-only privacy copy.
.NET Bridge code should prefer scoped or transient lifetimes, justify singleton state, enable scope validation in development and test, dispose streams/timers/scopes, and test request/job isolation.
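
The cache items in the list above (owner, TTL, idempotent cleanup) are platform-neutral, so they can be illustrated outside WordPress. The following Python sketch is not WordPress or .NET code; `OwnedCache` and its methods are hypothetical, and the clock is passed in explicitly so that expiry is easy to test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    owner: str         # who is responsible for this entry
    value: str
    expires_at: float  # seconds on the caller-supplied clock


class OwnedCache:
    """Hypothetical cache where every entry has an owner and a TTL."""

    def __init__(self) -> None:
        self._entries: dict[str, CacheEntry] = {}

    def put(self, key: str, value: str, *, owner: str,
            ttl_seconds: float, now: float) -> None:
        if not owner:
            raise ValueError("every cache entry needs a named owner")
        if ttl_seconds <= 0:
            raise ValueError("TTL must be positive")
        self._entries[key] = CacheEntry(owner, value, now + ttl_seconds)

    def get(self, key: str, *, now: float):
        """Return the value, or None if the key is absent or expired."""
        entry = self._entries.get(key)
        if entry is None or entry.expires_at <= now:
            return None
        return entry.value

    def cleanup(self, *, now: float) -> int:
        """Remove expired entries; running it twice removes nothing new."""
        expired = [k for k, e in self._entries.items() if e.expires_at <= now]
        for key in expired:
            del self._entries[key]
        return len(expired)
```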

Use beside current UAIX records

Limitations

Memory does not guarantee correctness. Long context is not durable truth. Retrieval can surface stale facts. Reflection can reinforce errors. Checkpointers preserve bad state as well as good state. Cross-model review can still miss defects. UAIX records support review and transfer; they do not execute or certify.

Agent Communication Operating Model

Agent runtimes execute. UAIX records the reviewed communication, memory, trust, evidence, and handoff boundary.

Use the Agent Communication Operating Model when a human operator, coding agent, AI agent, or runtime implementer needs portable identity, intent, context, acknowledgement, status, blocker, memory proposal, validation evidence, correction, final report, or handoff records. The matching machine-readable assets are published through Schemas, Registry, Examples, and the Validator.

UAIX does not provide automatic sync, hosted messaging, runtime orchestration, official adapters, repository write execution, certification authority, or hosted import mechanisms.
Carcinus.org is a separate runtime/orchestrator example that may consume UAIX records. UAIX.org does not implement Carcinus runtime behavior and does not execute Carcinus workflows.
LocalEndPoint and other runtimes may consume the published UAIX records as external consumers; UAIX.org does not implement their project-specific runtime features.
Durable memory proposals stay separate from task execution records. Do not promote temporary endpoint statuses, session tokens, write tokens, beta platform instructions, local error traces, private keys, or API credentials.
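
The promotion restriction above can be sketched as a filter over tagged proposals. The `kind` labels and the dictionary shape below are hypothetical and do not come from the published UAIX schemas; the sketch assumes each proposal was already classified by a reviewer or an upstream step.

```python
# Hypothetical labels for the categories that must never be promoted.
PROHIBITED_KINDS = frozenset({
    "temporary_endpoint_status",
    "session_token",
    "write_token",
    "beta_platform_instruction",
    "local_error_trace",
    "private_key",
    "api_credential",
})


def split_proposals(proposals: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate proposals into (eligible for review, rejected).

    Proposals without a "kind" are rejected too, so that unclassified
    material is never promoted by default.
    """
    eligible: list[dict] = []
    rejected: list[dict] = []
    for proposal in proposals:
        kind = proposal.get("kind")
        if kind is None or kind in PROHIBITED_KINDS:
            rejected.append(proposal)
        else:
            eligible.append(proposal)
    return eligible, rejected
```

A label check like this does not detect secrets embedded in otherwise acceptable text; it only enforces the category boundary, so human review is still needed.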

Turn harness lessons into reviewed memory

Harnesses own execution

UAIX keeps reviewed state

No evidence, no claim
