Research Harnesses for Self-Learning Agents

UAI-1 / v1.0 guidance for research harnesses, memory-mediated self-learning, .uai memory strategy, evidence ledgers, reproducibility, static completeness sweeps, and support boundaries.

  • Record UAIX-DOC-3191
  • Path /es-us/guides/research-harnesses-self-learning/
  • Use Canonical public record

Document status

  • Status Public standards page, published on UAIX as part of the current public standards record
  • Code UAIX-DOC-3191
  • Surface Guides
  • Access Public and linkable

How to use this page

Use this guide to record research-harness outcomes, memory-mediated self-learning, evidence ledgers, reproducibility notes, static completeness sweeps, and .uai memory strategy without making UAIX a runtime.

Research Harnesses

Turn harness lessons into reviewed memory

Use this guide to capture memory-mediated self-learning, evidence ledgers, claim ledgers, reproducibility notes, static completeness sweeps, and state hygiene without making UAIX the runtime.

Runtime

Harnesses own execution

Tool calls, retries, approvals, traces, scratchpads, and checkpointers stay in the research harness or runtime layer.

Memory

UAIX keeps reviewed state

Only accepted facts, evidence pointers, claim status, tests, blockers, and next actions belong in hot .uai memory.

Review

No evidence, no claim

Source leads become current guidance only after page copy, machine records, tests, roadmap state, and support boundaries agree.

Use beside

  • Agentic Harness Strategies: Broader runtime-versus-evidence split.
  • AI Memory: Compact reviewed memory.
  • Project Handoff: Durable project state after the run.
  • Context Budget: Hot and cold memory placement.
  • Memory Firewall: Quarantine imported memory before promotion.
  • Validator: Validate candidate UAI-1 packets before claims.

Promotion Rule: Evidence before claim
A research-harness result becomes UAIX memory only after review, redaction, provenance check, conflict check, and a named promotion target.

Use this line when reviewing harness-derived guidance, reports, or generated memory.

Research harnesses can teach a project only when the lesson becomes reviewed memory. Use this UAI-1 / v1.0 guide to separate runtime experiments from durable `.uai` records, evidence ledgers, claim ledgers, reproducibility notes, and Project Handoff updates.

Reader contract

This page explains how agentic research harnesses, memory-mediated self-learning, `.uai` handoff memory, evidence records, and benchmark practice fit together. It does not make UAIX.org the runtime.

What this page is

This is a practical map for teams that run research harnesses elsewhere and need the reviewed result to survive in AI Memory, Project Handoff, release notes, validator evidence, or cold-memory archives.

What this page is not

  • It is not a hosted runtime, model-training service, agent scheduler, certification process, SDK, CLI, or official adapter.
  • It is not a replacement for MCP, A2A, OpenAPI, JSON Schema, agent runtimes, or human review.
  • It does not approve automatic repository writes, automatic import, background sync, credential validation, or hidden memory promotion.

Key terms

| Term | Meaning in current v1.0 guidance |
| --- | --- |
| Agentic harness | Runtime/control layer around models, tools, routing, retries, approvals, traces, runtime memory, and escalation. |
| AI Memory | Compact, portable, file-based durable context loaded before acting. |
| Project Handoff | Project-oriented AI Memory that preserves state, constraints, evidence, and next-step context. |
| Memory Firewall | Quarantine-first policy for imported memory, source digests, and external context. |
| Epistemic garbage collection | Review-gated pruning, archiving, merging, retiring, and promotion of semantic state. |
| No-op dominance | Stop safely when evidence is insufficient, cost exceeds benefit, or a change would widen support claims. |
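
The no-op dominance rule in the table above can be read as a small decision function. The following is a minimal sketch, not part of UAI-1: `ChangeProposal`, its fields, and the numeric cost and benefit estimates are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeProposal:
    """Hypothetical summary of a proposed memory or guidance change."""
    evidence_count: int          # reviewed evidence items backing the change
    estimated_cost: float        # reviewer-estimated cost, arbitrary units
    estimated_benefit: float     # reviewer-estimated benefit, same units
    widens_support_claims: bool  # True if the change would claim new support


def no_op_reasons(p: ChangeProposal, min_evidence: int = 1) -> list[str]:
    """Return the reasons to stop; an empty list means review may proceed."""
    reasons = []
    if p.evidence_count < min_evidence:
        reasons.append("insufficient evidence")
    if p.estimated_cost > p.estimated_benefit:
        reasons.append("cost exceeds benefit")
    if p.widens_support_claims:
        reasons.append("would widen support claims")
    return reasons
```

Returning the reasons rather than a bare boolean lets the no-op decision itself be recorded with its justification.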

Harness runtime vs portable memory

The harness executes. UAIX preserves the durable record that should survive the run.

| Layer | Runtime owns | UAIX records |
| --- | --- | --- |
| Execution | Model calls, tools, retries, task state, and approvals. | Reviewed intent, result summary, task status, and non-claim boundary. |
| Runtime memory | Scratchpads, conversation state, vector stores, and checkpointers. | Accepted facts, source references, archive pointers, and promotion status. |
| Observability | Private traces, spans, dashboards, and eval runs. | Redacted identifiers, reproducibility notes, test evidence, and validator links. |
| Governance | Local policy, approval queues, and deployment controls. | Support boundaries, no-op decisions, review status, and owner notes. |
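
One way to picture the split above is an allow-list applied when a run ends: only reviewed, named fields leave the harness. This is a sketch under assumptions. The key names and the `to_portable_record` helper are hypothetical, the sample values are placeholders, and nothing here describes an actual `.uai` schema.

```python
# Fields that may leave the harness (hypothetical names). Anything not
# listed here, such as scratchpads or tool traces, stays in the runtime.
PORTABLE_KEYS = {
    "intent", "result_summary", "task_status", "non_claim_boundary",
    "accepted_facts", "source_references", "archive_pointers",
}


def to_portable_record(run_state: dict, reviewed: bool) -> dict:
    """Copy only allow-listed fields out of a harness run state."""
    if not reviewed:
        # An unreviewed run produces no durable record at all.
        return {}
    return {k: v for k, v in run_state.items() if k in PORTABLE_KEYS}


# Placeholder run state for illustration only.
run_state = {
    "intent": "example intent",
    "result_summary": "example summary",
    "task_status": "done",
    "scratchpad": "raw notes",           # runtime-only
    "tool_trace": ["call-1", "call-2"],  # runtime-only
}
portable = to_portable_record(run_state, reviewed=True)
print(sorted(portable))
```

An allow-list is used instead of a deny-list so that new runtime fields are excluded by default.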

What self-learning means here

Under current UAI-1 / v1.0 guidance, self-learning means memory-mediated adaptation. Model-weight updates are in scope only where a dated research appendix explicitly discusses them with cited research. Retrieval, reflection, iterative refinement, reusable skills, checkpointers, research wikis, and tiered memory are separate mechanisms.

Self-learning taxonomy

| Mechanism | What changes | Durable UAIX record |
| --- | --- | --- |
| Retrieval memory | Context selected for the run. | Source pointer and relevance note. |
| Episodic memory | Reviewed event summary. | Event record with source, time, actor, and disposition. |
| Reflection memory | Lesson, critique, or recommendation. | Claim/evidence ledger entry. |
| Skill accumulation | Reusable procedure. | Procedure reference, not runtime ownership. |
| Checkpointer state | Runtime continuation state. | Archive pointer and caution note. |
| Tiered memory | Hot, warm, portable, or cold placement. | Context Budget and Memory Firewall status. |
| Weight update | Model parameters. | Research-only outside current UAIX runtime support. |

Evidence and claim ledger

Every claim promoted from a harness should name source path, source authority, hash, UTC timestamp, actor, review status, disposition, promotion target, uncertainty, and conflict notes.
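
The fields listed above map naturally onto a single record type. The sketch below is illustrative: the `LedgerEntry` layout, the helper names, and the choice of SHA-256 for the hash are assumptions of this example, since the guidance only says "hash".

```python
import hashlib
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class LedgerEntry:
    """One promoted claim; field names mirror the list above (hypothetical layout)."""
    source_path: str
    source_authority: str
    sha256: str            # assumption: SHA-256 stands in for "hash"
    utc_timestamp: str
    actor: str
    review_status: str
    disposition: str
    promotion_target: str
    uncertainty: str
    conflict_notes: str


def missing_fields(entry: LedgerEntry) -> list[str]:
    """Names of fields left blank, so a reviewer can reject the entry."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


def make_entry(source_path: str, content: bytes, **rest: str) -> LedgerEntry:
    """Fill hash and timestamp mechanically; the rest comes from the reviewer."""
    return LedgerEntry(
        source_path=source_path,
        sha256=hashlib.sha256(content).hexdigest(),
        utc_timestamp=datetime.now(timezone.utc).isoformat(),
        **rest,
    )
```

In this sketch a blank `conflict_notes` counts as missing, so a reviewer who found no conflicts would write an explicit value such as "none" rather than leave the field empty.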

Evidence before claim: a report recommendation is a source lead. It becomes current UAIX guidance only after public copy, machine records, tests, roadmap state, and support-boundary language agree.

Benchmark and reproducibility matrix

| Benchmark family | Failure exposed | Required artifacts | UAIX output |
| --- | --- | --- | --- |
| Long conversation memory | Stale or contradictory memory. | Transcript digest and claim ledger. | Reviewed memory packet. |
| Agent tool-use tasks | Unsupported execution claim. | Tool trace summary and redaction note. | Evidence packet. |
| Software-agent tasks | Style drift or forgotten tests. | Repo state, checks, and handoff note. | Project Handoff update. |
| Web-interaction tasks | Indexed fetch vs live HTTP gap. | Route record and fallback. | Capability evidence. |
| Long-context reasoning | Ghost logic. | Context budget report. | Archive and promotion plan. |
| Human review tasks | Unsupported public claim. | Review checklist. | Claim ledger. |
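
The "Required artifacts" column above can double as a pre-promotion checklist. The following sketch transcribes that column into a lookup; the snake_case identifiers and the `missing_artifacts` helper are hypothetical names chosen for this example.

```python
# Required artifacts per benchmark family, transcribed from the matrix above.
# The snake_case keys and values are hypothetical identifiers.
REQUIRED_ARTIFACTS = {
    "long_conversation_memory": {"transcript_digest", "claim_ledger"},
    "agent_tool_use": {"tool_trace_summary", "redaction_note"},
    "software_agent": {"repo_state", "checks", "handoff_note"},
    "web_interaction": {"route_record", "fallback"},
    "long_context_reasoning": {"context_budget_report"},
    "human_review": {"review_checklist"},
}


def missing_artifacts(family: str, provided: set[str]) -> set[str]:
    """Artifacts still needed before a run of this family can yield UAIX output."""
    return REQUIRED_ARTIFACTS[family] - provided
```

For example, `missing_artifacts("software_agent", {"repo_state"})` reports that `checks` and `handoff_note` are still outstanding.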

Memory taxonomy

Hot memory is current operating truth. Warm or retrieved memory supports the current task. Portable reviewed memory is durable `.uai` or handoff material. Cold memory is raw source material, logs, old research, superseded decisions, and bulky evidence kept for traceability.

Promotion gate: cold memory becomes active only after review, redaction, provenance check, conflict check, and a named promotion target.
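
The promotion gate can be sketched as a function that refuses to change an item's tier until all five conditions hold. The `Tier` enum, the `MemoryItem` fields, and `promote` are hypothetical names for illustration; in practice the checks are performed by reviewers, and the code only records their outcome.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    PORTABLE = "portable"
    COLD = "cold"


@dataclass
class MemoryItem:
    text: str
    tier: Tier = Tier.COLD
    reviewed: bool = False
    redacted: bool = False
    provenance_checked: bool = False
    conflict_checked: bool = False
    promotion_target: Optional[str] = None


def promote(item: MemoryItem, to: Tier) -> MemoryItem:
    """Move an item out of cold memory only if every gate condition holds."""
    gate = [
        ("review", item.reviewed),
        ("redaction", item.redacted),
        ("provenance check", item.provenance_checked),
        ("conflict check", item.conflict_checked),
        ("named promotion target", bool(item.promotion_target)),
    ]
    failed = [name for name, ok in gate if not ok]
    if failed:
        # The item keeps its current tier when any condition is unmet.
        raise ValueError("promotion blocked: " + ", ".join(failed))
    item.tier = to
    return item
```

Raising on failure, rather than silently skipping, keeps a blocked promotion visible to whoever requested it.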

Epistemic garbage collection

Epistemic garbage collection is review-gated management of semantic noise, stale decisions, duplicate records, unsupported claims, obsolete plans, and contradictory context.

  • Retire stale facts.
  • Merge duplicate records.
  • Archive raw traces with provenance.
  • Freeze stable constraints.
  • Reactivate archived facts only with new evidence.
  • Choose no-op when a change would widen support without proof.
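
The actions in the list above can be sketched as a proposal step that never applies anything itself, which keeps the process review-gated. The `Record` flags, the order in which they are checked, and `propose_actions` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical semantic-state record with reviewer-set flags."""
    key: str                       # identity used to spot duplicates
    stale: bool = False
    raw_trace: bool = False
    stable_constraint: bool = False
    archived: bool = False
    widens_support: bool = False
    has_new_evidence: bool = False


def propose_actions(records: list[Record]) -> list[tuple[str, str]]:
    """Return (key, proposed action) pairs for a reviewer; nothing is applied."""
    seen: set[str] = set()
    proposals = []
    for r in records:
        if r.widens_support and not r.has_new_evidence:
            action = "no-op"          # support would widen without proof
        elif r.key in seen:
            action = "merge"          # duplicate of an earlier record
        elif r.archived:
            action = "reactivate" if r.has_new_evidence else "no-op"
        elif r.stale:
            action = "retire"
        elif r.raw_trace:
            action = "archive"        # archive with provenance
        elif r.stable_constraint:
            action = "freeze"
        else:
            action = "no-op"
        seen.add(r.key)
        proposals.append((r.key, action))
    return proposals
```

The function returns proposals only, so pruning, merging, and reactivation still depend on a reviewer accepting each one.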

Static completeness sweep

A completeness sweep is a static, non-executing audit of docs, `.uai` memory, manifests, release history, source indexes, tests, validation coverage, route consistency, support-boundary copy, accessibility metadata, SEO metadata, and JSON-LD where applicable.

It must not execute arbitrary code, install packages, hit private endpoints, validate credentials, trigger POST actions, roll back code, certify claims, auto-fix active anchors, or auto-promote cold memory.
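
A sweep that respects these limits can be sketched as a pure function over text that has already been loaded. The checks below are hypothetical: the record-ID pattern is inferred from this page's own ID and may not hold for every record, and `sweep` and its parameters are names chosen for this example.

```python
import re

# Assumption: record IDs look like this page's own ID (UAIX-DOC-3191).
RECORD_ID = re.compile(r"UAIX-DOC-\d+")


def sweep(docs: dict[str, str], required_phrases: list[str]) -> list[str]:
    """Statically inspect already-loaded text and return findings.

    The function only reads strings: it runs no scanned code, makes no
    network calls, writes nothing, and fixes nothing automatically.
    """
    findings = []
    for path, text in sorted(docs.items()):
        if not RECORD_ID.search(text):
            findings.append(f"{path}: no record ID found")
        lowered = text.lower()
        for phrase in required_phrases:
            if phrase.lower() not in lowered:
                findings.append(f"{path}: missing phrase {phrase!r}")
    return findings
```

The output is a list of findings for a reviewer to act on; deciding what to change, and changing it, stays outside the sweep.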

State hygiene

  • WordPress caches and transients need owner, TTL, invalidation, cleanup path, duplicate-cron detection, idempotent cleanup, and route/catalog/discovery smoke tests.
  • Browser Wizard drafts need timestamps, stale warnings, restore/discard/clear controls, listener teardown, bounded previews, and local-only privacy copy.
  • .NET Bridge code should prefer scoped or transient lifetimes, justify singleton state, enable scope validation in development and test, dispose streams/timers/scopes, and test request/job isolation.

Limitations

Memory does not guarantee correctness. Long context is not durable truth. Retrieval can surface stale facts. Reflection can reinforce errors. Checkpointers preserve bad state as well as good state. Cross-model review can still miss defects. UAIX records support review and transfer; they do not execute or certify.