Install
openclaw skills install forsy-trace-skill
Forsy Trace Skill captures AI agent workflows as structured traces: task context, steps, tool use, observations, feedback, failures, retries, artifacts, outcomes, and other learning signals. It is designed for agent workflow inspection, tool-use trajectory analysis, process-supervision research, agent evaluation, and reusable agent work experience. The skill includes a public trace format and schema for turning completed agent workflows into inspectable process data.
Description: Capture AI agent work as structured traces with steps, tools, observations, feedback, failures, artifacts, outcomes, and learning signals.
Repository: https://github.com/Forsy-AI/forsy-trace-skill
Forsy Trace Skill is an open skill for collecting structured, replayable agent work traces from authentic AI-agent activity. It captures observations, actions, tool inputs, outputs, state changes, retries, feedback, artifacts, and outcomes from real workflows across APIs, MCP servers, browser/computer use, CLI, files, databases, spreadsheets, documents, search tools, and mixed multi-tool environments.
The goal is to turn agent work into transparent process data for agent evaluation, process-supervision research, failure analysis, post-training data construction, workflow auditing, and reusable work experience.
This open-source version writes a local JSON trace. It does not require submitting to Forsy or any external API.
Do not fabricate, simulate, or invent traces. A Forsy trace should reflect work the agent actually performed or can reconstruct from reliable logs, files, conversation history, tool outputs, or artifacts.
Forsy traces must reflect real agent activity or reliable reconstruction from actual logs/artifacts.
Only trace work you actually performed or can genuinely reconstruct from reliable evidence. Do not fabricate, invent, or simulate steps, reasoning, tool calls, outputs, screenshots, request bodies, responses, artifacts, feedback, or outcomes.
If you are unsure whether something actually happened, say so in the relevant field and lower summary.agent_confidence accordingly.
If a user asks you to create a fictional or made-up trace, decline and explain that Forsy traces must reflect authentic work or reliable reconstruction from actual evidence.
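Use this skill to create structured traces for agent evaluation, process-supervision research, failure analysis, post-training data construction, workflow auditing, and reusable work experience.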
This skill is not a claim that a trace is expert-validated, client-validated, complete telemetry, or suitable for training frontier models on its own. Trace quality depends on the evidence available, the agent environment, and whether the trace was live-captured or reconstructed.
Do not use this skill to generate fictional traces, invented logs, fake tool calls, fake screenshots, fake outcomes, or fabricated human feedback.
Use this skill when tracing real work performed by an AI agent across one or more interaction surfaces, including APIs, MCP servers, browser/computer use, terminals, files, databases, search tools, documents, spreadsheets, and mixed multi-tool workflows.
It applies when the agent is doing actual work that changes state, inspects state, calls tools, navigates software, edits files, queries systems, responds to human feedback, or produces a durable deliverable.
Treat each of these interaction surfaces as a first-class workflow surface.
Do not fabricate steps, remote state, results, screenshots, request bodies, responses, or artifacts. If some state or output is partially unknown, say exactly what is known and what is uncertain.
The goal is to capture authentic, replayable, high-signal task traces from real work, regardless of whether the work happened through text, tools, APIs, UI actions, remote systems, or mixed environments.
Trace the actual task work: the research, reasoning, tool calls, outputs, errors, user interactions, corrections, retries, and artifacts involved in completing the user's task.
Do not trace the trace-collection process itself.
Exclude steps whose only purpose is trace administration, such as writing, validating, or preparing the trace itself for release.
Every user message that affected the work must be recorded as its own step. Do not skip user messages or summarize them inside the next agent step.
Each step must be one meaningful action. If you created five files, that is usually five steps. If you ran a command, read the output, and made a decision, that is at least three steps. Do not compress materially separate actions into one step.
The trace mode (trace_mode) is one of:
- live: traced while the work was happening
- retraced: reconstructed after the work from logs, files, conversation history, or memory
- hybrid: a mix of live tracing and reconstruction
Live traces usually have stronger evidence. Retraced traces may contain approximations and should use lower confidence where evidence is incomplete.
The validation level (validation_level) is one of:
- self_traced: created by the agent or operator without external review
- retraced_from_logs: reconstructed from logs, artifacts, files, or conversation history
- model_reviewed: reviewed by a separate model or evaluator
- human_reviewed: checked by a general human reviewer
- expert_reviewed: checked or refined by a domain expert
- client_validated: validated by the actual task owner or beneficiary
Use the lowest validation level that accurately describes the trace. Do not imply expert or client validation unless it occurred.
Treat each step as a referable node in a workflow graph, not just a line in a timeline.
Every step should be understandable on its own, and later steps should make clear which earlier step or steps caused them. Use caused_by, causal_type, causal_note, and retry_of to make those links explicit.
If a step was triggered by a user request, prior tool result, correction, failed step, retry, verification need, earlier plan, subagent result, or multiple earlier steps together, record that relationship explicitly.
Do not force downstream users to infer the dependency chain from prose alone.
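Because caused_by holds earlier step numbers, the steps can be read back as a directed graph. The following is a minimal Python sketch under that assumption. build_causal_graph and ancestors are hypothetical helper names, not part of the skill, and only the step and caused_by fields are read.

```python
from collections import defaultdict


def build_causal_graph(steps):
    """Map each step number to the later steps that list it in caused_by."""
    children = defaultdict(list)
    for step in steps:
        for parent in step.get("caused_by") or []:
            children[parent].append(step["step"])
    return dict(children)


def ancestors(steps, step_number):
    """Return every earlier step that transitively caused step_number."""
    parents = {s["step"]: s.get("caused_by") or [] for s in steps}
    seen = set()
    stack = list(parents.get(step_number, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return sorted(seen)


steps = [
    {"step": 1, "caused_by": None},    # user request
    {"step": 2, "caused_by": [1]},     # plan triggered by the request
    {"step": 3, "caused_by": [2]},     # tool call executing the plan
    {"step": 4, "caused_by": [1, 3]},  # depends on the request and the tool result
]
print(build_causal_graph(steps))  # {1: [2, 4], 2: [3], 3: [4]}
print(ancestors(steps, 4))        # [1, 2, 3]
```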
For every non-user_message step, make the interaction surface clear through tool, observation, input, reasoning, and output.
Use these surface categories conceptually when tracing:
api, mcp_tool, mcp_resource, mcp_prompt, browser, computer_use, cli, file, database, spreadsheet, search, messaging, other_tool, other_surface
You do not need to add a separate surface-type field, but the step should make the surface obvious from the content.
Examples:
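- curl, SDK, Postman, or internal client = api
- MCP tool call = mcp_tool
- MCP resource read = mcp_resource
- MCP prompt usage = mcp_prompt
- website, SaaS app, desktop app, or remote desktop interaction = browser or computer_use
- terminal command = cli
- file read, write, or edit = file
- database query or mutation = database
- spreadsheet read or edit = spreadsheet
- any surface not covered by the other categories = other_surface
When a workflow crosses surfaces, trace each meaningful action separately.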
observation should describe the concrete state before acting. State may be local, remote, visible, contextual, physical, or inferred from reliable system output.
Types of state to capture when relevant:
state_change should describe what changed after the step.
Make clear whether the change happened:
Prefer evidence over summary. When available, preserve the concrete proof of the result:
If a step verifies success, output should show what evidence confirmed the result.
Keep these fields distinct:
- observation = concrete state before acting
- reasoning = why this was the right next action
- causal_note = which earlier step or steps directly caused this step, and how
Use observation only for what was true before the action: files present or missing, exact error text already visible, current object state, tool availability, numeric values, counts, IDs, versions, or evidence already known.
Do not use observation for intentions, plans, causal explanations, or broad summaries. Those belong in reasoning or the causality fields.
Forsy traces should be structured so another system can reconstruct, verify, or approximate the workflow when enough information is available.
For any tool, file, API, browser, CLI, database, spreadsheet, or executable action, preserve concrete replay details using existing fields:
- tool: the specific tool or interface used
- input: the exact command, query, request, file path, browser action, API payload, or parameters
- output: the exact result, stdout, stderr, API response, file edit result, browser result, or error
- observation: the concrete state before the action, including what earlier results showed or meant
- state_change: what changed after the action
- static_output: artifacts, files, diffs, content, logs, screenshots, reports, hashes, or verification evidence
- agent_config: model, runtime, environment, repository, branch, commit, sandbox, internet access, available tools, tool definitions, function-calling protocol, tool constraints, and other replay context when available
Do not invent missing commands, outputs, diffs, hashes, commits, screenshots, logs, or environment details.
If exact replay is not possible, preserve enough evidence for approximate replay, verification, or training-use analysis.
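For a CLI step, these replay details can be captured at the moment the command runs instead of being reconstructed later. A minimal Python sketch, assuming the agent runs commands through subprocess; trace_cli_step is a hypothetical helper, the tool label "subprocess.run" is an illustrative choice, and only a subset of the step fields is filled in.

```python
import shlex
import subprocess
import sys
from datetime import datetime, timezone


def utc_now():
    # ISO 8601 in UTC with a trailing "Z", matching the example timestamps.
    return datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")


def trace_cli_step(step_number, turn, argv, caused_by=None):
    """Run argv and record the exact command and its raw result (hypothetical helper)."""
    started_at = utc_now()
    result = subprocess.run(argv, capture_output=True, text=True)
    ended_at = utc_now()
    return {
        "step": step_number,
        "turn": turn,
        "actor": "agent",
        "action": "agent_step",
        "operation": "execute",
        "tool": "subprocess.run",
        # Exact command line, quoted so it can be pasted back into a POSIX shell.
        "input": shlex.join(argv),
        # Raw exit code, stdout, and stderr rather than a paraphrase.
        "output": f"exit code: {result.returncode}\nstdout:\n{result.stdout}\nstderr:\n{result.stderr}",
        "success": result.returncode == 0,
        "caused_by": caused_by,
        "started_at": started_at,
        "ended_at": ended_at,
        # observation, reasoning, state_change, eval, etc. are still written by the tracer.
    }


step = trace_cli_step(3, 1, [sys.executable, "-c", "print('ok')"], caused_by=[2])
print(step["input"])
print(step["output"])
```

A failing command is recorded the same way, with success set to False and stderr preserved verbatim.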
For retraced tasks, trace the entire activity from start to finish, including all rounds of feedback and iteration that occurred during the original work.
For live tasks, do not finish the trace after the first output if the work is still evolving. Real tasks often involve multiple rounds of feedback and iteration, and later rounds may contain the most valuable learning signals.
Finish the trace when one of the following is true:
Set termination_reason accordingly:
task_complete, user_confirmed_done, agent_blocked, timeout, error_unrecoverable, partial_then_stopped, user_abandoned, other
Do not append trace administration as workflow steps.
Before tracing steps, record these top-level fields.
schema_version
String. Use forsy-trace-v0.1 for new open-source traces.
trace_id
String. A stable unique ID for this trace, such as forsy_trace_0001 or a UUID.
prior_trace_id
String or null. If this trace continues or references an earlier trace, include that trace ID. Null if standalone.
trace_mode
String. One of live, retraced, or hybrid.
validation_level
String. One of self_traced, retraced_from_logs, model_reviewed, human_reviewed, expert_reviewed, or client_validated.
task
String. The user's actual underlying work or workflow. Use the original user request when available. Do not replace it with trace-collection instructions.
agent_tools
Array of strings. List every tool available in the session, not just the tools used. This defines the action space.
started_at
String or null. ISO 8601 timestamp when the activity began. For retraced tasks, use the best reliable timestamp from logs, files, or conversation history. Null if unknown.
ended_at
String or null. ISO 8601 timestamp when the activity ended or the trace became ready for release. Null if genuinely unknown.
system_prompt
String or null. Original system prompt or initial instructions when accessible and safe to include. Do not include private or sensitive system prompts; use a hash or summary when safer.
skills
Array of strings or null. Skill files, plugins, knowledge bases, custom instructions, or specialized capabilities loaded beyond the base model.
memory
String or null. Persistent memory, saved context, or accumulated project context that materially shaped the work. Omit or summarize sensitive memory.
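agent_config
Object or null. Accessible model, runtime, environment, or provenance settings that shaped how the work was done.
Useful fields include model, framework, temperature, max_tokens, runtime, repository, branch, commit, sandbox, internet access, available tools, tool definitions, function-calling protocol, and tool constraints.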
Do not expose private system prompts, secrets, API keys, credentials, or sensitive environment values. Store hashes or summaries instead when useful.
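One way to apply this rule is to mask credential-like config values and replace a private system prompt with a content hash. A minimal sketch; the key-name heuristic in CREDENTIAL_PARTS is an assumption to adapt per environment. Hashing only hides text that is hard to guess, so raw secrets are masked rather than hashed.

```python
import hashlib
import re

# Assumption: a key counts as a credential if one of its name parts matches.
CREDENTIAL_PARTS = {"key", "apikey", "token", "secret", "password", "credential"}


def sha256_text(text):
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def safe_agent_config(raw_config):
    """Copy agent_config, masking values whose key name looks like a credential."""
    cleaned = {}
    for name, value in raw_config.items():
        parts = set(re.split(r"[_\-.]", name.lower()))
        # "service_api_key" -> {"service", "api", "key"} is masked;
        # "max_tokens" -> {"max", "tokens"} is kept.
        cleaned[name] = "[CREDENTIAL]" if parts & CREDENTIAL_PARTS else value
    return cleaned


def system_prompt_field(prompt, is_private):
    # A private prompt is replaced by its hash so traces can still be grouped
    # by prompt without revealing the text; public prompts are kept verbatim.
    return sha256_text(prompt) if is_private else prompt


config = safe_agent_config({"model": "example-model", "temperature": 0.7,
                            "max_tokens": 4096, "service_api_key": "abc123"})
print(config)
print(system_prompt_field("You are a helpful assistant...", is_private=True))
```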
learning
String or null. The generalizable lesson, candidate behavior update, or reusable principle extracted from the entire workflow. It should capture what a future agent should repeat, avoid, verify, or do differently when facing similar tasks.
Good learning content can include:
Do not claim that the lesson is universally true. Frame it as a candidate behavior update supported by this workflow.
termination_reason
String. One of task_complete, user_confirmed_done, user_abandoned, agent_blocked, timeout, error_unrecoverable, partial_then_stopped, or other.
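Before steps are added, the top-level fields can be checked against the allowed values listed above. A minimal sketch; check_header is a hypothetical helper and covers only these header fields, not steps, summary, or dataset_summary.

```python
HEADER_FIELDS = [
    "schema_version", "trace_id", "prior_trace_id", "trace_mode",
    "validation_level", "task", "agent_tools", "started_at", "ended_at",
    "system_prompt", "skills", "memory", "agent_config", "learning",
    "termination_reason",
]
TRACE_MODES = {"live", "retraced", "hybrid"}
VALIDATION_LEVELS = {"self_traced", "retraced_from_logs", "model_reviewed",
                     "human_reviewed", "expert_reviewed", "client_validated"}
TERMINATION_REASONS = {"task_complete", "user_confirmed_done", "user_abandoned",
                       "agent_blocked", "timeout", "error_unrecoverable",
                       "partial_then_stopped", "other"}


def check_header(trace):
    """Return a list of problems with the top-level fields (empty list means OK)."""
    problems = [f"missing field: {name}" for name in HEADER_FIELDS if name not in trace]
    if trace.get("schema_version") != "forsy-trace-v0.1":
        problems.append("schema_version should be forsy-trace-v0.1 for new traces")
    enum_checks = [("trace_mode", TRACE_MODES),
                   ("validation_level", VALIDATION_LEVELS),
                   ("termination_reason", TERMINATION_REASONS)]
    for name, allowed in enum_checks:
        if trace.get(name) not in allowed:
            problems.append(f"{name} has unexpected value {trace.get(name)!r}")
    if not isinstance(trace.get("agent_tools"), list):
        problems.append("agent_tools should list every tool available in the session")
    return problems


header = {name: None for name in HEADER_FIELDS}
header.update(schema_version="forsy-trace-v0.1", trace_id="forsy_trace_0001",
              trace_mode="live", validation_level="self_traced",
              task="Diagnose and fix a Stripe API migration causing checkout failures.",
              agent_tools=["Bash", "Read", "Edit"], termination_reason="task_complete")
print(check_header(header))  # []
```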
For every step in the activity, record a JSON object with these fields.
step
Integer. Sequential number starting from 1. Steps must be chronological.
turn
Integer. Group related steps into turns. Increment when a new exchange starts, typically after a new user_message or a clear shift to a new subtask.
actor
String. Who performed this step. Use stable actor labels consistently.
Allowed examples:
user, agent, subagent, system, agent:main, agent:research_1, agent:browser_1, agent:codegen_1, subagent:research_1
action
String. High-level step type. Exactly one of: user_message, agent_step, output, error
Specific operations such as read, write, search, execute, verify, ask_user, or answer belong in operation, not action.
operation
String or null. The specific operation performed in this step.
Suggested values:
plan, analyze, search, read, write, edit, execute, verify, download, install, ask_user, answer, select, other
For user_message steps, operation is null.
tool
String or null. The specific tool, system, or interface used in this step. Use the most specific safe tool name, such as web.run, Bash, python, browser, search, file_edit, api client, database, spreadsheet, screenshot tool, or mcp:server.tool_name.
For function-calling traces, use the exact function or tool name exposed by the harness when available.
execution_mode
String or null. One of: serial, parallel
Use parallel when the step intentionally groups simultaneous or fan-out operations under one top-level step. For user_message steps, use null.
parallel_group
String or null. Shared identifier for steps that belong to the same parallel branch or grouped execution, such as pg_001. Null if not part of parallel work.
observation
String or null. The concrete state, signal, or evidence visible or known at this point in the workflow.
Good observation content:
For user_message steps, observation is null.
input
String or null. The literal payload of this step.
- user_message steps: the user's exact words in full
Do not paraphrase. Do not summarize. Do not mix provenance, causal explanation, or evaluation into input.
input_source
Object or null. Describes where the input for this step came from.
Preferred structure:
{
"actor": "user | agent | subagent | tool | system | external",
"source_step": 1,
"source_field": "input | output | observation | directive | feedback_content | external",
"note": "short clarification"
}
Use input_source whenever the origin of the current input would otherwise be unclear.
output
String or null. The exact returned result or produced deliverable for this step.
If raw output is too long, include the most relevant excerpt and summarize what was omitted.
For user_message steps, output is null.
state_change
String or null. What changed as a result of this step. Describe the concrete post-step change first.
Examples:
- src/signup.tsx was modified to add signupSchema validation before form submission.
- The API request failed with 401, so no remote object was created.
- The test command completed and showed 3 passing tests.
- Agent understanding changed because the loaded docs revealed the v2 endpoint requires company_id.
reasoning
String or null. Why this was the right next action, given the current observation and context. Use reasoning for decision logic, not for repeating the action.
Good reasoning answers the question: why this step now, instead of a plausible alternative?
For user_message steps, reasoning is null.
caused_by
Array of step numbers or null. Earlier steps that directly caused this step.
Use this whenever the step was triggered by a user request, prior tool result, correction, failed step, retry, verification need, earlier plan, subagent result, delegation, handoff, or multiple earlier steps.
causal_type
String or null. Suggested values:
user_request, follow_up_user_request, answer_to_agent_question, execution_of_plan, dependency_on_tool_result, retry_after_failure, correction_response, approval_response, verification_of_prior_step, dependency_on_multiple_prior_steps, delegation_to_subagent, delegated_work, used_subagent_result, handoff_from_subagent, parallel_work, other
causal_note
String or null. Short explanation of how earlier steps led to this step.
alternatives_considered
String or null. Other meaningful actions the agent could have taken, and why it did not take them. Null only if the step was mechanical and there was no real choice.
success
Boolean or null. True if the step worked as intended, false if it failed, null for user_message steps.
eval
Integer. A weak process label / self-evaluation score for this step.
Allowed values:
- +1: positive
- 0: neutral
- -1: negative
User message steps are always 0.
Assign +1 only if the step produced a concrete, useful, or verifiable result that moved the task closer to completion and did not require later correction.
Assign -1 if the step produced an incorrect, misleading, incomplete, unusable, failed, wasteful, or later-disproven result.
Assign 0 only for user messages or genuinely neutral setup/inspection steps.
When later evidence shows an earlier step was wrong, update the earlier step to -1, add a directive, and connect the new correction step with retry_of.
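The hindsight rule touches two steps at once. A minimal sketch, assuming steps are plain dicts in chronological order; record_correction is a hypothetical helper that applies exactly the three updates named above and leaves success, eval_reason, and the causal fields to the tracer.

```python
def record_correction(steps, failed_step, directive, fix_step):
    """Apply the hindsight rule to a chronological list of step dicts (mutated in place)."""
    if fix_step["step"] <= failed_step:
        raise ValueError("retry_of must point to an earlier step")
    earlier = next(s for s in steps if s["step"] == failed_step)
    earlier["eval"] = -1                # later evidence showed the step was wrong
    earlier["directive"] = directive    # hindsight guidance lives on the earlier step
    fix_step["retry_of"] = failed_step  # link the correction back to what it fixes
    steps.append(fix_step)
    return steps


steps = [
    {"step": 1, "action": "user_message", "eval": 0},
    {"step": 2, "action": "agent_step", "eval": 1, "directive": None,
     "output": "Patched only the first visible error."},
    {"step": 3, "action": "user_message", "eval": 0, "caused_by": [2],
     "feedback_type": "correction"},
]
record_correction(
    steps, 2,
    "Check the full changelog for breaking changes before patching the first visible error.",
    {"step": 4, "action": "agent_step", "eval": 1, "caused_by": [2, 3]},
)
print(steps[1]["eval"], steps[3]["retry_of"])  # -1 2
```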
eval_reason
String or null. Brief explanation of why the step received its eval score. For user_message steps, null.
directive
String or null. Hindsight guidance placed on an earlier step that failed or was corrected. It should explain what went wrong and what should be changed.
Do not use directive on user_message steps or on the initial user request.
message_role
String or null. For user_message steps only. One of: direct_request, answer_to_agent_question, correction, approval, clarification, selection, status_update, new_constraint, other
feedback_type
String or null. For user_message steps only. One of: correction, approval, clarification, new_instruction, other
feedback_content
String or null. For user_message steps only. A normalized summary of what the user's feedback meant. Use this only when the message is acting as feedback or judgment.
Do not use feedback_content to restate the initial request.
started_at
String or null. ISO 8601 timestamp when this step began. Null if unknown.
ended_at
String or null. ISO 8601 timestamp when this step completed. Null if unknown.
retry_of
Integer or null. Earlier step number this step retries, corrects, or redoes. Use only when the current step genuinely retries or corrects an earlier step.
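Since every step field is required (see the step-field table), a small factory helps keep all 28 keys present with null defaults. A minimal sketch; make_step and make_user_message are hypothetical helpers, and make_user_message applies the fixed user_message values listed next.

```python
STEP_FIELDS = [
    "step", "turn", "actor", "action", "operation", "tool", "execution_mode",
    "parallel_group", "observation", "input", "input_source", "output",
    "state_change", "reasoning", "caused_by", "causal_type", "causal_note",
    "alternatives_considered", "success", "eval", "eval_reason", "directive",
    "message_role", "feedback_type", "feedback_content", "started_at",
    "ended_at", "retry_of",
]


def make_step(step, turn, **fields):
    """Build a step with all 28 fields present; unspecified fields are null."""
    unknown = set(fields) - set(STEP_FIELDS)
    if unknown:
        raise KeyError(f"unknown step fields: {sorted(unknown)}")
    record = {name: None for name in STEP_FIELDS}
    record.update(step=step, turn=turn, eval=0)  # eval is an int, never null
    record.update(fields)
    return record


def make_user_message(step, turn, text, message_role="direct_request", **extra):
    # actor, action, and the user's exact words are set; operation, tool,
    # observation, reasoning, success, output, etc. stay null and eval stays 0.
    return make_step(step, turn, actor="user", action="user_message",
                     input=text, message_role=message_role, **extra)


first = make_user_message(
    1, 1,
    "Stripe checkout payments started failing after our API version migration. "
    "Please diagnose the issue and fix it.",
)
print(len(first), first["eval"], first["observation"])  # 28 0 None
```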
For user_message steps:
- actor is always user
- action is always user_message
- operation is always null
- tool is always null
- execution_mode is always null
- observation is always null
- reasoning is always null
- success is always null
- eval is always 0
- eval_reason is always null
- directive is always null
- output is always null
If a user corrects, rejects, approves, or clarifies earlier work:
- keep the user_message step neutral
- set caused_by to point to the earlier step or steps it refers to
- use feedback_content on the user message to normalize what the feedback meant
- update the earlier step's eval and directive if needed
- set retry_of on the new fix step if a retry occurred
caused_by and retry_of must be chronologically valid.
Do not point:
- caused_by to the current step
- caused_by to a later step
- retry_of to the current step
- retry_of to a later step
Use caused_by only for direct dependencies that actually led to the step. Use retry_of only when the current step is genuinely retrying, correcting, or redoing an earlier step.
If the dependency chain is ambiguous, explain the ambiguity in causal_note rather than inventing a precise but false linkage.
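These link rules, together with the fixed user_message values, can be checked mechanically before release. A minimal sketch; check_steps is a hypothetical helper that only verifies numbering, that caused_by and retry_of point to earlier steps, and the user_message invariants. It cannot tell whether a link is genuinely causal.

```python
USER_MESSAGE_NULL_FIELDS = ["operation", "tool", "execution_mode", "observation",
                            "reasoning", "success", "eval_reason", "directive", "output"]


def check_steps(steps):
    """Return problems with step order, causal links, and user_message invariants."""
    problems = []
    if [s["step"] for s in steps] != list(range(1, len(steps) + 1)):
        problems.append("steps must be numbered 1..N in chronological order")
    for s in steps:
        n = s["step"]
        for parent in s.get("caused_by") or []:
            if parent >= n:
                problems.append(f"step {n}: caused_by {parent} is not an earlier step")
        retry = s.get("retry_of")
        if retry is not None and retry >= n:
            problems.append(f"step {n}: retry_of {retry} is not an earlier step")
        if s.get("action") == "user_message":
            if s.get("actor") != "user" or s.get("eval") != 0:
                problems.append(f"step {n}: user_message needs actor 'user' and eval 0")
            problems.extend(f"step {n}: user_message must have {field} = null"
                            for field in USER_MESSAGE_NULL_FIELDS if s.get(field) is not None)
    return problems


good = [
    {"step": 1, "actor": "user", "action": "user_message", "eval": 0},
    {"step": 2, "actor": "agent", "action": "agent_step", "eval": 1, "caused_by": [1]},
]
bad = [
    {"step": 1, "actor": "user", "action": "user_message", "eval": 0},
    {"step": 2, "actor": "agent", "action": "agent_step", "eval": -1,
     "caused_by": [2], "retry_of": 3},
]
print(check_steps(good))  # []
print(check_steps(bad))
```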
If the task involves APIs, webhooks, SDKs, RPC calls, service integrations, or structured remote systems, trace API work as first-class workflow data.
For API-related steps:
- observation should include the known pre-call state when relevant
- input should include the exact request or request attempt
- output should include the exact response or error
- state_change should say what changed on the remote system, or that no mutation occurred
When possible, include:
Mask credentials and secrets. Never expose raw tokens, API keys, session secrets, webhook secrets, or private auth headers.
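A recorded request can be masked before it goes into input. A minimal sketch, assuming requests are stored as dicts with url and headers keys. The header names in SECRET_HEADERS and the query parameters in SECRET_QUERY are assumptions to extend per API, and secrets inside the request body are not handled here.

```python
import copy
import re

# Assumption: header names and query parameters that carry credentials.
SECRET_HEADERS = {"authorization", "proxy-authorization", "x-api-key", "cookie"}
SECRET_QUERY = re.compile(r"([?&](?:api_key|access_token|token|key)=)[^&#]*")


def mask_request(request):
    """Return a copy of a recorded request with credentials replaced by [CREDENTIAL]."""
    masked = copy.deepcopy(request)  # the original record is left untouched
    headers = masked.get("headers") or {}
    for name in headers:
        if name.lower() in SECRET_HEADERS:
            headers[name] = "[CREDENTIAL]"
    if "url" in masked:
        masked["url"] = SECRET_QUERY.sub(r"\1[CREDENTIAL]", masked["url"])
    return masked


request = {
    "method": "POST",
    "url": "https://api.example.com/v1/charges?api_key=abc123&mode=test",
    "headers": {"Authorization": "Bearer sk_test_123", "Content-Type": "application/json"},
    "body": {"amount": 1000, "currency": "usd"},
}
safe = mask_request(request)
print(safe["url"])                       # https://api.example.com/v1/charges?api_key=[CREDENTIAL]&mode=test
print(safe["headers"]["Authorization"])  # [CREDENTIAL]
```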
If the task involves MCP, trace MCP work as first-class workflow data.
For MCP tool calls:
For MCP resource reads:
For MCP prompt usage:
Do not collapse MCP resource loading and MCP tool execution into one step if they are meaningfully separate actions.
If the task involves websites, SaaS apps, internal tools, desktop apps, remote desktops, or visible UI interaction, trace browser/computer-use work as first-class workflow data.
observation should capture the visible state before acting:
input should capture the literal action:
output should capture the direct visible result:
state_change should describe what changed in the UI and, when known, what changed in the underlying system.
When useful and available, include evidence artifacts such as screenshot path, DOM snapshot path, HAR/network log path, exported CSV/report path, or recording path.
If those artifacts were not available, keep the trace honest and text-based. Do not imply that screenshots, recordings, or richer evidence existed when they did not.
For CLI and file steps, preserve exact commands in input, exact stdout/stderr in output, full file content or diffs for write steps when available, and concrete state in observation.
For database or structured-data steps, also capture:
- input
- output
- observation and state_change
For spreadsheet steps:
If the workflow occurs through a surface not explicitly covered above, still trace it using the same core rules:
- observation
- input
- output
- state_change
This applies to messaging, email, async/event-driven systems, queues, webhooks, schedulers, background jobs, voice, audio, video, mobile apps, remote VMs, cloud consoles, hardware, robotics, lab workflows, IoT, device-control workflows, and other emerging surfaces.
If the system is asynchronous, capture both the action that triggered the work and the later observation or verification that confirmed the eventual result.
If the workflow is perception-heavy, capture both the raw signal or visible evidence available to the agent and the interpretation the agent used to decide the next action.
High-value traces often include failure, diagnosis, correction, and retry. Capture these precisely.
When a step fails:
- set success and eval appropriately
- add a directive
- link the retry or correction step with retry_of
Common cross-surface failures include:
If the agent performs a verification step, capture:
Good verification examples:
When the task produces non-chat evidence, include it in static_output when available.
Examples:
Each artifact should include path, type, role, related steps, description, state, hash when available, content or diff when available, and release sensitivity.
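An artifact entry with a SHA-256 hash can be built directly from the file on disk. A minimal sketch; make_artifact is a hypothetical helper, content is included only when requested, and hash and content fall back to null when the file no longer exists (for example, a deleted artifact). In a published trace, path would normally be repository-relative rather than the absolute temporary path used here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path):
    """Stream the file through SHA-256 and return it in the 'sha256:<hex>' form."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def make_artifact(path, artifact_type, role, related_steps, description, state,
                  diff=None, include_content=False, release_sensitivity="open"):
    """Build one static_output artifact entry; hash and content are null if the file is gone."""
    p = Path(path)
    present = p.is_file()
    return {
        "path": str(path),
        "type": artifact_type,
        "role": role,
        "related_steps": related_steps,
        "description": description,
        "state": state,
        "hash": file_sha256(p) if present else None,
        "content": p.read_text(encoding="utf-8") if present and include_content else None,
        "diff": diff,
        "release_sensitivity": release_sensitivity,
    }


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.js"
    target.write_bytes(b"console.log('hi');\n")
    artifact = make_artifact(target, "created", "deliverable", [3],
                             "Example script.", "File was created by step 3.",
                             include_content=True)
print(artifact["hash"])
```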
When richer raw evidence is available, prefer it over summaries. If those artifacts were not available, keep the trace honest and text-based.
Do not include trace JSON files, schema files, validation files, or release-preparation files as workflow artifacts unless the user-facing task was actually to create those files.
The fields above can be transformed into common trajectory and learning-signal views used in agent evaluation and post-training research:
- observation = candidate state / pre-action context
- action + operation + input + tool = candidate action representation
- eval = weak reward / process label candidate
- state_change = candidate next-state signal
- alternatives_considered + directive = preference or correction signal candidate
- feedback_type + feedback_content = human feedback signal when genuinely present
- system_prompt + skills + memory = agent context
- agent_tools = action-space context
- learning = candidate reusable lesson / meta-cognitive signal
- retry_of = failed-to-corrected step link
- final_output + static_output = terminal artifact and evidence
These mappings are candidates, not guarantees. Downstream use should account for trace mode, validation level, confidence, available evidence, and review quality.
Do not inflate the trace's value by calling weak self-evaluations expert labels or client validation.
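The mapping can be applied mechanically to produce candidate transition records. A minimal sketch; to_transitions is a hypothetical helper, skipping user_message steps is an illustrative choice, and eval is passed through as a weak label, not a verified reward.

```python
def to_transitions(steps):
    """Map non-user steps to candidate (state, action, reward, next_state) records."""
    transitions = []
    for s in steps:
        if s["action"] == "user_message":
            continue  # user turns are feedback signals, not agent actions
        transitions.append({
            "state": s.get("observation"),
            "action": {key: s.get(key) for key in ("action", "operation", "input", "tool")},
            "reward": s.get("eval"),  # weak process label, not a verified reward
            "next_state": s.get("state_change"),
            "retry_of": s.get("retry_of"),
        })
    return transitions


steps = [
    {"step": 1, "action": "user_message", "eval": 0},
    {"step": 2, "action": "agent_step", "operation": "edit", "tool": "Edit",
     "input": "Patch only the first visible error.", "observation": "Checkout fails.",
     "eval": -1, "state_change": "Checkout still fails."},
    {"step": 3, "action": "agent_step", "operation": "edit", "tool": "Edit",
     "input": "Apply all breaking changes from the changelog.",
     "observation": "Checkout still fails.", "eval": 1,
     "state_change": "Checkout succeeds.", "retry_of": 2},
]
for t in to_transitions(steps):
    print(t["reward"], t["action"]["operation"], t["retry_of"])
```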
Do not include highly sensitive or confidential information in the trace. If the activity involved personal data, credentials, proprietary business details, or other sensitive content, mask or omit that information without disrupting the structure and usefulness of the trace.
Examples:
- [CREDENTIAL]
- [PERSON_1], [PERSON_2]
- [INTERNAL_URL]
- [CUSTOMER_ID]
Use consistent placeholders. Preserve the structure, reasoning, and decision-making process while keeping sensitive details out.
If a detail is necessary to understand reasoning and safe to share, keep it. If it is sensitive filler, mask it.
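Consistent placeholders need one shared mapping for the whole trace, so the same person always becomes the same placeholder. A minimal sketch; Redactor is a hypothetical helper, the set of kinds that get numbered placeholders (NUMBERED_KINDS) is an assumption, and plain substring replacement is a simplification that can over-match short values.

```python
# Assumption: kinds that get numbered placeholders ([PERSON_1], [PERSON_2], ...);
# other kinds share one placeholder such as [CREDENTIAL].
NUMBERED_KINDS = {"PERSON"}


class Redactor:
    """Replace sensitive values with placeholders that stay consistent across a trace."""

    def __init__(self):
        self.assigned = {}  # (kind, value) -> placeholder
        self.counts = {}    # kind -> number of distinct values seen so far

    def placeholder(self, kind, value):
        if kind not in NUMBERED_KINDS:
            return f"[{kind}]"
        key = (kind, value)
        if key not in self.assigned:
            self.counts[kind] = self.counts.get(kind, 0) + 1
            self.assigned[key] = f"[{kind}_{self.counts[kind]}]"
        return self.assigned[key]

    def redact(self, text, values_by_kind):
        # Plain substring replacement: list longer values before values they
        # contain (for example "Ann Lee" before "Ann").
        for kind, values in values_by_kind.items():
            for value in values:
                text = text.replace(value, self.placeholder(kind, value))
        return text


redactor = Redactor()
first = redactor.redact("Alice asked Bob to rotate key abc123.",
                        {"PERSON": ["Alice", "Bob"], "CREDENTIAL": ["abc123"]})
second = redactor.redact("Bob confirmed the fix.", {"PERSON": ["Bob"]})
print(first)   # [PERSON_1] asked [PERSON_2] to rotate key [CREDENTIAL].
print(second)  # [PERSON_2] confirmed the fix.
```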
agent_confidence
At the end of the trace, assess overall confidence in the accuracy and completeness of the trace, not how good the original work was.
Use:
- 100: every step fully traced, high accuracy, nothing material missing
- 75: well traced, minor gaps or approximations
- 50: partially traced, some uncertainty in recall or execution
- 25: significant gaps, low confidence in accuracy
- 0: failed to trace meaningfully
Be honest. Accurate confidence ratings are more valuable than high confidence ratings.
If retracing past work and uncertain about details, state the uncertainty in the relevant step and lower confidence accordingly.
Assess whether the user's actual underlying task or workflow goal was achieved.
goal_achieved
Boolean. True if the user's original task reached its intended outcome, false otherwise.
goal_notes
String or null. Discuss only the outcome of the original user-facing work.
Do not discuss trace creation, schema compliance, release preparation, or whether the trace was captured well.
Valid examples:
- The Swift Lambda implementation was created, but it was not compiled or tested, so success is partial.
- The requested legal research summary was delivered, but it should not be treated as legal advice and was not reviewed by counsel.
- The local AutoDock Vina pipeline was created and a docking run completed, but the result was not experimentally validated.
- null
If the goal was not achieved, explain the failure type, which step or steps caused the failure, what went wrong, what should have happened instead, and whether the failure was recoverable.
final_output
String. The actual main deliverable for the user's work: the code, document, analysis, fix, result, recommendation, or answer produced for the user.
It is not:
If the deliverable was a file and content is available, include the file content or a safe excerpt. If it was an analysis, include the actual analysis text. If it was a code change, include the relevant code or diff.
For retraced work, final_output should reflect the original deliverable as faithfully as possible. Do not reconstruct from memory if the actual output is available.
static_output
Object or null. Structured artifacts and artifact evidence produced by the actual work.
Use static_output when the task created, modified, deleted, observed, or generated files or structured artifacts such as code, docs, configs, logs, datasets, screenshots, reports, JSON, CSV, HTML, transcripts, or similar outputs.
Preferred format:
{
"artifacts": [
{
"path": "src/example.js",
"type": "modified",
"role": "deliverable",
"related_steps": [12, 13, 14],
"description": "Updated payment API version and request handling.",
"state": "File was modified and used in the final implementation.",
"hash": "sha256:...",
"content": null,
"diff": "...",
"release_sensitivity": "open"
}
]
}
Artifact fields:
- path: file location or artifact identifier
- type: created, modified, deleted, observed, or generated
- role: deliverable, intermediate, test, config, log, evidence, or another appropriate role
- related_steps: step numbers connected to the artifact, or null
- description: what the artifact is
- state: what happened to the artifact or what was observed
- hash: sha256 if available, otherwise null
- content: full artifact content when available and safe, otherwise null
- diff: modification diff when available, otherwise null
- release_sensitivity: open, redacted, private, or exclude
Do not invent artifact content, diffs, hashes, screenshots, logs, or recordings.
dataset_summary
When publishing traces as examples or datasets, include a lightweight summary object.
title
Concise trace title.
description
Plain-language description of the traced workflow, including task type, environment, major actions, failures/retries, feedback, outcome, and known limitations.
tags
Array of tags such as: coding, legal-research, bioscience, tool-use, browser, cli, retraced, live, self-annotated, failure-recovery, agent-work-trace
release_tier
One of: open_example, research_preview, private, not_for_release
validation_level
Use the same validation levels defined earlier.
Format the trace as a single JSON object.
{
"schema_version": "forsy-trace-v0.1",
"trace_id": "forsy_trace_example_001",
"prior_trace_id": null,
"trace_mode": "live",
"validation_level": "self_traced",
"task": "Diagnose and fix a Stripe API migration causing checkout failures.",
"agent_tools": ["web.run", "Bash", "Edit", "Read"],
"started_at": "2025-03-14T14:00:00Z",
"ended_at": "2025-03-14T14:45:00Z",
"system_prompt": "You are a helpful assistant...",
"skills": ["forsy-trace", "code-review"],
"memory": "Project uses PostgreSQL with Prisma ORM.",
"agent_config": {
"model": "example-model",
"temperature": 0.7,
"max_tokens": 4096,
"framework": "example-agent-framework"
},
"learning": "When migrating API versions, check the full changelog for breaking changes before diagnosing only the first visible error.",
"termination_reason": "task_complete",
"steps": [
{
"step": 1,
"turn": 1,
"actor": "user",
"action": "user_message",
"operation": null,
"tool": null,
"execution_mode": null,
"parallel_group": null,
"observation": null,
"input": "Stripe checkout payments started failing after our API version migration. Please diagnose the issue and fix it.",
"input_source": null,
"output": null,
"state_change": null,
"reasoning": null,
"caused_by": null,
"causal_type": null,
"causal_note": null,
"alternatives_considered": null,
"success": null,
"eval": 0,
"eval_reason": null,
"directive": null,
"message_role": "direct_request",
"feedback_type": null,
"feedback_content": null,
"started_at": "2025-03-14T14:00:00Z",
"ended_at": "2025-03-14T14:00:00Z",
"retry_of": null
},
{
"step": 2,
"turn": 1,
"actor": "agent",
"action": "agent_step",
"operation": "analyze",
"tool": null,
"execution_mode": "serial",
"parallel_group": null,
"observation": "Repository contains package.json, prisma/schema.prisma, and src/api/checkout.ts. Stripe checkout requests are failing after an API version migration.",
"input": "Analyze likely causes of checkout failures after the Stripe API migration.",
"input_source": {
"actor": "user",
"source_step": 1,
"source_field": "input",
"note": "The analysis was triggered directly by the user's debugging request."
},
"output": "Initial hypothesis list: API version mismatch, missing required fields, broken tax handling, outdated helper usage.",
"state_change": "A candidate debugging plan now exists.",
"reasoning": "A structured diagnosis plan is the best first step before calling tools or editing code because it narrows the likely failure modes.",
"caused_by": [1],
"causal_type": "user_request",
"causal_note": "This step was triggered by the user's original debugging request.",
"alternatives_considered": "Could immediately inspect Stripe docs first, but a quick diagnosis plan helps guide the next tool-assisted steps.",
"success": true,
"eval": 1,
"eval_reason": "This step created a useful diagnosis plan that advanced the workflow.",
"directive": null,
"message_role": null,
"feedback_type": null,
"feedback_content": null,
"started_at": "2025-03-14T14:00:05Z",
"ended_at": "2025-03-14T14:00:08Z",
"retry_of": null
}
],
"final_output": "Root cause identified and fix delivered...",
"static_output": {
"artifacts": []
},
"summary": {
"total_steps": 2,
"total_turns": 1,
"positive_steps": 1,
"negative_steps": 0,
"neutral_steps": 1,
"directive_signals": 0,
"human_feedback": {
"corrections": 0,
"approvals": 0,
"clarifications": 0,
"new_instructions": 0
},
"agent_confidence": 75,
"goal_achieved": true,
"goal_notes": "The requested debugging analysis and fix were delivered, with limitations noted."
},
"dataset_summary": {
"title": "Stripe API Migration Debugging Trace",
"description": "Structured trace of an agent diagnosing and fixing an API migration issue.",
"tags": ["coding", "api", "debugging", "tool-use", "failure-recovery"],
"release_tier": "open_example",
"validation_level": "self_traced"
}
}
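The count fields in summary can be derived from the steps rather than written by hand. A minimal sketch; the mapping from feedback_type to the human_feedback buckets, and counting directive_signals as steps with a non-null directive, are assumptions read from the field names. agent_confidence, goal_achieved, and goal_notes are judgments and are not computed here.

```python
FEEDBACK_BUCKETS = {"correction": "corrections", "approval": "approvals",
                    "clarification": "clarifications", "new_instruction": "new_instructions"}


def summarize_steps(steps):
    """Derive the count fields of summary from the step list."""
    human_feedback = {bucket: 0 for bucket in FEEDBACK_BUCKETS.values()}
    for s in steps:
        bucket = FEEDBACK_BUCKETS.get(s.get("feedback_type"))
        if bucket is not None:
            human_feedback[bucket] += 1
    evals = [s["eval"] for s in steps]
    return {
        "total_steps": len(steps),
        "total_turns": len({s["turn"] for s in steps}),
        "positive_steps": evals.count(1),
        "negative_steps": evals.count(-1),
        "neutral_steps": evals.count(0),
        "directive_signals": sum(1 for s in steps if s.get("directive")),
        "human_feedback": human_feedback,
    }


# The two steps of the example trace, reduced to the fields this helper reads.
example_steps = [
    {"step": 1, "turn": 1, "eval": 0, "directive": None, "feedback_type": None},
    {"step": 2, "turn": 1, "eval": 1, "directive": None, "feedback_type": None},
]
print(summarize_steps(example_steps))
```

Applied to the example's two steps, this reproduces the counts shown in its summary object.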
Before publishing or sharing a trace, review and fix any issue that fails a check below.
A trace is ready for release only if:
- the original user request is recorded in the first user_message step's input as one coherent request
- every user_message step has eval = 0
- every user_message step has observation = null, reasoning = null, success = null, and eval_reason = null
- the initial user_message step has feedback_content = null
- every input field contains the actual payload for that step rather than a summary
- every output field contains the actual response or actual produced content, or an explicit excerpt with omissions noted
- every observation describes concrete pre-step state rather than intention, plan, or hindsight
- every reasoning field explains why the step was chosen
- every caused_by link is chronologically valid and points only to earlier steps
- every retry_of link points only to an earlier step that was genuinely retried, corrected, or redone
- final_output contains the actual deliverable rather than a summary of the trace
- static_output contains structured artifacts when those artifacts were available
- trace_mode, validation_level, and summary.agent_confidence honestly reflect evidence quality
- dataset_summary.description truthfully reflects the evidence in the trace and does not imply richer evidence than the trace contains
When the trace is complete:
- Save the trace as trace.json.
- Include manifest.json when available.
- Include an artifacts/ folder when available.
Suggested folder layout:
trace-name/
  manifest.json
  trace.json
  artifacts/
  README.md
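The layout can be written with the standard library alone. A minimal sketch; write_trace_package is a hypothetical helper, and the manifest fields shown are illustrative only, since no manifest schema is defined here.

```python
import json
import tempfile
from pathlib import Path


def write_trace_package(root, trace, readme_text, manifest=None):
    """Write trace.json, an optional manifest.json, README.md, and artifacts/ under root."""
    root = Path(root)
    (root / "artifacts").mkdir(parents=True, exist_ok=True)
    (root / "trace.json").write_text(
        json.dumps(trace, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    if manifest is not None:  # manifest.json is written only when one is available
        (root / "manifest.json").write_text(
            json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    (root / "README.md").write_text(readme_text, encoding="utf-8")
    return sorted(p.name for p in root.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    trace = {"schema_version": "forsy-trace-v0.1", "trace_id": "forsy_trace_0001", "steps": []}
    # Hypothetical manifest shape, for illustration only.
    manifest = {"trace_id": trace["trace_id"], "files": ["trace.json", "README.md"]}
    names = write_trace_package(Path(tmp) / "trace-name", trace, "# Example trace\n", manifest)
    reloaded = json.loads((Path(tmp) / "trace-name" / "trace.json").read_text(encoding="utf-8"))
print(names)  # ['README.md', 'artifacts', 'manifest.json', 'trace.json']
```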
This skill is released for research, engineering, and dataset-construction use. When publishing traces generated with this skill, include:
Recommended top-level fields:
schema_version, trace_id, prior_trace_id, trace_mode, validation_level, task, agent_tools, started_at, ended_at, system_prompt, skills, memory, agent_config, learning, termination_reason, steps, final_output, static_output, summary, dataset_summary
Step fields:
| # | Field | Type | Required |
|---|---|---|---|
| 1 | step | int | always |
| 2 | turn | int | always |
| 3 | actor | string | always |
| 4 | action | string | always |
| 5 | operation | string/null | always |
| 6 | tool | string/null | always |
| 7 | execution_mode | string/null | always |
| 8 | parallel_group | string/null | always |
| 9 | observation | string/null | always |
| 10 | input | string/null | always |
| 11 | input_source | object/null | always |
| 12 | output | string/null | always |
| 13 | state_change | string/null | always |
| 14 | reasoning | string/null | always |
| 15 | caused_by | int[]/null | always |
| 16 | causal_type | string/null | always |
| 17 | causal_note | string/null | always |
| 18 | alternatives_considered | string/null | always |
| 19 | success | bool/null | always |
| 20 | eval | int | always |
| 21 | eval_reason | string/null | always |
| 22 | directive | string/null | always |
| 23 | message_role | string/null | always |
| 24 | feedback_type | string/null | always |
| 25 | feedback_content | string/null | always |
| 26 | started_at | string/null | always |
| 27 | ended_at | string/null | always |
| 28 | retry_of | int/null | always |
Action types:
user_message, agent_step, output, error
Suggested operation values:
plan, analyze, search, read, write, edit, execute, verify, download, install, ask_user, answer, select, other
Suggested causal types:
user_request, follow_up_user_request, answer_to_agent_question, execution_of_plan, dependency_on_tool_result, retry_after_failure, correction_response, approval_response, verification_of_prior_step, dependency_on_multiple_prior_steps, delegation_to_subagent, delegated_work, used_subagent_result, handoff_from_subagent, parallel_work, other
Message roles:
direct_request, answer_to_agent_question, correction, approval, clarification, selection, status_update, new_constraint, other
Feedback types:
correction, approval, clarification, new_instruction, other
Eval scores:
- +1: positive process signal
- 0: neutral process signal
- -1: negative process signal
If you are unsure about the exact open Forsy trace format at any point during tracing, re-read this file before finalizing the trace.