# Snapshot and Refs

Compact element references that reduce context usage for AI agents.

**Related**: [commands.md](commands.md) for full function reference, [SKILL.md](../SKILL.md) for quick start.

## Contents

- [How Refs Work](#how-refs-work)
- [Snapshot Output Format](#snapshot-output-format)
- [Using Refs](#using-refs)
- [Ref Lifecycle](#ref-lifecycle)
- [Best Practices](#best-practices)
- [Ref Notation Details](#ref-notation-details)
- [Troubleshooting](#troubleshooting)

## How Refs Work

Traditional approach:
```
Full DOM/HTML -> AI parses -> CSS selector -> Action (~3000-5000 tokens)
```

agent-browser approach:
```
Compact snapshot -> @refs assigned -> Direct interaction (~200-400 tokens)
```

The snapshot extracts interactive elements and assigns short `@e` refs, reducing token usage significantly.

## Snapshot Output Format

```bash
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
```

**Response `elements_text`:**
```
@e1 [a] "Home" href="/"
@e2 [a] "Products" href="/products"
@e3 [a] "About" href="/about"
@e4 [button] "Sign In"
@e5 [input type="email"] placeholder="Email"
@e6 [input type="password"] placeholder="Password"
@e7 [button type="submit"] "Log In"
@e8 [input type="checkbox"] name="remember"
```

**Response `elements` (structured):**
```json
[
  {
    "ref": "@e1",
    "desc": "@e1 [a] \"Home\" href=\"/\"",
    "tag": "a",
    "text": "Home",
    "role": null,
    "name": null,
    "href": "/",
    "input_type": null
  },
  ...
]
```

## Using Refs

Once you have refs, interact directly:

```bash
# Click the "Sign In" button
'{"action": "click", "ref": "@e4"}'

# Fill email input
'{"action": "fill", "ref": "@e5", "text": "user@example.com"}'

# Fill password
'{"action": "fill", "ref": "@e6", "text": "password123"}'

# Submit the form
'{"action": "click", "ref": "@e7"}'

# Check the "remember me" checkbox
'{"action": "check", "ref": "@e8"}'
```

## Ref Lifecycle

**IMPORTANT**: Refs are invalidated when the page changes!

```bash
# Get initial snapshot
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# @e1 [button] "Next"

# Click triggers page change
infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "click", "ref": "@e1"
}'

# MUST re-snapshot to get new refs!
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# @e1 [h1] "Page 2"  <- Different element now!
```

### When to Re-snapshot

Always re-snapshot after:

1. **Navigation** - Clicking links, form submissions, `goto` action
2. **Dynamic content** - AJAX loads, modals opening, tabs switching
3. **Page mutations** - JavaScript modifying the DOM

The `interact` function returns a fresh snapshot in its response, so you can often use that instead of a separate snapshot call.

## Best Practices

### 1. Always Use the Latest Snapshot

```bash
# CORRECT: Use snapshot from previous response
RESULT=$(infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "click", "ref": "@e1"
}')
# Use elements from $RESULT.snapshot for next action

# WRONG: Using stale refs
# After navigation, @e1 may point to a completely different element
```

### 2. Check Success Before Continuing

```bash
RESULT=$(infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "click", "ref": "@e5"
}')

# Quote "$RESULT" so the shell does not word-split or glob the JSON
SUCCESS=$(echo "$RESULT" | jq -r '.success')
if [ "$SUCCESS" != "true" ]; then
  echo "Click failed: $(echo "$RESULT" | jq -r '.message')"
  # Re-snapshot and retry
fi
```

### 3. Use elements_text for Quick Decisions

For AI agents, `elements_text` provides a compact text representation:

```
@e1 [input type="email"] placeholder="Email"
@e2 [input type="password"] placeholder="Password"
@e3 [button] "Submit"
```

This is often enough to decide which element to interact with, without parsing the full `elements` array.

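
The quick-decision idea can be sketched in plain shell: scan `elements_text` for a literal substring and keep the leading ref. This is a minimal sketch, not part of agent-browser. `find_ref` is a hypothetical helper, the sample text is hard-coded in place of a real response, and it assumes one element per line with the ref as the first space-separated field, as in the examples here.

```bash
# Sample elements_text, copied from the example above (stands in for a real response)
ELEMENTS_TEXT='@e1 [input type="email"] placeholder="Email"
@e2 [input type="password"] placeholder="Password"
@e3 [button] "Submit"'

# find_ref SUBSTRING: hypothetical helper, prints the ref of the first line
# that contains SUBSTRING (literal match, not a regex)
find_ref() {
  printf '%s\n' "$ELEMENTS_TEXT" | awk -v needle="$1" 'index($0, needle) { print $1; exit }'
}

EMAIL_REF=$(find_ref 'type="email"')
SUBMIT_REF=$(find_ref '[button]')

# Build the interact payload for the chosen ref
FILL_INPUT=$(printf '{"action": "fill", "ref": "%s", "text": "%s"}' "$EMAIL_REF" "user@example.com")

echo "$EMAIL_REF $SUBMIT_REF"
echo "$FILL_INPUT"
```

The resulting `$FILL_INPUT` could then be passed as `--input` to the `interact` function. For text values that may contain quotes or backslashes, building the payload with `jq -n --arg` would be safer than `printf`.
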
## Ref Notation Details

```
@e1 [tag type="value"] "text content" name="attr"
|    |   |             |              |
|    |   |             |              +- Additional attributes
|    |   |             +- Visible text
|    |   +- Key attributes shown
|    +- HTML tag name
+- Unique ref ID
```

### Common Patterns

```
@e1 [button] "Submit"                 # Button with text
@e2 [input type="email"]              # Email input
@e3 [input type="password"]           # Password input
@e4 [a] "Link Text" href="/page"      # Anchor link
@e5 [select]                          # Dropdown
@e6 [textarea] placeholder="Message"  # Text area
@e7 [input type="file"]               # File upload
@e8 [input type="checkbox"] checked   # Checked checkbox
@e9 [input type="radio"] selected     # Selected radio
@e10 [button type="submit"] "Send"    # Submit button
```

### Elements Captured

The snapshot captures these interactive elements:

- Links (`<a>`)
- Buttons (`