Pointer Scan — Final Implementation Plan
Why This Plan Exists
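All three previous attempts failed:
- Server BFS (v1-v3): collect 10M pairs → 160MB → OOM; BFS level 2 blew up to 1.7M hits
- Kernel DFS (3-28): the scan finished in 226 seconds and found 10 chains, but they passed through temporary heap objects and broke after GC
- Client UI: code has been reverted
Root cause: UE4 pointer chains are deeper than 10 levels and heap references are extremely dense (every UObject carries Outer/Class/Package pointers), so a reverse scan is essentially drowned in noise.
ptrscan is still valuable, though: for games without an SDK dump (non-UE4), or as an initial exploration tool for finding stable chains near a static base. This is how CE's ptrscan is used.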
Architecture: Kernel-BFS (per-level scan)
No pairs are collected. Each level kmaps and scans all memory directly, matching values against a sorted target array with binary search.
Client                    Server                    Kernel (shadow_ce.ko)
  |                         |                         |
  |--- CMD 0xD6 ----------->|                         |
  |    (pid,target,         |--- ioctl PTRSCAN ------>|
  |     depth,offset,       |                         |
  |     flags)              |                         | Level 0: target
  |                         |                         | Level 1: scan ALL rw pages
  |                         |                         |   for each page:
  |                         |                         |     kmap → scan 8B aligned
  |                         |                         |     bsearch sorted targets
  |                         |                         |     hit → add to next level
  |                         |                         |     hit in static → record chain
  |                         |                         | Level 2..N: repeat
  |                         |                         |
  |                         |<-- chains + modules ----|
  |<-- wire response -------|                         |
  |                         |                         |
display results             |                         |
Why Kernel-BFS, Not Collect-Then-BFS
| | Collect + Server BFS | Kernel BFS (per-level) |
|---|---|---|
| Memory | 160MB pairs + sort buffer | ~4MB (cur + nxt level arrays) |
| OOM risk | High (4.6GB game → 48M ptrs) | Zero |
| Speed | 1-3s (sort dominant) | ~2s/level × depth |
| Complexity | Kernel + server both complex | Kernel does everything, server trivial |
Why Not DFS
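DFS (the current 3-28 code) also scans all memory at every level, but it took 226 seconds because it runs a full scan for every candidate along the depth-first path. BFS scans once per level and matches all targets of the current level in a single pass. 10 BFS levels = 10 full scans = ~20s. 10 DFS levels may mean hundreds of full scans.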
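The memory and time figures in this comparison can be checked with a few lines of arithmetic. The sketch below is only an estimate: the 16-byte pair size is an assumption (two 8-byte values per pair), the 24-byte node size and 200-byte chain size follow the struct sizes used elsewhere in this plan, and the 2-second full-scan time is the approximate figure from the table.

```python
# Rough cost model for the two designs; every constant here is an estimate.
PAIR_BYTES = 16                          # assumption: (source addr, pointer value), 8 bytes each
NODE_BYTES = 24                          # per-node size used in this plan's memory estimate
CHAIN_BYTES = 4 + 4 + 16 * 8 + 16 * 4    # ce_ptrscan_chain incl. 4 bytes of padding = 200
PER_LEVEL = 65536                        # PTRSCAN_PER_LEVEL
MAX_RESULTS = 4096                       # PTRSCAN_MAX_RESULTS
SECONDS_PER_FULL_SCAN = 2.0              # "~2s/level" from the comparison table


def collect_bytes(n_pairs):
    """Memory for the collect-then-BFS pair array (sort buffer not included)."""
    return n_pairs * PAIR_BYTES


def bfs_working_set_bytes():
    """cur + nxt level arrays plus the chain result buffer."""
    return 2 * PER_LEVEL * NODE_BYTES + MAX_RESULTS * CHAIN_BYTES


def bfs_seconds(depth):
    """One full memory pass per BFS level."""
    return depth * SECONDS_PER_FULL_SCAN


def implied_full_scans(total_seconds):
    """How many full passes a measured runtime corresponds to."""
    return total_seconds / SECONDS_PER_FULL_SCAN


if __name__ == "__main__":
    print(collect_bytes(10_000_000) / 1e6, "MB for 10M pairs")
    print(bfs_working_set_bytes() / 2**20, "MiB BFS working set")
    print(bfs_seconds(10), "s for a depth-10 BFS")
    print(implied_full_scans(226), "full scans implied by the 226 s DFS run")
```

Under these assumptions the 226-second DFS run corresponds to roughly 113 full passes, against 10 for a depth-10 BFS.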
Kernel Module: shadow_ce.c + shadow_ce.h
New ioctl: SHADOW_CE_PTRSCAN (CE_IOC_MAGIC, 20)
/* shadow_ce.h */
#define PTRSCAN_MAX_DEPTH   16
#define PTRSCAN_MAX_RESULTS 4096
#define PTRSCAN_PER_LEVEL   65536  /* max nodes per BFS level */

/* Flags */
#define PTRSCAN_F_ALIGNED   (1 << 0)  /* 8-byte aligned only (default) */
#define PTRSCAN_F_NEGATIVE  (1 << 1)  /* allow negative offsets */

struct ce_ptrscan_chain {
    uint32_t depth;
    /* addrs[0] = static base addr, addrs[depth-1] = near target */
    uint64_t addrs[PTRSCAN_MAX_DEPTH];
    int32_t  offsets[PTRSCAN_MAX_DEPTH];
};

struct ce_ptrscan_req {
    int32_t  pid;
    uint64_t target;
    uint64_t mask;          /* 0 = default 0x0000FFFFFFFFFFFF */
    uint32_t depth_max;     /* clamped to PTRSCAN_MAX_DEPTH */
    uint32_t offset_max;    /* max abs(offset) in bytes */
    uint32_t flags;
    uint32_t max_results;
    uint64_t result_buf;    /* userspace ptr to ce_ptrscan_chain[] */
    uint32_t result_count;  /* out */
    uint32_t scan_time_ms;  /* out: total time in ms */
};
do_ptrscan() Algorithm
Input:  target address, depth_max, offset_max
Output: chains from target to static bases

1. Collect VMA info (one mmap_read_lock):
   - rw_ranges[]: all VM_WRITE VMAs → scan these
   - static_regions[]: file-backed rw- VMAs (exclude .vdex/.odex/.art/.oat/kgsl/mali)
   - Also include [anon:.bss] immediately after file-backed rw- as static
     (GEngine lives in .bss!)

2. cur_level = [{addr: target, chain_idx: 0}]
   Sort cur_level by addr

3. For level = 1 to depth_max:
     nxt_level = []
     For each rw page in rw_ranges:
       kmap_local_page
       For each 8-byte aligned value V in page:
         V = untagged_addr(V)
         if V < 0x1000: skip (NULL-ish)
         // Binary search cur_level for any entry where |entry.addr - V| <= offset_max
         lo = V - offset_max (clamp at 0; use lo = V unless PTRSCAN_F_NEGATIVE is set)
         idx = lower_bound(cur_level, lo)
         while idx < cur_level.count and cur_level[idx].addr <= V + offset_max:
           offset = cur_level[idx].addr - V  (signed; entry.addr = V + offset, i.e. CE "+=" order)
           source_addr = page_va + byte_offset
           if source_addr in static_regions and n_chains < max_results:
             Record chain (prepend source to chain of cur_level[idx])
             n_chains++
           if nxt_level.count < PER_LEVEL_CAP:
             nxt_level.append({addr: source_addr, parent: idx, offset: offset})
           idx++
       kunmap_local
       cond_resched every 256 pages
       signal_pending check every page
     cur_level = nxt_level
     Sort cur_level by addr (for next level's binary search)

4. Build chain output:
   Walk parent pointers to reconstruct full path for each recorded chain
   copy_to_user

Memory: cur_level 65K×24B = 1.5MB, nxt_level same, chains 4K×200B = 800KB
Total: ~4MB
Key Implementation Details
- [anon:.bss] detection: vma->vm_file == NULL && (vma->vm_flags & VM_WRITE), AND the VMA immediately follows a file-backed rw- VMA of the same module. Without this, GEngine (in .bss) is never found as static base.
- Binary search matching: cur_level is sorted by addr. Call lower_bound(V - offset_max), then iterate until addr > V + offset_max. This is O(log N) per value instead of O(N).
- Chain reconstruction: Each nxt_level entry stores parent_idx (index into cur_level) and offset. To build the chain for a static hit, walk back through levels via parent pointers. This requires saving every level's array (not just cur/nxt). Total extra memory: depth × 65K × 24B = 16 × 1.5MB = 24MB max, which is acceptable.
- untagged_addr: Use the kernel macro, NOT a hardcoded & 0x7FFFFFFFFF. SM8750 may use 48-bit VA.
- signal_pending: Check every page (not every 256). Allows user cancel.
- Per-level cap: 65K nodes per level. If nxt_level fills up, stop adding but continue scanning (still find static hits for already-added nodes). This prevents OOM while maximizing chain discovery.
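The per-level matching and the parent-pointer reconstruction described above can be tried out in user space before touching the kernel. The following is a minimal sketch, not the kernel implementation: memory is simulated as a dict of aligned address → 64-bit value, only non-negative offsets are matched, and a static hit is recorded only for nodes that made it under the per-level cap. The function name ptrscan_bfs and its parameters are hypothetical.

```python
from bisect import bisect_left


def ptrscan_bfs(memory, static_ranges, target, depth_max, offset_max,
                per_level_cap=65536, max_results=4096):
    """Reverse pointer scan over `memory` (dict: aligned address -> 64-bit value).

    Returns a list of (static_base_addr, offsets). Offsets are in CE order:
    addr = read(base) + offsets[0]; addr = read(addr) + offsets[1]; ...
    """
    def is_static(addr):
        return any(lo <= addr < hi for lo, hi in static_ranges)

    # levels[k] holds (addr, parent_idx, offset) tuples sorted by addr.
    # Level 0 is just the target; every level is kept for reconstruction.
    levels = [[(target, -1, 0)]]
    hits = []                                   # (level, index) of static nodes
    for level in range(1, depth_max + 1):
        cur_addrs = [node[0] for node in levels[-1]]
        nxt = []
        for src, value in memory.items():       # one full pass per level
            if value < 0x1000:                  # NULL-ish
                continue
            i = bisect_left(cur_addrs, value)   # first entry with addr >= value
            while i < len(cur_addrs) and cur_addrs[i] <= value + offset_max:
                if len(nxt) < per_level_cap:
                    # offset is chosen so that entry.addr == value + offset
                    nxt.append((src, i, cur_addrs[i] - value))
                i += 1
        if not nxt:
            break
        nxt.sort()                              # sorted by addr for the next level
        levels.append(nxt)
        hits += [(level, j) for j, node in enumerate(nxt) if is_static(node[0])]

    chains = []
    for level, j in hits[:max_results]:
        base = levels[level][j][0]
        offsets = []
        while level > 0:                        # walk parent pointers to the target
            _, parent, off = levels[level][j]
            offsets.append(off)
            level, j = level - 1, parent
        chains.append((base, offsets))
    return chains
```

Because the walk starts at the static node and follows parents toward the target, the offsets come out base-first, so no reversal step is needed.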
Server: server.c
CMD_PTRSCAN (0xD6) — Passthrough + Module Snapshot
Server’s role is minimal:
- Receive request from client
- Collect module snapshot (already have get_module_list())
- Call ioctl(SHADOW_CE_PTRSCAN, &req)
- Package chains + module list → send to client
#define CMD_PTRSCAN 0xD6
/* Wire format */
// Request: [1B cmd][4B pid][8B target][4B depth][4B offset][4B flags][4B max_chains]
// Response: [4B status][4B scan_time_ms]
// [4B mod_count][per mod: 2B name_len, name, 8B base, 8B size, 8B file_offset]
// [4B chain_count][per chain: 4B depth, depth × (8B addr + 4B offset)]
Module base_offset Resolution
In each resulting chain, addrs[0] is the runtime address of the static base. The server uses the module snapshot to convert it into (module_name, file_offset):
For static_addr in chain.addrs[0]:
    Find module where any segment contains static_addr
    file_offset = segment.file_offset + (static_addr - segment.vm_start)
    Send: module_index + file_offset (uint64)
The client resolves this back to a runtime address with _parse_address("module_name+0xFILE_OFFSET"). _parse_address has been verified to support the file offset → runtime addr conversion (its segment matching logic was just fixed).
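For testing the server handler, the CMD_PTRSCAN wire format from this section can be packed and parsed with Python's struct module. This is a sketch under stated assumptions: the format comment does not specify byte order or padding, so little-endian packed fields are assumed, status is treated as unsigned, and module names are assumed to be UTF-8. The helper names pack_request and parse_response are hypothetical.

```python
import struct

CMD_PTRSCAN = 0xD6
# [1B cmd][4B pid][8B target][4B depth][4B offset][4B flags][4B max_chains]
REQ_FMT = "<BiQIIII"   # assumption: little-endian, no padding


def pack_request(pid, target, depth, offset, flags, max_chains):
    return struct.pack(REQ_FMT, CMD_PTRSCAN, pid, target, depth, offset,
                       flags, max_chains)


def parse_response(buf):
    """Return (status, scan_time_ms, modules, chains) from a response buffer."""
    pos = 0

    def take(fmt):
        nonlocal pos
        values = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        return values

    status, scan_time_ms = take("<II")
    (mod_count,) = take("<I")
    modules = []
    for _ in range(mod_count):
        (name_len,) = take("<H")
        name = buf[pos:pos + name_len].decode("utf-8")
        pos += name_len
        base, size, file_offset = take("<QQQ")
        modules.append((name, base, size, file_offset))
    (chain_count,) = take("<I")
    chains = []
    for _ in range(chain_count):
        (depth,) = take("<I")
        # each step is (8B addr, signed 4B offset)
        chains.append([take("<Qi") for _ in range(depth)])
    return status, scan_time_ms, modules, chains
```

With these assumptions a request is always 29 bytes, which gives the server a cheap sanity check before it touches the ioctl.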
Client
ce_client.py: ptrscan(pid, target, depth, offset, flags, max_chains)
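- Reuse the existing TCP connection (the scan connection), guarded by a lock
- Or open a dedicated connection (if the scan connection is in use)
- Timeout: 60 seconds (10 levels × ~2s/level + margin)
- Returns: (modules: list, chains: list[PtrChain])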
from dataclasses import dataclass

@dataclass
class PtrChain:
    depth: int
    module_name: str        # static base module
    base_file_offset: int   # file offset within module
    offsets: list[int]      # [offset_at_base, ..., offset_near_target]
ptrscan_dialog.py: Settings Dialog (simplified for Phase 1)
┌─ Pointer Scan Settings ───────────────────┐
│                                           │
│  Address:     [0x701169BF00        ]      │
│                                           │
│  Max depth:   [10    ]                    │
│  Max offset:  [4096  ]                    │
│  Max results: [4096  ]                    │
│                                           │
│  [x] Aligned only (8-byte)                │
│  [ ] Allow negative offsets               │
│                                           │
│            [Scan]    [Cancel]             │
└───────────────────────────────────────────┘
Entry point: right-click in the address list → "Pointer scan for this address"
ptrscan_results.py: Results Window
┌─ Pointer Scan Results ─────────────────────────────────────────┐
│  Found 47 chains in 18.3s                                      │
│                                                                │
│  Base Address        | Offsets                 | Points To     │
│  libUE4.so+0xB034CD8 | +780 +78 +38 +0 +30 ... | 17            │
│  libUE4.so+0xB034CD8 | +780 +78 +38 +0 +30 ... | 17            │
│  ...                                                           │
│                                                                │
│  Double-click to add to address list                           │
│                                                                │
│                      [Rescan]    [Close]                       │
└────────────────────────────────────────────────────────────────┘
- Virtual list (QAbstractTableModel); rows are not all loaded at once
- Double-click → create AddrEntry(is_pointer=True, base_addr_str="libUE4.so+0xB034CD8", pointer_offsets=[…])
- Rescan: verify that existing chains still point to the same address (drop broken chains)
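Rescan boils down to re-walking each chain and comparing the final address with the expected one. The sketch below assumes CE-style resolution (dereference, then add the offset, for every offset in the list) and a caller-supplied read_u64(addr) callable that returns None on a failed read. resolve_chain, rescan and read_u64 are hypothetical names, not existing client functions.

```python
def resolve_chain(read_u64, base_addr, offsets):
    """Follow a chain in CE order; return the final address or None if it breaks."""
    addr = base_addr
    for off in offsets:
        ptr = read_u64(addr)
        if ptr is None or ptr < 0x1000:   # unreadable or NULL-ish pointer
            return None
        addr = ptr + off                  # CE convention: += offset
    return addr


def rescan(read_u64, chains, expected_target):
    """Keep only the (base_addr, offsets) chains that still reach expected_target."""
    return [
        (base, offsets)
        for base, offsets in chains
        if resolve_chain(read_u64, base, offsets) == expected_target
    ]
```

In the results window, base_addr would first be resolved from the module+offset string, and read_u64 would wrap the client's existing memory read.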
Worker Thread
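from PyQt5.QtCore import QThread, pyqtSignal  # assumed binding; PyQt6.QtCore exports the same names

class PtrScanWorker(QThread):
    # Named scan_done so it does not shadow QThread's built-in finished signal.
    scan_done = pyqtSignal(object)  # (modules, chains) or error string

    def run(self):
        try:
            result = self.client.ptrscan(...)
            self.scan_done.emit(result)
        except Exception as e:
            self.scan_done.emit(str(e))
Use pyqtSignal, not QTimer.singleShot (calling it from a worker thread fails silently).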
Validated Offset Resolution
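The client-side _parse_address("libUE4.so+0xB034CD8") workflow has been verified:
- _modules_cache provides all segments: [(base, size, name, path, file_offset), ...]
- Find the segment where foff <= 0xB034CD8 < foff + size
- Return base + (0xB034CD8 - foff) = runtime addr
- If no segment matches (BSS is not in a file-backed segment) → fallback: first_segment_base + offset
Measured just now: 0x70CD4EA000 + 0xB034CD8 = 0x70D851ECD8 → GEngine is read → the chain resolves end to end.
There is one problem: the file-backed segments returned by enum_modules do not include [anon:.bss]. A BSS address does not fall inside any segment's [foff, foff+size) range. The fallback to base + offset happens to work (because for a PIE .so, file_offset == vaddr), but that is coincidence, not design.
Fix: when sending the module list, the server computes each module's "total span" (including BSS), or the client uses the fallback directly (always correct for PIE).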
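The lookup and its fallback can be written down compactly. This is a sketch of the workflow described in this section, not the actual _parse_address code: the function name resolve_module_offset is hypothetical, and the segment tuples follow the (base, size, name, path, file_offset) layout listed for _modules_cache.

```python
def resolve_module_offset(segments, module, offset):
    """Map module+offset to a runtime address.

    segments: list of (base, size, name, path, file_offset) tuples.
    """
    mod_segs = [seg for seg in segments if seg[2] == module]
    if not mod_segs:
        raise KeyError(module)
    for base, size, _name, _path, foff in mod_segs:
        if foff <= offset < foff + size:       # offset lies in this file-backed segment
            return base + (offset - foff)
    # Fallback for addresses outside every file-backed segment (e.g. .bss):
    # treat the offset as relative to the lowest mapped segment of the module.
    first_base = min(seg[0] for seg in mod_segs)
    return first_base + offset
```

With a first segment at 0x70CD4EA000, the BSS offset 0xB034CD8 takes the fallback path and yields 0x70D851ECD8, matching the measurement above.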
Implementation Order
Phase 1: Kernel BFS (shadow_ce.c + shadow_ce.h)
- Add the new ioctl struct + do_ptrscan()
- Test manually with chain_read.c (already exists; calls the ioctl directly)
- Goal: scan FPS2's Ammo address with depth=10, offset=8192, and return results that include the known GEngine chain within 30s
Phase 2: Server passthrough (server.c)
- CMD 0xD6 handler
- Module snapshot + chain packing
- base_offset file_offset resolution
Phase 3: Client (ce_client.py + dialog + results)
- ptrscan() protocol method
- Settings dialog
- Results window
- Double-click to add + _resolve_pointer verification (the += fix is already in)
- Rescan
Phase 4: Polish
- Progress feedback (kernel finishes a level → server → client update)
- Cancel (client close socket → server → kill -SIGINT → signal_pending)
- Save/load results
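Avoid List (lessons learned the hard way)
| Don't | Why |
|---|---|
| Pre-collect all pairs | 4.6GB game = 160MB+ pairs → OOM |
| pairs_tmp (radix sort) | Never implemented; 320MB allocated for nothing |
| DFS full-memory scans | One scan per candidate, 226s |
| Inner loop without break | Millions of useless iterations |
| QTimer.singleShot from a worker thread | Fails silently |
| Hardcoded 39-bit untag | SM8750 may use 48-bit VA |
| Checking only file-backed rw- segments | Misses [anon:.bss] (GEngine lives here) |
| offset reverse | BFS output is already in CE order; no reversal needed |
| addr -= offset | The CE standard is += |
| r-x segments as static base | ARM64 bl instruction encodings = false pointers |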
Agent Teams Strategy
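Phase 1 (kernel) must be completed and verified first; the rest can proceed in parallel. Use agent teams:
Lead: coordination + end-to-end chain verification
├── Teammate A (kernel): shadow_ce.h + shadow_ce.c do_ptrscan()
├── Teammate B (server): server.c CMD 0xD6 handler (depends on the Phase 1 ioctl definition)
└── Teammate C (client): ce_client.py + ptrscan_dialog.py + ptrscan_results.py
Teammates B and C can start as soon as A has finished the ioctl struct definition (no need to wait for A to write the whole algorithm).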