Architecture

The code is split into small modules on purpose.

That is partly for code health, but mostly because this package degrades quickly when search, fetch, browser rendering, and research synthesis blur together.

Main boundaries

src/extension.ts wires the package into Pi and registers the public web_explore tool
src/tools/ contains tool adapters and internal tool-shaped helpers
src/search/ holds search backend logic
src/fetch/ handles HTTP and headless fetch logic
src/extract/ handles readable-content extraction
src/orchestration/ handles the bounded research flow
src/cache/ holds small cache helpers
src/types.ts defines shared contracts
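
As a rough illustration of how these boundaries could meet in shared contracts, here is a minimal sketch. Every name in it (SearchHit, FetchedPage, ExtractedDoc, exploreSketch, the stub functions) is hypothetical and not taken from the actual src/types.ts; it only shows each module doing one job and the orchestration layer composing them under a page limit.

```typescript
// Hypothetical contracts; the real src/types.ts may look quite different.
interface SearchHit { url: string; title: string; }
interface FetchedPage { url: string; html: string; via: "web_fetch" | "web_fetch_headless"; }
interface ExtractedDoc { url: string; text: string; via: FetchedPage["via"]; }

// One function type per module boundary (assumed shapes).
type SearchFn = (query: string) => Promise<SearchHit[]>;   // src/search/
type FetchFn = (url: string) => Promise<FetchedPage>;      // src/fetch/
type ExtractFn = (page: FetchedPage) => ExtractedDoc;      // src/extract/

interface Deps { search: SearchFn; fetch: FetchFn; extract: ExtractFn; }

// src/orchestration/: a bounded flow that reads at most maxPages hits.
async function exploreSketch(query: string, deps: Deps, maxPages = 2): Promise<ExtractedDoc[]> {
  const hits = await deps.search(query);
  const docs: ExtractedDoc[] = [];
  for (const hit of hits.slice(0, maxPages)) {
    const page = await deps.fetch(hit.url);
    docs.push(deps.extract(page));
  }
  return docs;
}

// Stub implementations so the sketch runs without network access.
const stubDeps: Deps = {
  search: async (q) => [
    { url: "https://example.com/a", title: `${q} A` },
    { url: "https://example.com/b", title: `${q} B` },
    { url: "https://example.com/c", title: `${q} C` },
  ],
  fetch: async (url) => ({ url, html: "<p>hello</p>", via: "web_fetch" }),
  // Naive tag stripping, for illustration only.
  extract: (page) => ({ url: page.url, text: page.html.replace(/<[^>]+>/g, ""), via: page.via }),
};

const sketchResult: Promise<ExtractedDoc[]> = exploreSketch("example query", stubDeps);
```

Because each step is injected, a search backend or fetcher can be swapped or stubbed without touching the orchestration logic.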

Public surface vs internals

The public model-facing web research surface is web_explore.

The lower-level capabilities still exist in code, but they are internal steps now:

search is for discovery
HTTP fetch is for plain page reads
headless fetch is for selected browser-rendered reads
orchestration decides when enough evidence exists and when source-quality concerns need another pass or a caveat

Keeping those responsibilities separate still matters. It lets the package show provenance like [web_fetch] or [web_fetch_headless] in preview/verbose output without forcing the outer model to manually chain those steps.
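
To make the provenance idea concrete, here is a small sketch of how internal steps might be labeled in verbose output. The [web_fetch] and [web_fetch_headless] labels come from the text above; the "web_search" label, the StepRecord shape, and formatProvenance are assumptions made for illustration, not the package's actual API.

```typescript
// Step labels: the two fetch labels appear in the docs; "web_search" is assumed.
type StepKind = "web_search" | "web_fetch" | "web_fetch_headless";

// Hypothetical record of one internal step.
interface StepRecord {
  kind: StepKind;
  target: string; // query string or URL
  ok: boolean;    // whether the step produced usable output
}

// Render one provenance line per step, marking failed steps.
function formatProvenance(steps: StepRecord[]): string[] {
  return steps.map((s) => `[${s.kind}] ${s.target}${s.ok ? "" : " (failed)"}`);
}

// Example trace: a plain HTTP read fails, so the page is retried headlessly.
const trace: StepRecord[] = [
  { kind: "web_search", target: "example query", ok: true },
  { kind: "web_fetch", target: "https://example.com/a", ok: false },
  { kind: "web_fetch_headless", target: "https://example.com/a", ok: true },
];

const provenanceLines = formatProvenance(trace);
```

The outer model only ever calls web_explore; the labels are a record of what happened inside that call.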

Why the split exists

A search result should not be treated as a page read.

A weak HTTP extraction should not be treated as reliable evidence.

A bot-check page should not become a source.

A same-host or community-only source set should not be treated as broad corroboration.

And if more evidence is needed, the model should call web_explore again with a narrower query instead of dropping into shell commands or raw HTTP calls.

Those boundaries make failures easier to understand and make it harder for the package to lie by accident.
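
The rules above can be sketched as a small evidence gate. This is an illustration under stated assumptions, not the package's real orchestration code: the ReadStatus values, the community flag, and assessEvidence are all hypothetical names.

```typescript
// Hypothetical outcome of trying to read one candidate source.
type ReadStatus = "search_hit_only" | "weak_extraction" | "bot_check" | "ok";

interface Candidate {
  url: string;
  status: ReadStatus;
  community: boolean; // assumed flag for forum or community sites
}

interface Assessment {
  usable: Candidate[];
  caveats: string[];
  needsAnotherPass: boolean;
}

function assessEvidence(candidates: Candidate[]): Assessment {
  // Only real, well-extracted page reads count as evidence. Search hits,
  // weak extractions, and bot-check pages are dropped here.
  const usable = candidates.filter((c) => c.status === "ok");
  const caveats: string[] = [];
  if (usable.length === 0) {
    return { usable, caveats: ["no page was read reliably"], needsAnotherPass: true };
  }
  // Several pages from one host are not broad corroboration.
  const hosts = new Set(usable.map((c) => new URL(c.url).hostname));
  if (hosts.size < 2) caveats.push("all usable sources share one host");
  // Neither is a set made up entirely of community sources.
  if (usable.every((c) => c.community)) caveats.push("only community sources");
  return { usable, caveats, needsAnotherPass: caveats.length > 0 };
}

const assessment = assessEvidence([
  { url: "https://forum.example.org/t/1", status: "ok", community: true },
  { url: "https://forum.example.org/t/2", status: "ok", community: true },
  { url: "https://docs.example.com/x", status: "bot_check", community: false },
  { url: "https://blog.example.net/y", status: "search_hit_only", community: false },
]);
```

In this example two forum pages survive the filter, but both caveats fire, so the gate asks for another pass rather than presenting the result as well corroborated.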
