Architecture
The code is split into small modules on purpose.
That is partly for code health, but mostly because this package gets worse fast if search, fetch, browser rendering, and research synthesis all blur together.
Main boundaries
- src/extension.ts wires the package into Pi and registers the public web_explore tool
- src/tools/ contains tool adapters and internal tool-shaped helpers
- src/search/ holds search backend logic
- src/fetch/ handles HTTP and headless fetch logic
- src/extract/ handles readable-content extraction
- src/orchestration/ handles the bounded research flow
- src/cache/ holds small cache helpers
- src/types.ts defines shared contracts
Public surface vs internals
The public model-facing web research surface is web_explore.
The lower-level capabilities still exist in code, but they are internal steps now:
- search is for discovery
- HTTP fetch is for plain page reads
- headless fetch is for selected browser-rendered reads
- orchestration decides when enough evidence exists and when source-quality concerns need another pass or a caveat
Keeping those responsibilities separate still matters. It lets the package show provenance like [web_fetch] or [web_fetch_headless] in preview/verbose output without forcing the outer model to manually chain those steps.
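As a rough illustration of how provenance can survive because the read steps stay separate, here is a minimal sketch. The PageRead type, the formatPreviewLine helper, and the example URLs are hypothetical and are not taken from src/types.ts; only the two labels web_fetch and web_fetch_headless come from the text above.

```typescript
// Hypothetical shapes for illustration only; the real contracts live in src/types.ts.
type ReadStep = "web_fetch" | "web_fetch_headless";

interface PageRead {
  step: ReadStep; // which internal step produced this read
  url: string;
  text: string;   // extracted readable content
}

// Prefix each preview line with the step that produced it,
// so a reader can tell a plain HTTP read from a browser-rendered one.
function formatPreviewLine(read: PageRead): string {
  return `[${read.step}] ${read.url} (${read.text.length} chars)`;
}

const reads: PageRead[] = [
  { step: "web_fetch", url: "https://example.com/docs", text: "Plain HTML page body" },
  { step: "web_fetch_headless", url: "https://example.com/app", text: "Rendered page body" },
];

const previewLines: string[] = reads.map(formatPreviewLine);
```

Because each read carries its own step label, the preview output can be built after the fact without the outer model ever seeing or sequencing the individual steps.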
Why the split exists
A search result should not be treated as a page read.
A weak HTTP extraction should not be treated as reliable evidence.
A bot-check page should not become a source.
A same-host or community-only source set should not be treated as broad corroboration.
And if more evidence is needed, the model should call web_explore again with a narrower query instead of dropping into shell commands or raw HTTP calls.
Those boundaries make failures easier to understand and make it harder for the package to lie by accident.
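The rules above could be expressed as explicit checks along these lines. This is a sketch under stated assumptions, not the package's orchestration code: the Evidence shape, the 200-character threshold, the bot-check marker strings, and the caveat wording are all invented for illustration.

```typescript
// Hypothetical evidence record; field names are assumptions, not the package's types.
type EvidenceKind = "search_result" | "page_read";

interface Evidence {
  kind: EvidenceKind;
  host: string;       // e.g. "docs.example.com"
  text: string;       // extracted content, empty for bare search results
  community: boolean; // assumed flag for forum or community-sourced pages
}

const MIN_READABLE_CHARS = 200; // assumed cutoff for a "weak" extraction
const BOT_CHECK_MARKERS = ["verify you are human", "captcha"]; // assumed markers

// A search hit is not a page read, a thin extraction is not reliable,
// and a bot-check page is never a source.
function isUsableSource(e: Evidence): boolean {
  if (e.kind !== "page_read") return false;
  if (e.text.trim().length < MIN_READABLE_CHARS) return false;
  const lower = e.text.toLowerCase();
  return !BOT_CHECK_MARKERS.some((m) => lower.indexOf(m) !== -1);
}

// Returns a caveat string when the usable set is too narrow to count as
// broad corroboration, or null when no caveat applies.
function corroborationCaveat(sources: Evidence[]): string | null {
  const usable = sources.filter(isUsableSource);
  if (usable.length === 0) return "no usable sources";
  if (usable.every((e) => e.host === usable[0].host)) return "all sources share one host";
  if (usable.every((e) => e.community)) return "community-only sources";
  return null;
}
```

A non-null caveat is the kind of signal that might lead to another pass with a narrower web_explore query, or to a caveat in the final answer.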