ProfitsLocal / WebJuice: Project Overview

Live-maintained overview of the project structure · pre-rendered from README + docs/v3. Tabs: Overview / Funnel Inventory / Control Layer / Roadmap.

🗺️ Business logic map, the full picture (live-maintained · read this first)

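Live-maintained. This is the one diagram of how the whole business runs: **multiple entry points → converge into one company identity → unified collection flow → cost-tiered screening funnel → progressive deepening → master.md → site build.** For architecture details see docs/v3/SPEC-FUNNEL-ORCHESTRATION.md · docs/v3/SPEC-GATHER-MODULE.md. Legend: ✅ built · 🔄 in progress / planned · ⚠️ gap (entry not wired / not unified). _Updated 2026-05-31._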
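【Entry points · multiple】────────────► all converge into "one company identity" (name / phone / address / unique ID)
  · Docker maps scrape (pl:scrape-docker / gosom)           ✅
  · Google Places API (pl:places-search-intake)             ✅
  · Licence database: 420k-row SQLite                       ✅
  · Google search → take results (tinyfish + ddg)           ✅
  · An image you send                                       ⚠️ entry not wired
  · A link you send                                         ⚠️ entry not wired
                              │
                              ▼
【Unified flow · every entry point shares this one pipe】
  1. Search (multi-engine · 5 lines)                                                        ✅
  2. AI judges relevance + whether it is the same business (blocks same-name impostors · red line)   ✅ (identity judge R143)
  3. Find the official website                                                              ✅
  4. Crawl official site + Google Maps + Places API + social media (multi-source)           ✅ (social scraping runs via OpenCLI)
  5. Cross-verify → organize → master.md                                                    🔄 (verification layer + consolidation · R146)
                              │
                              ▼
【Screening funnel · cheapest and fastest checks first · exclude non-customers as early as possible】
  Stage 0 (free / seconds):   no contact info / closed down / test name → exclude           ✅ exclusion-filter
  Stage 1 (free / DB lookup): licence revoked or expired → exclude                          ✅ #9 (fires only on identity-confirmed licences)
  Stage 2 (cheap):            too big / chain / government / competitor → exclude           ✅ exclusion-filter
  Stage 3 (medium):           how big is the problem = how much can we help (audit scoring) ✅ audit grading
  Stage 4 (medium):           still operating? able to pay? (activity signals)              ⚠️ partial
                              │   each layer drops the unqualified; only survivors move down (each step costs more)
                              ▼
【Deep capture · only for leads that get this far · the most expensive steps】
  · Full-site crawl + real photos + reviews + social background                             ✅ all parts exist
  · → rich master.md (site-build material)                                                  🔄 R146 Phase E
                              │
                              ▼
                          【Build them a website】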

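Three iron rules:

  1. Entry points vary endlessly; downstream there is only one pipe. Every lead first converges into a unique company identity, and from then on all leads go through the same "search → judge relevance → find official site → multi-source collection → master.md" flow.
  2. Funnel = cost tiering. The cheapest exclusions run first (free DB lookups / heuristic rules); the most expensive work (full-site crawl + photo capture) runs only on leads that reach the bottom of the funnel. Never spend money on unqualified leads.
  3. master.md is the finished product at the bottom of the funnel: a rich material document for "customers we are definitely building for" (leads with an official site get the redesign version; leads without one get the background-research version).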
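Rule 1 can be illustrated with a minimal sketch of entry convergence. Everything here is an assumption made for illustration: the function names (`toIdentity`, `convergeLeads`), the raw-lead field names, and the key rule (phone when present, otherwise name + address) are hypothetical and are not taken from the codebase. The real pipeline also relies on the AI identity judge, which this sketch does not model.

```javascript
// Hypothetical sketch: each entry channel yields a raw lead with whatever
// fields it has; toIdentity() reduces it to one comparable identity.
function normalizePhone(phone) {
  // Keep digits only so "(02) 9000 1234" and "02-9000-1234" compare equal.
  return String(phone || "").replace(/\D/g, "");
}

function normalizeText(text) {
  // Lowercase and collapse punctuation/whitespace runs into single spaces.
  return String(text || "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

function toIdentity(rawLead) {
  const name = normalizeText(rawLead.name);
  const phone = normalizePhone(rawLead.phone);
  const address = normalizeText(rawLead.address);
  // Assumed key rule: phone when present, otherwise name + address.
  const key = phone ? `phone:${phone}` : `name:${name}|addr:${address}`;
  return { key, name, phone, address, sources: [rawLead.channel] };
}

function convergeLeads(rawLeads) {
  const byKey = new Map();
  for (const raw of rawLeads) {
    const identity = toIdentity(raw);
    const existing = byKey.get(identity.key);
    if (existing) {
      // Same business seen via another channel: only record the extra source.
      if (!existing.sources.includes(raw.channel)) existing.sources.push(raw.channel);
    } else {
      byKey.set(identity.key, identity);
    }
  }
  return [...byKey.values()];
}
```

Whatever the real key rule is, the point of the sketch is that downstream stages receive one record per business, with the contributing channels kept as provenance.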

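Solid vs. to-do: the "parts" of the unified flow + funnel are mostly in place (identity judge, social scraping, licence gate, and external-content mining have all landed). To do: (a) the image/link entry points are not yet wired into the convergence point; (b) the funnel's cost-tiered master control (entry convergence → staged gates → progressive deepening) is not yet an explicit controller (pl:run-funnel is the prototype); (c) the collection backbone (accurate page-scale, free-first crawler, verification layer, master.md consolidation) is the SPEC-GATHER-MODULE.md plan.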


FUNNEL INVENTORY · existing work, by stage (2026-05-31)

Companion to README.md § The Funnel and SPEC-GATHER-MODULE.md. Grounded code inventory so we EXTEND, not rebuild (CLAUDE.md §7). (inferred) = surfaced by sweep, path not independently confirmed — verify before relying. Overall: ~72% of the funnel is wired & current; the identity-canonical write lane and a single end-to-end controller are the big gaps.

Stage 1 · ENTRY POINTS (all converge to one entity)

| Channel | CLI | Status |
|---|---|---|
| Docker/gosom maps scrape | pl:scrape-docker | ✅ (D43 fix: batch-start now chains it) |
| Google Places API | pl:places-search-intake | |
| Single business (phone/name/Maps URL) | pl:single-enrich (auto-chains audit) | |
| Licence DB (422k SQLite) | pl:license-lookup / pl:license-build-index / pl:license-csv-sync | |
| From a PHOTO | pl:ingest-image + core/leads/image-lead-discovery-v2.js | ⚠️ entity created, but VLM auto-OCR is TODO(G-6.1); fields still manual |
| From a LINK (arbitrary URL) | | ❌ NOT wired (planned in data/sop1/intake-channels.json) |

Registry of channels: data/sop1/intake-channels.json.

Stage 2 · IDENTITY CONVERGENCE

Stage 3 · UNIFIED ENRICHMENT FLOW

Stage 4 · SCREENING / FILTER (cheap-first)

Stage 5 · DEEP CAPTURE (existing-site)

Stage 6 · CONSOLIDATION → master.md

Stage 7 · ORCHESTRATION

OVERLAP / DUPLICATION RISKS

  1. Entity-write paths in discovery-store.js not unified under one transactional upsert (locking exists, not unified).
  2. Enrichment task-spawn decided in 3 places (enrichment / enrichment-gate / cheap-audit-queue) — centralize + idempotent.
  3. Contact info from site HTML (contact-extraction) vs external search (enrichment) both write entity.latest with no precedence — site should win (authoritative).
  4. ABN validity in two places (license-lookup vs identity-match) — consolidate one validator.
  5. redesign-brief-builder vs master-md-builder both synthesize narrative — make brief an explicit INPUT to master.md, not a parallel path.
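Risk 3 above (site-derived contact info should win over external search) could be handled with an explicit source precedence at write time. The following is a minimal sketch under assumptions: `SOURCE_RANK`, `applyContact`, the source labels, and the `{ value, source }` field shape are hypothetical and do not describe how entity.latest is actually stored.

```javascript
// Assumed precedence: a higher rank wins. "site" is the business's own HTML.
const SOURCE_RANK = { site: 2, external_search: 1 };

// latest: { [field]: { value, source } }  (hypothetical shape)
// Returns a new object; the input is not mutated.
function applyContact(latest, source, fields) {
  const next = { ...latest };
  const incomingRank = SOURCE_RANK[source] ?? 0;
  for (const [field, value] of Object.entries(fields)) {
    if (value == null || value === "") continue; // never overwrite with blanks
    const held = next[field];
    const heldRank = held ? (SOURCE_RANK[held.source] ?? 0) : -1;
    // Equal rank: the newer write wins. Lower rank never overwrites higher.
    if (incomingRank >= heldRank) next[field] = { value, source };
  }
  return next;
}
```

With a rule like this, write order stops mattering: an external-search result arriving after the site crawl cannot displace the site's own phone number.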

GAPS (priority)

Agents · Skills · Discord — the control + modular layer (2026-05-31)

How the funnel is DRIVEN (Discord + Hermes) and made MODULAR (skills the agents call). Companion to README.md § The Funnel, FUNNEL-INVENTORY.md, SPEC-FUNNEL-ORCHESTRATION.md. Sources: SOP_0_TASK_SYSTEM.md, DISCORD-CHANNELS-PRD.md. Legend ✅ live · ⚠️ code-exists-not-wired · ❌ missing/aspirational.

Discord — the operator surface (POST-mostly; commands via #website-tasks)

| Channel | Env | Status | Purpose |
|---|---|---|---|
| #website-tasks | WEBSITE_TASKS_FORUM_CHANNEL_ID | | command/task entry → intent-router → CLI |
| #website-leads | WEBSITE_LEADS_DISCORD_CHANNEL_ID | | per-lead threads (no-demo) · grade/phase tags |
| #website-projects | WEBSITE_PROJECTS_DISCORD_CHANNEL_ID | ⚠️ | demo-ready leads + sales stages (not actively written) |
| #website-templates | WEBSITE_TEMPLATES_DISCORD_CHANNEL_ID | | template family threads |
| #lead-discovery-runs | LEAD_DISCOVERY_RUNS_DISCORD_CHANNEL_ID | ⚠️ | batch run progress (code exists, not emitting) |
| #paid-websites | PAID_WEBSITES_DISCORD_CHANNEL_ID | | M5+ paid build/revision stages |

Flows (who posts): batch progress core/funnel/pipeline-batch-thread.js; per-lead thread core/funnel/lead-thread-sync.js; per-stage (9 stages) core/funnel/audit-stage-messages.js; cheap-audit verdict core/leads/cheap-audit-queue.js; archive core/leads/terminal-archive.js; paid intake/revision core/funnel/paid-intake-ops.js; build handoff / review / live-publish core/contracts/discord-messages.js. Message contract SoT: core/contracts/discord-messages.js.

Agents

Skills (19 · skills/*/SKILL.md) — modular intent, not yet runtime-executable

Discovery/screen: profitslocal-lead-discovery, profitslocal-lead-filter, profitslocal-entity-enrichment, image-lead-discovery, site-audit. Collect/build-prep: profitslocal-collect, profitslocal-build-research-pack, profitslocal-data-checkpoint, profitslocal-assemble-handoff, profitslocal-audit-handoff. Audit/QA: profitslocal-quality-audit, website-copy-audit, website-ui-audit, pl-audit-rubric. Build/spec/voice: pl-local-trade-page-spec, pl-au-trade-voice, website-redesign-preservation, template-lab. Orchestration: lead-ops (the Hermes-callable one).

The modularization target (Matthew's intent)

Make funnel steps modular, agent-callable skills so Hermes (and Claude/Commerce agents) can invoke them uniformly. Today: CLIs exist for every step, but the "skill" layer is docs + a hard-coded intent map. The target = a real skill surface (discoverable + runnable) that the agents call, instead of editing intent-router.js per new step.
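One way to picture the "discoverable + runnable" target is a small registry that agents query and invoke, so that adding a step means registering a skill rather than editing a hard-coded map. This is only a sketch: `registerSkill`, `listSkills`, `runSkill`, and the stub handler are hypothetical names, not existing APIs in the repo, and a real handler would presumably wrap the corresponding CLI.

```javascript
// Hypothetical in-memory skill registry.
const skills = new Map();

function registerSkill(name, { description, run }) {
  if (skills.has(name)) throw new Error(`skill already registered: ${name}`);
  skills.set(name, { description, run });
}

// Discoverable: agents can enumerate what exists without reading code.
function listSkills() {
  return [...skills.entries()].map(([name, s]) => ({ name, description: s.description }));
}

// Runnable: one uniform call shape for every skill.
async function runSkill(name, args = {}) {
  const skill = skills.get(name);
  if (!skill) throw new Error(`unknown skill: ${name}`);
  return skill.run(args);
}

// Stub registration for illustration; the handler body is an assumption.
registerSkill("site-audit", {
  description: "Audit one site (stub)",
  run: async ({ url }) => ({ skill: "site-audit", url, status: "stubbed" }),
});
```

Under this shape, an intent router would only need to map an intent to a skill name and pass arguments through, instead of growing a new code branch per step.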

INTEGRATION GAPS (Discord/agents/skills ↔ funnel stages 1-7)

  1. No runtime skill-runner — skills are prompts, not executables; can't skill:run <name> --args. Intent-router hard-maps kind→CLI (must edit code to add a skill).
  2. Discord↔skill loop missing — flow is Discord→intent-router→CLI→Discord; no skill in the middle; event-driven skill use needs a manual Claude Code session.
  3. Two channels code-exists-not-wired — #lead-discovery-runs (batch threads) + #website-projects (demo-ready) not actually emitting → operator blind to batch progress + demo backlog.
  4. Stage 8 (build) / 9 (publish) not skill-wrapped — CLIs only; agents can't invoke directly.
  5. Hermes local-only — no always-on deployment; can't run when operator offline.
  6. Identity/screening/gather modules (this session) not yet skill- or Discord-surfaced — resolveIdentity, mine-background, licence-kill, run-funnel exist as CLIs/modules but aren't in the intent-router kinds or skill registry.

Where this plugs into the framework

SPEC · Funnel Orchestration (stages 1-7) · codex R147 · 2026-05-31

Status: PLAN (codex R147). Owns the WHOLE funnel: entry convergence → cost-gated ordering → identity gating → progressive deepening → ONE master.md terminal artifact → drop accounting. The deep-capture/gather internals are a separate track (SPEC-GATHER-MODULE.md, invoked only AFTER cheap gates pass). Picture: README.md § The Funnel · live https://pl-business-map.pages.dev

Why this exists (codex R147)

Inventory (FUNNEL-INVENTORY.md) shows ~72% of parts are wired, but the system is not yet a coherent funnel: no single controller converges entries → runs cheapest-first exclusions → deepens only survivors → emits one master.md. This spec makes the system BEHAVE like the README funnel before we polish deep-capture quality.

Core principle — cost staging (Matthew)

Cheapest + fastest exclusions first; expensive work (full crawl, photos, reviews, vision) runs ONLY on leads that survive. Never spend money on disqualified leads.

entry (any channel) → ONE identity → [Stage0 free] → [Stage1 free-db] → [Stage2 cheap] →
  [Stage3 mid: problem-size] → [Stage4 mid: running/ability-to-pay] → [DEEP capture] → ONE master.md → build
                         ↑ each gate drops the unqualified + records WHY (drop accounting)
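The gate chain above can be sketched as an ordered list of checks, cheapest first, where the first failing gate drops the lead and records why. All names here (`exampleGates`, `runFunnel`, the lead fields, the reason strings) are hypothetical; the real stages are the existing exclusion-filter, licence, and audit components, which this sketch does not reproduce.

```javascript
// Gates ordered cheapest first. Each check returns null to pass,
// or a reason string to drop the lead. Field names are assumptions.
const exampleGates = [
  { stage: "stage0", check: (lead) => (!lead.phone && !lead.email ? "no-contact" : null) },
  { stage: "stage1", check: (lead) => (lead.licenceStatus === "revoked" ? "licence-revoked" : null) },
  { stage: "stage2", check: (lead) => (lead.isChain ? "chain" : null) },
];

function runFunnel(leads, gates) {
  const accounting = { entered: leads.length, excluded: {}, survivedToDeep: 0 };
  const drops = [];
  const survivors = [];
  for (const lead of leads) {
    let dropped = false;
    for (const gate of gates) {
      const reason = gate.check(lead);
      if (reason) {
        accounting.excluded[gate.stage] = (accounting.excluded[gate.stage] || 0) + 1;
        drops.push({ id: lead.id, stage: gate.stage, reason });
        dropped = true;
        break; // first failing gate ends the run: pricier gates never execute
      }
    }
    if (!dropped) survivors.push(lead);
  }
  accounting.survivedToDeep = survivors.length;
  return { survivors, drops, accounting };
}
```

Only the `survivors` array would be handed to deep capture, and `drops` gives the per-lead "why" that drop accounting calls for.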

Build order (codex R147)

  1. Phase 1 · Orchestrator skeleton + drop accounting ← FIRST. One controller (extend pl:run-funnel) that:
     · converges entry → entity;
     · runs the EXISTING cost-gated stages IN ORDER (exclusion-filter L1 → licence-kill(observe) → exclusion L2 → cheap-audit → qualification/grading);
     · does NOT run detailed crawl/reviews/deep until cheap gates pass;
     · auto-chains surviving leads into build-master-md;
     · emits drop accounting (entered / excluded@stageX / survived-to-deep / master-built).
     Reuses all existing stage components; no rebuild.
  2. Phase 2 · Stage-6 consolidation slice: make redesign-brief an explicit INPUT to ONE master-md-builder; read external_facts (R145) behind an observe/flag path; reviews/GBP as named source blocks; ensure the terminal artifact is reliable.
  3. Phase 3 · Identity observe/proposed-canonical lane: resolveIdentity verdict → proposed canonical patch → observe log → (gated) promotion. NO automatic canonical writes until 300-500 gold clearance. Lane exists structurally so the funnel has the right shape + accumulates real-run evidence (avoids painful retrofit).
  4. Phase 4 · Gather backbone = the R146 sequence B(page-scale)→A(free-crawl)→D(verify)→C(unify)→E(richer master.md), as depth improvement INSIDE deep-capture.
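The Phase 3 lane (verdict → proposed canonical patch → observe log → gated promotion) might look roughly like the sketch below. The verdict shape, `proposeCanonical`, `promote`, and the `promotionEnabled` flag are all assumptions made for illustration; the flag simply stands in for whatever mechanism enforces the "no automatic canonical writes until gold clearance" rule.

```javascript
// Hypothetical observe log: proposals accumulate here as real-run evidence.
const observeLog = [];

// verdict (assumed shape): { sameBusiness: boolean, confidence: number, fields: {...} }
function proposeCanonical(entity, verdict) {
  const proposal = {
    entityId: entity.id,
    patch: verdict.sameBusiness ? verdict.fields : {},
    confidence: verdict.confidence,
    status: "observed",
  };
  observeLog.push(proposal); // always logged, never applied at this step
  return proposal;
}

function promote(entity, proposal, { promotionEnabled }) {
  // Gate: while promotion is disabled, the canonical record is left untouched.
  if (!promotionEnabled) return entity;
  proposal.status = "promoted";
  return { ...entity, canonical: { ...entity.canonical, ...proposal.patch } };
}
```

Keeping proposal and promotion as separate steps means the lane can run on every lead from day one while the write path stays switched off.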

ADRs (R147)

Scope guard