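Maintained live. This is the single picture of how the whole business runs: **multiple entries → converge to one company identity → unified gathering flow → cost-tiered screening funnel → progressive deepening → master.md → site build.** Architecture details: docs/v3/SPEC-FUNNEL-ORCHESTRATION.md · docs/v3/SPEC-GATHER-MODULE.md. Legend: ✅ built · 🔄 in progress / planned · ⚠️ gap (entry not wired / not unified). _Updated 2026-05-31._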
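[Entries · multiple] ────────────► all converge to "one company identity" (name / phone / address / unique key)
· Docker maps scrape (pl:scrape-docker / gosom) ✅
· Google Places API (pl:places-search-intake) ✅
· Licence database: 420k-row SQLite ✅
· Google search → take results (tinyfish + ddg) ✅
· A photo you send ⚠️ entry not wired
· A link you send ⚠️ entry not wired
│
▼
[Unified flow · every entry shares this one pipe]
1. Search (multi-engine · 5 routes) ✅
2. AI judges relevance + whether it is the same business (guards against same-name impostors · red line) ✅ (identity judge R143)
3. Find the official website ✅
4. Crawl official site + Google Maps + Places API + social media (multi-source) ✅ (social fetch goes through OpenCLI)
5. Cross-validate → consolidate → master.md 🔄 (validation layer + roll-up · R146)
│
▼
[Screening funnel · cheapest and fastest checks first · exclude non-customers as early as possible]
Stage 0 (free / seconds): no contact details / closed / test name → exclude ✅ exclusion-filter
Stage 1 (free / DB lookup): licence revoked or expired → exclude ✅ #9 (fires only on identity-confirmed licences)
Stage 2 (cheap): too big / chain / government / competitor → exclude ✅ exclusion-filter
Stage 3 (mid): how big is the problem = how much we can help (audit scoring) ✅ audit grading
Stage 4 (mid): still trading? able to pay? (activity signals) ⚠️ partial
│ each layer drops the unqualified; only survivors move down (cost rises with depth)
▼
[Deep capture · only for leads that reach this point · the most expensive steps]
· Full-site crawl + real photos + reviews + social background ✅ all parts exist
· → rich master.md (site-build material) 🔄 R146 Phase E
│
▼
[Build their website]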
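Three iron rules:
Solid vs to-do: the "parts" of the unified flow + funnel mostly exist (identity judge, social fetch, licence gate, and external content mining have all landed). To do: (a) the photo/link entries are not yet wired to the convergence point; (b) the funnel's cost-tiered master control (entry convergence → staged gates → progressive deepening) is not yet an explicit controller (pl:run-funnel is the prototype); (c) the gathering backbone (accurate page-scale measurement, free-first crawler, validation layer, master.md roll-up) is the SPEC-GATHER-MODULE.md plan.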
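
The convergence step in the picture above (any entry → one company identity made of name / phone / address / unique key) can be sketched roughly as follows. This is a minimal illustration, not the repo's implementation: `normalisePhone`, `normaliseText`, `toIdentity`, `converge`, and the key rule (phone when present, otherwise name + address) are all assumptions made for the example. Real same-business decisions belong to the identity judge, not to string matching.

```javascript
// Hypothetical sketch: every name and rule here is illustrative, not taken from the repo.

// Keep digits only, so spaces, dashes and brackets do not split one business in two.
function normalisePhone(raw) {
  return String(raw || "").replace(/\D/g, "");
}

// Lowercase, turn punctuation into spaces, collapse whitespace.
function normaliseText(raw) {
  return String(raw || "")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Shape a raw record from any entry channel into one identity record.
function toIdentity(channel, raw) {
  const name = normaliseText(raw.name);
  const phone = normalisePhone(raw.phone);
  const address = normaliseText(raw.address);
  // Assumed key rule: phone when present, else name + address.
  const key = phone ? `phone:${phone}` : `name:${name}|addr:${address}`;
  return { key, name, phone, address, channels: [channel] };
}

// Records that share a key collapse into one identity; the first record's fields are kept
// and the list of channels that reported it grows.
function converge(records) {
  const byKey = new Map();
  for (const { channel, raw } of records) {
    const identity = toIdentity(channel, raw);
    const existing = byKey.get(identity.key);
    if (existing) {
      if (!existing.channels.includes(channel)) existing.channels.push(channel);
    } else {
      byKey.set(identity.key, identity);
    }
  }
  return [...byKey.values()];
}
```

A phone-based key is deliberately naive: it merges two listings that share a number and cannot tell a same-name impostor from the real business, which is why the picture puts the AI identity check behind a red line.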
Companion to README.md § The Funnel and SPEC-GATHER-MODULE.md. Grounded code inventory so we EXTEND, not rebuild (CLAUDE.md §7). (inferred) = surfaced by sweep, path not independently confirmed; verify before relying. Overall: ~72% of the funnel is wired & current; the identity-canonical write lane and a single end-to-end controller are the big gaps.
| Channel | CLI | Status |
|---|---|---|
| Docker/gosom maps scrape | pl:scrape-docker | ✅ (D43 fix: batch-start now chains it) |
| Google Places API | pl:places-search-intake | ✅ |
| Single business (phone/name/Maps URL) | pl:single-enrich (auto-chains audit) | ✅ |
| Licence DB (422k SQLite) | pl:license-lookup / pl:license-build-index / pl:license-csv-sync | ✅ |
| From a PHOTO | pl:ingest-image + core/leads/image-lead-discovery-v2.js | ⚠️ entity created, but VLM auto-OCR is TODO(G-6.1) — fields still manual |
| From a LINK (arbitrary URL) | — | ❌ NOT wired (planned in data/sop1/intake-channels.json) |
Registry of channels: data/sop1/intake-channels.json.
- core/leads/discovery-store.js (upsert/merge/score/phase) · schema validation core/leads/entity-schema.js · score core/leads/discovery-score.js · routing core/leads/grade-router.js.
- data/leads/entities/<key>.json { latest{name,phone,email,website,address,city,niche,…}, status, phase, enrichment, license, deploy }.
- core/enrichment/identity-match.js (R143-fixed) → tiered core/enrichment/identity/resolve-identity.js (tier0→2→1, write_allowed:false).
- core/leads/enrichment.js (5 routes: official/fb/ig/li/reviews + reverse-phone) · core/extractors/tinyfish.js (T0) → core/scrape/ddg.js (fallback).
- core/llm/match-judge.js (judgeEnrichmentMatches + judgePageIdentity).
- core/enrichment/mine-background.js (R145, identity-gated, quarantine) + OpenCLI core/enrichment/fetch/opencli-fetch.js (R137, cleared) · pl:mine-background.
- pl:places-enrich; reviews/GBP core/leads/reviews-adapter.js, core/handoff/gbp-*. Gate core/leads/enrichment-gate.js. Batch pl:run-enrichment-batch (+ identity observe wired R138).
- core/leads/exclusion-filter.js (3 layers: data-quality / business-type / timing) + core/leads/niche-config.json.
- core/scoring/cheap-audit-v2.js (T0) → detailed core/scoring/detailed-audit.js (T1) → reviews+vision (T2).
- core/scoring/lead-grading.js (investment_level/product_tier/pricing) · gate core/scoring/qualification-scorecard.js (7 hard gates + 5D score≥60).
- core/leads/licence-kill-observe.js (R144, OBSERVE only, identity-gated). Archive core/leads/terminal-archive.js.
- core/audit/multi-page-crawl.js (sitemap-aware · Firecrawl PAID → Playwright fallback · captures url/title/meta/rawHtml/text/images/links/headings).
- contact-extraction.js, logo-extractor.js, activity-audit.js, form-audit.js; (inferred) tech-stack/domain-history/ai-geo/image-optimization (verify).
- core/audit/redesign-brief-builder.js DEEP → writes clients/<slug>/v2/core-extract.json {real_facts, brand_signals, ai_extensions, qualification}.
- core/reports/master-md-builder.js (frontmatter + 5 CN sections) · CLI scripts/leads/build-master-md.js · refresh core/leads/master-md-refresh.js.
- clients/<slug>/v2/master.md (+ themed HTML via huashu-md-html).
- pl:run-funnel (R124): discovery→enrich→audit+grade+master · resumable · dry-default · excludes identity-canonical lane.
- leads:run-pipeline: detailed-audit→vision→reviews→internal-report · does NOT auto-invoke build-master-md.
- pl:pipeline-batch-start/step, pl:task-dispatcher/listener (SOP-0 queue).
- discovery-store.js not unified under one transactional upsert (locking exists, not unified).
- contact-extraction) vs external search (enrichment) both write entity.latest with no precedence; site should win (authoritative).
- license-lookup vs identity-match); consolidate one validator.
- redesign-brief-builder vs master-md-builder both synthesize narrative; make brief an explicit INPUT to master.md, not a parallel path.

How the funnel is DRIVEN (Discord + Hermes) and made MODULAR (skills the agents call). Companion to README.md § The Funnel, FUNNEL-INVENTORY.md, SPEC-FUNNEL-ORCHESTRATION.md. Sources: SOP_0_TASK_SYSTEM.md, DISCORD-CHANNELS-PRD.md. Legend ✅ live · ⚠️ code-exists-not-wired · ❌ missing/aspirational.
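
One gap from the inventory is that site extraction and external search both write `entity.latest` with no precedence, while the site should win. A minimal sketch of one way to encode that rule is below. The source labels (`site`, `external_search`), the rank numbers, and the `mergeLatest` helper are assumptions for illustration; the real write path lives in discovery-store.js and is not reproduced here.

```javascript
// Hypothetical precedence table: a higher rank wins. The labels are illustrative.
const SOURCE_RANK = { site: 2, external_search: 1 };

// Fold field observations into a `latest`-style object and remember where each value came from.
function mergeLatest(observations) {
  const latest = {};
  const provenance = {};
  for (const { source, fields } of observations) {
    const rank = SOURCE_RANK[source] ?? 0; // unknown sources get the lowest rank
    for (const [field, value] of Object.entries(fields)) {
      // Blank values never overwrite anything.
      if (value === undefined || value === null || value === "") continue;
      const current = provenance[field];
      // An equal or higher rank replaces the stored value; a lower rank is ignored.
      if (!current || rank >= current.rank) {
        latest[field] = value;
        provenance[field] = { source, rank };
      }
    }
  }
  return { latest, provenance };
}
```

Keeping a `provenance` map next to `latest` makes the outcome independent of write order: a later external-search write cannot displace a value the site already supplied.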
| Channel | Env | Status | Purpose |
|---|---|---|---|
| #website-tasks | WEBSITE_TASKS_FORUM_CHANNEL_ID | ✅ | command/task entry → intent-router → CLI |
| #website-leads | WEBSITE_LEADS_DISCORD_CHANNEL_ID | ✅ | per-lead threads (no-demo) · grade/phase tags |
| #website-projects | WEBSITE_PROJECTS_DISCORD_CHANNEL_ID | ⚠️ | demo-ready leads + sales stages (not actively written) |
| #website-templates | WEBSITE_TEMPLATES_DISCORD_CHANNEL_ID | ✅ | template family threads |
| #lead-discovery-runs | LEAD_DISCOVERY_RUNS_DISCORD_CHANNEL_ID | ⚠️ | batch run progress (code exists, not emitting) |
| #paid-websites | PAID_WEBSITES_DISCORD_CHANNEL_ID | ❌ | M5+ paid build/revision stages |
Flows (who posts): batch progress core/funnel/pipeline-batch-thread.js; per-lead thread core/funnel/lead-thread-sync.js; per-stage (9 stages) core/funnel/audit-stage-messages.js; cheap-audit verdict core/leads/cheap-audit-queue.js; archive core/leads/terminal-archive.js; paid intake/revision core/funnel/paid-intake-ops.js; build handoff / review / live-publish core/contracts/discord-messages.js. Message contract SoT: core/contracts/discord-messages.js.
- core/funnel/hermes-cron.js → local python ~/Developer/Hermes Agent): reads #website-tasks, runs per-lead crons (grade-A 4h / grade-B 12h), calls pl:context (read) + posts decision drafts for operator approval, advances phase. Skill exposed: profitslocal-lead-ops via registerLeadCron(...,{skill}). Status: ⚠️ local-only, aspirational (no VPS).
- core/funnel/submission-router.js + paid-intake-ops.js): Stripe webhook → order/entitlement/revision-quota → case memory → agent task → #paid-websites. Status: ✅ MVP verified (Opa). Not a persistent agent; a synchronous dispatcher.
- core/tasks/intent-router.js (codex_cli→claude_cli→ollama→regex, 8 kinds) → resolves to a pl:*/leads:* CLI. Agent tasks: data/agent-tasks/<client>/*.json executed by operator. No runtime skill-runner.

skills/*/SKILL.md): modular intent, not yet runtime-executable

- Discovery/screen: profitslocal-lead-discovery, profitslocal-lead-filter, profitslocal-entity-enrichment, image-lead-discovery, site-audit.
- Collect/build-prep: profitslocal-collect, profitslocal-build-research-pack, profitslocal-data-checkpoint, profitslocal-assemble-handoff, profitslocal-audit-handoff.
- Audit/QA: profitslocal-quality-audit, website-copy-audit, website-ui-audit, pl-audit-rubric.
- Build/spec/voice: pl-local-trade-page-spec, pl-au-trade-voice, website-redesign-preservation, template-lab.
- Orchestration: lead-ops (the Hermes-callable one).
profitslocal-lead-ops is Hermes-callable. Skills are narrative prompt artifacts, not executables: there is no skill-loader / skill-registry / skill:run CLI.

Make funnel steps modular, agent-callable skills so Hermes (and Claude/Commerce agents) can invoke them uniformly. Today: CLIs exist for every step, but the "skill" layer is docs + a hard-coded intent map. The target is a real skill surface (discoverable + runnable) that the agents call, instead of editing intent-router.js per new step.
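
To make "discoverable + runnable" concrete, here is a minimal sketch of what a skill surface could look like. Nothing in it exists in the repo today: `createSkillRegistry`, the manifest fields (`name`, `requiredArgs`, `run`), and the example handler are hypothetical, and a real runner would most likely dispatch to the existing pl:* CLIs rather than to inline functions.

```javascript
// Hypothetical registry: none of these names exist in the repo today.
function createSkillRegistry() {
  const skills = new Map();
  return {
    // A manifest declares the skill name, the args it requires, and a run function.
    register(manifest) {
      if (!manifest.name || typeof manifest.run !== "function") {
        throw new Error("manifest needs a name and a run function");
      }
      if (skills.has(manifest.name)) {
        throw new Error(`duplicate skill: ${manifest.name}`);
      }
      skills.set(manifest.name, { requiredArgs: [], ...manifest });
    },
    // Discoverable: agents can list what is callable without reading code.
    list() {
      return [...skills.keys()].sort();
    },
    // Runnable: validate args, then hand off to the skill (which may return a Promise).
    run(name, args = {}) {
      const skill = skills.get(name);
      if (!skill) throw new Error(`unknown skill: ${name}`);
      const missing = skill.requiredArgs.filter((key) => !(key in args));
      if (missing.length) throw new Error(`missing args: ${missing.join(", ")}`);
      return skill.run(args);
    },
  };
}

// Usage: adding a step means registering a manifest, not editing a router.
const registry = createSkillRegistry();
registry.register({
  name: "profitslocal-lead-filter",
  requiredArgs: ["entityKey"],
  run: ({ entityKey }) => `filtered:${entityKey}`, // placeholder body for the example
});
```

The design point is that the registry owns validation and lookup, so adding a skill becomes a data change rather than a code change to the kind→CLI map.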
- skill:run <name> --args. Intent-router hard-maps kind→CLI (must edit code to add a skill).
- ops:skill-contract-audit gate (built R136) already checks SKILL.md currency; extend it to validate the runnable manifest.

Status: PLAN (codex R147). Owns the WHOLE funnel: entry convergence → cost-gated ordering → identity gating → progressive deepening → ONE master.md terminal artifact → drop accounting. The deep-capture/gather internals are a separate track (SPEC-GATHER-MODULE.md, invoked only AFTER cheap gates pass). Picture: README.md § The Funnel · live https://pl-business-map.pages.dev
Inventory (FUNNEL-INVENTORY.md) shows ~72% of parts are wired, but the system is not yet a coherent funnel: no single controller converges entries → runs cheapest-first exclusions → deepens only survivors → emits one master.md. This spec makes the system BEHAVE like the README funnel before we polish deep-capture quality.
Cheapest + fastest exclusions first; expensive work (full crawl, photos, reviews, vision) runs ONLY on leads that survive. Never spend money on disqualified leads.
entry (any channel) → ONE identity → [Stage0 free] → [Stage1 free-db] → [Stage2 cheap] →
[Stage3 mid: problem-size] → [Stage4 mid: running/ability-to-pay] → [DEEP capture] → ONE master.md → build
↑ each gate drops the unqualified + records WHY (drop accounting)
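
The flow above can be sketched as a small runner in which gates execute cheapest-first and every drop is recorded with its reason. This is an illustration under stated assumptions, not the pl:run-funnel implementation: `runFunnel`, `assertCheapestFirst`, the `{ name, cost_tier, gate }` stage shape, and the accounting field names are hypothetical.

```javascript
// Cost tiers in ascending order of expense.
const TIER_ORDER = ["free", "cheap", "mid", "expensive"];

// Refuse a stage list in which a cheaper stage is placed after a more expensive one.
function assertCheapestFirst(stages) {
  let previous = -1;
  for (const stage of stages) {
    const current = TIER_ORDER.indexOf(stage.cost_tier);
    if (current < 0) throw new Error(`unknown cost_tier on stage "${stage.name}"`);
    if (current < previous) {
      throw new Error(`stage "${stage.name}" runs after a more expensive stage`);
    }
    previous = current;
  }
}

// Each stage.gate(lead) returns { pass, reason }. A failed gate stops that lead,
// so later (more expensive) stages are never called for it.
function runFunnel(leads, stages) {
  assertCheapestFirst(stages);
  const accounting = { entered: leads.length, excluded: {}, survived_to_deep: 0, drops: [] };
  const survivors = [];
  for (const lead of leads) {
    let dropped = false;
    for (const stage of stages) {
      const verdict = stage.gate(lead);
      if (!verdict.pass) {
        accounting.excluded[stage.name] = (accounting.excluded[stage.name] || 0) + 1;
        accounting.drops.push({ key: lead.key, stage: stage.name, reason: verdict.reason });
        dropped = true;
        break;
      }
    }
    if (!dropped) survivors.push(lead);
  }
  accounting.survived_to_deep = survivors.length;
  return { survivors, accounting };
}
```

Because a dropped lead exits the inner loop, the expensive stages are unreachable by construction rather than by convention, and `accounting.drops` answers the question of why each lead left.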
- pl:run-funnel) that: converges entry → entity; runs the EXISTING cost-gated stages IN ORDER (exclusion-filter L1 → licence-kill(observe) → exclusion L2 → cheap-audit → qualification/grading); does NOT run detailed crawl/reviews/deep until cheap gates pass; auto-chains surviving leads into build-master-md; emits drop accounting (entered / excluded@stageX / survived-to-deep / master-built). Reuses all existing stage components, no rebuild.
- redesign-brief an explicit INPUT to ONE master-md-builder; read external_facts (R145) behind an observe/flag path; reviews/GBP as named source blocks; ensure the terminal artifact is reliable.
- resolveIdentity verdict → proposed canonical patch → observe log → (gated) promotion. NO automatic canonical writes until 300-500 gold clearance. Lane exists structurally so the funnel has the right shape + accumulates real-run evidence (avoids painful retrofit).
- cost_tier (free/cheap/mid/expensive); a lead only advances if the prior gate passes; expensive stages are unreachable for dropped leads (structural, not convention).
- rollupScreeningObservability shape, R144).
- pl:run-funnel is the seed to extend.
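
The identity lane described above (verdict → proposed canonical patch → observe log → gated promotion) can be sketched as follows. The verdict shape (`tier`, `write_allowed`, `fields`), the `promotionUnlocked` switch, and both function names are assumptions for illustration; the real `resolveIdentity` output is not shown in this document.

```javascript
// Hypothetical observe-only lane. `verdict` mimics a resolveIdentity-style result;
// its real shape is an assumption here.
function proposeCanonicalPatch(entity, verdict) {
  const patch = {};
  for (const [field, value] of Object.entries(verdict.fields || {})) {
    if (entity.latest[field] !== value) {
      patch[field] = { from: entity.latest[field], to: value };
    }
  }
  return patch;
}

// Always log the proposal; only write when promotion is unlocked AND the verdict allows it.
function observeIdentity(entity, verdict, observeLog, { promotionUnlocked = false } = {}) {
  const patch = proposeCanonicalPatch(entity, verdict);
  const writeAllowed = promotionUnlocked && verdict.write_allowed === true;
  observeLog.push({ key: entity.key, tier: verdict.tier, patch, applied: writeAllowed });
  if (!writeAllowed) return entity; // observe only: the entity is returned untouched
  const latest = { ...entity.latest };
  for (const [field, change] of Object.entries(patch)) latest[field] = change.to;
  return { ...entity, latest }; // new object; the input entity is never mutated
}
```

With `promotionUnlocked` left at its default, every run still adds evidence to the observe log while canonical data stays unchanged, which matches the intent of having the lane exist before any writes are cleared.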