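Fix hardcoding introduced in PR2/PR4 that violates the "hosts differ only in YAML" principle.
Problem:
listPatientPairs / loadTablesForCohort / injectCohortFilter hardcoded the
jvs-dw-specific `dw_group.fact_client_out` + `patient_id` + `brand`,
so onboarding any other host required code changes → violates the core PAC design (the ingestion flow is host-agnostic).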
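Fix:
Add a sql_source.cohort config section to manifest.schema:
patient_list_from   fully qualified source table (db.table) used to list patients
patient_key_column  patient primary-key column (shared by all source tables for cohort filtering; defaults to patient_id)
tenant_key_column   tenant discriminator column (optional; brand for jvs-dw; single-tenant hosts omit this line)
list_cursor_column  incremental cursor column for listing patients (the time column of the master record table)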
ClickHouseSourceService:
- CohortKey type { key, tenant? } replaces { patient_id, brand } (carries values only; column names live in config)
- listPatientPairs: SELECT DISTINCT <key>[,<tenant>] FROM <list_from>
  [WHERE <cursor_col> > x] ORDER BY <key>, with every name read from the cohort config
- loadTablesForCohort + buildCohortClause:
  with tenant_key → (key,tenant) IN ((..)); without → key IN (..)
- injectCohortFilter takes a pre-built clause and no longer hardcodes column names
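
A minimal sketch of how the two SQL fragments described above could be derived purely from the cohort config. The `CohortConfig` interface, the `quote` helper, the `buildListQuery` name, and the `1 = 0` empty-batch fallback are assumptions made for illustration; only `CohortKey`, `buildCohortClause`, and the config field names come from this change, and the real service may build or parameterize its queries differently.

```typescript
// Hypothetical shape mirroring the sql_source.cohort section.
interface CohortConfig {
  patient_list_from: string;
  patient_key_column: string;
  tenant_key_column?: string;
  list_cursor_column?: string;
}

// Value carrier only: column names come from CohortConfig.
interface CohortKey {
  key: string;
  tenant?: string;
}

// Assumed helper: render a ClickHouse string literal, escaping \ and '.
function quote(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Illustrative stand-in for the query issued by listPatientPairs.
function buildListQuery(cfg: CohortConfig, cursor?: string): string {
  const cols = cfg.tenant_key_column
    ? `${cfg.patient_key_column}, ${cfg.tenant_key_column}`
    : cfg.patient_key_column;
  const where =
    cursor !== undefined && cfg.list_cursor_column
      ? ` WHERE ${cfg.list_cursor_column} > ${quote(cursor)}`
      : "";
  return (
    `SELECT DISTINCT ${cols} FROM ${cfg.patient_list_from}` +
    `${where} ORDER BY ${cfg.patient_key_column}`
  );
}

// Tuple IN when a tenant column is configured, plain IN otherwise.
function buildCohortClause(cfg: CohortConfig, batch: CohortKey[]): string {
  if (batch.length === 0) return "1 = 0"; // assumption: empty batch matches nothing
  if (cfg.tenant_key_column) {
    const tuples = batch
      .map((k) => `(${quote(k.key)}, ${quote(k.tenant ?? "")})`)
      .join(", ");
    return `(${cfg.patient_key_column}, ${cfg.tenant_key_column}) IN (${tuples})`;
  }
  const keys = batch.map((k) => quote(k.key)).join(", ");
  return `${cfg.patient_key_column} IN (${keys})`;
}
```

With the jvs-dw values this yields `(patient_id, brand) IN (('p1', 'a'), ('p2', 'b'))`; dropping `tenant_key_column` from the config yields `patient_id IN ('p1', 'p2')` with no code change.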
cold-import.service:
- cohort type changed to CohortKey; canCohort checks that manifest.sql_source.cohort exists
- cohortBatchSize set but no cohort section → warn + fall back to single-shot
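
The fallback rule above can be sketched roughly as follows. The `Manifest` shape, the `resolveImportMode` name, and passing `cohortBatchSize` and the logger as parameters are assumptions for illustration; only `canCohort`, `cohortBatchSize`, and the warn-then-single-shot behavior come from this change.

```typescript
// Hypothetical minimal manifest shape; only the cohort section matters here.
interface Manifest {
  sql_source?: { cohort?: { patient_list_from: string } };
}

type ImportMode = "cohort" | "single-shot";

// Decide how the cold import runs. Batching needs both a batch size
// and a cohort section; a batch size alone warns and falls back.
function resolveImportMode(
  manifest: Manifest,
  cohortBatchSize: number | undefined,
  warn: (msg: string) => void = (msg) => console.warn(msg)
): ImportMode {
  const canCohort = manifest.sql_source?.cohort !== undefined;
  if (cohortBatchSize === undefined) return "single-shot";
  if (!canCohort) {
    warn("cohortBatchSize is set but manifest.sql_source.cohort is missing; using single-shot");
    return "single-shot";
  }
  return "cohort";
}
```

Falling back instead of failing keeps a host without a cohort section importable, at the cost of the single-shot memory profile.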
manifest.yaml (jvs-dw) gains a cohort section:
patient_list_from: dw_group.fact_client_out
patient_key_column: patient_id
tenant_key_column: brand
list_cursor_column: last_visit_time
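Local end-to-end validation (33,400 patients / 759k tx):
✅ Peak memory 389MB (vs. 7.6GB OOM on the server); cohort batching is the decisive factor
✅ All 8 subject_types covered (including emr/payment/image, previously 0 on the server)
✅ Concurrency lock blocks concurrent runs + cursor advances to run_start
✅ Incremental commit per batch (checkpoint)
✅ plan compose: 10,028 plans / 17,828 reasons
✅ sub_key tooth-overlap union-find (impacted_tooth@18;28;38;48 multi-tooth merge)
✅ K01-K08 recall triggered in all scenarios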
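
For readers unfamiliar with the tooth-overlap merge in the checklist above, here is a rough union-find sketch. It assumes each entry carries a list of tooth numbers and that entries sharing any tooth are merged into one group; the `mergeOverlappingToothSets` name and the input shape are hypothetical, and the actual sub_key logic may differ.

```typescript
// Merge tooth sets that share at least one tooth, using union-find.
function mergeOverlappingToothSets(sets: string[][]): string[][] {
  const parent = sets.map((_, i) => i);

  // Find the root of i, halving the path as we go.
  const find = (i: number): number => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]];
      i = parent[i];
    }
    return i;
  };
  const union = (a: number, b: number): void => {
    parent[find(a)] = find(b);
  };

  // The first set seen for a tooth owns it; later sets with that tooth join it.
  const owner = new Map<string, number>();
  sets.forEach((teeth, i) => {
    for (const tooth of teeth) {
      const seen = owner.get(tooth);
      if (seen === undefined) owner.set(tooth, i);
      else union(i, seen);
    }
  });

  // Collect the union of teeth per root, sorted for a stable key.
  const groups = new Map<number, Set<string>>();
  sets.forEach((teeth, i) => {
    const root = find(i);
    const group = groups.get(root) ?? new Set<string>();
    teeth.forEach((tooth) => group.add(tooth));
    groups.set(root, group);
  });
  return Array.from(groups.values()).map((g) => Array.from(g).sort());
}
```

Under these assumptions, the inputs `["18","28"]`, `["28","38"]`, `["48"]`, `["38","48"]` collapse into a single group whose teeth join to `18;28;38;48`.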
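New host onboarding checklist (zero code):
1. In manifest.yaml, write connection + queries + incremental.per_query + the cohort section
2. Write assemblers/*.yaml (canonical mappings)
3. Write transforms (if needed, e.g. splitting JSON into rows)
Done. sync / cohort / incremental / locking are all reused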