Commit 4400e2a7 by luoqi

fix(ingest): disable image_finding for now (prod CH rejects multiple JOIN + ARRAY JOIN)

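prod ClickHouse rejects mixing multiple JOINs with ARRAY JOIN, and the GROUP BY in the org subquery skews cohort injection.
Local CH 24.3 tolerates this form, so the problem went unnoticed. Temporarily comment out the image_finding_rows query and the assembler registration
to unblock the three other blocks (recall quality / contacts / return visits); the imaging AI signal source becomes a follow-up (redo the SQL against prod CH).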
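
One direction for the follow-up is to move the pivot and tooth explosion out of the source SQL and into the transform layer. The sketch below is illustrative only: `explode_image_findings` and the input row shape are hypothetical, not existing code in this repo. It assumes the source query returns the raw condition columns unchanged, and it reuses the column-to-K-code pairs from the disabled ARRAY JOIN.

```python
import re
from typing import Iterator

# Column -> K-code pairs, copied from the ARRAY JOIN list in the disabled SQL.
CONDITION_CODES = [
    ("cavity", "K02"), ("impacted_tooth", "K01"), ("embedded_tooth", "K01"),
    ("root_periodontitis", "K04"), ("root_remnant", "K03"), ("crown_remnant", "K03"),
    ("wedge_shaped_defect", "K03"), ("cyst", "K09"), ("tooth_loss", "K08"),
    ("retained_primary_tooth", "K00"),
]


def explode_image_findings(row: dict) -> Iterator[dict]:
    """Hypothetical transform step: yield one record per (code, tooth)."""
    for column, code in CONDITION_CODES:
        raw = row.get(column) or ""
        # Mirrors the SQL cleanup: strip brackets, spaces and single quotes.
        cleaned = re.sub(r"[\[\] ']", "", raw)
        for tooth in cleaned.split(","):
            if not tooth:
                continue  # skip '[]' and empty columns, like the SQL WHERE filters
            yield {
                "emr_id": row["emr_id"],
                "code": code,
                "code_source": "image_ai",
                "tooth": tooth,
                "diag_external_id": f"{row['emr_id']}|imgai|{code}|{tooth}",
            }
```

With this shape the source SQL would need only plain JOINs and no ARRAY JOIN. How clinic (organization_id) should be obtained is a separate question that this sketch does not address.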

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 4a7750d0
@@ -176,36 +176,11 @@ sql_source:
WHERE last_visit_time IS NOT NULL
)
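# ── Imaging AI analysis (fact_emr_image_analysis_out) → diagnosis signal source (image_finding) ──
# Structured AI source; as an exception, the source SQL pivots and explodes tooth positions in one pass (zero downstream transform):
# ① join file_num→client to get patient_id+brand (the image table has no patient_id);
# ② LEFT JOIN to take organization_id from the patient's EMR as the clinic (image has no clinic; clinic is a required pillar field for transaction; covers ~98%);
# ③ ARRAY JOIN pivots the 10 condition columns (each a tooth-position array string such as "['38']") into (code, array);
# ④ replaceRegexpAll strips brackets and quotes, then splitByChar + arrayJoin explode into one row per tooth;
# ⑤ code_source='image_ai' marks the origin (diagnosis.parser explicitly prioritizes it); externalId contains |imgai| → separate subject.
# Condition→K-code mapping (host data shape, kept in the manifest yaml): cavity→K02 / impacted·embedded→K01 / periapical→K04 /
# root remnant·crown remnant·wedge-shaped defect→K03 / cyst→K09 / missing→K08 / retained primary tooth→K00. Dedup relies on recall-layer (subKey,tooth) clustering.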
image_finding_rows: |
SELECT patient_id, brand, organization_id, emr_id, rq, code, code_source, tooth,
concat(emr_id, '|imgai|', code, '|', tooth) AS diag_external_id
FROM (
SELECT c.patient_id AS patient_id, c.brand AS brand, po.org AS organization_id,
ia.emr_id AS emr_id, ia.rq AS rq, cm.1 AS code, 'image_ai' AS code_source,
arrayJoin(splitByChar(',', replaceRegexpAll(cm.2, '[\[\] '']', ''))) AS tooth
FROM dw_group.fact_emr_image_analysis_out ia
INNER JOIN dw_group.fact_client_out c ON c.file_num = ia.file_num AND c.brand = ia.brand
LEFT JOIN (
SELECT patient_id, brand, any(organization_id) AS org
FROM dw_group.fact_emr_treatment_out WHERE notEmpty(organization_id)
GROUP BY patient_id, brand
) po ON po.patient_id = c.patient_id AND po.brand = c.brand
ARRAY JOIN [('K02', ia.cavity), ('K01', ia.impacted_tooth), ('K01', ia.embedded_tooth),
('K04', ia.root_periodontitis), ('K03', ia.root_remnant), ('K03', ia.crown_remnant),
('K03', ia.wedge_shaped_defect), ('K09', ia.cyst), ('K08', ia.tooth_loss),
('K00', ia.retained_primary_tooth)] AS cm
WHERE c.last_visit_time IS NOT NULL AND notEmpty(po.org) AND cm.2 != '[]' AND cm.2 != ''
)
WHERE tooth != ''
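# ── Imaging AI analysis → diagnosis signal source (image_finding) [disabled in this commit, follow-up] ──
# ⚠️ prod CH does not allow mixing multiple JOINs with ARRAY JOIN (Multiple JOIN does not support mix with ARRAY JOINs),
# and the GROUP BY in the org subquery skews cohort injection. Needs a redo (move pivot/tooth explosion to the transform layer, or change how clinic is obtained).
# Local CH 24.3 tolerates this form, so the issue went unnoticed; prod is strict. Disabled for now, with the assembler registration commented out as well.
# (Original SQL: see git history, commit 4a7750d)
# ── Clinic return-visit tasks (fact_returnvisit_out) → patient_return_visit upsert (display only, 5 pilot clinics) ──
# customer_id AS patient_id (so the cohort filter applies); WHERE org ∈ 5 pilots (= the EMR table's org) → ingest only those 5 clinics.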
@@ -840,7 +815,7 @@ assemblers:
- { file: assemblers/encounter.yaml }
- { file: assemblers/appointment.yaml }
- { file: assemblers/diagnosis.yaml }
- { file: assemblers/image_finding.yaml } # Imaging AI analysis → diagnosis_record (code_source=image_ai)
# - { file: assemblers/image_finding.yaml } # [disabled in this commit; prod CH join+arrayjoin incompatibility; follow-up redo]
- { file: assemblers/treatment_planned.yaml }
- { file: assemblers/treatment_review.yaml }
- { file: assemblers/treatment_actual.yaml }