Commit fdbe3424 by luoqi

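fix(ingest): restore consult SQL to a plain export (CSV dual-source equivalence); move concat to transforms.derive

- The previous SQL violated the manifest discipline (SQL does plain export only; shape changes belong in transforms): the consult_external_id concat
  moves from SQL to transforms.derive (section I), and the SQL goes back to SELECT columns + WHERE filters.
- The organization_id IN filter (excludes unrelated clinics) and the cohort patient filter are kept (both are legitimate WHERE clauses; the export is filtered by clinic anyway).
- Audited the other tables: all are plain (SELECT columns + WHERE); image_finding_rows is a documented known exception (imaging-AI pivot, CH limitation).
- Re-ingest verification: 5993 facts / 0 failed; subject_id matches the concat version.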

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 956a42d0
@@ -230,13 +230,12 @@ sql_source:
   # ── Consultation main table (fact_consult_out) → consultation_record (intent source, 5 pilots) ──
   # patient_register_id = patient_id (94.4% hit rate within the 5 clinics; CH allows WHERE to reference the alias, same as returnvisit).
-  # No updated_date / no id → consult_external_id is the (patient,appo,date) concat (fully unique);
-  # not in per_query (no reliable cursor) → each round re-pulls in full, filtered by org+patient (idempotent upsert, same as returnvisit).
-  # potential_cure = the patient's consultation intent (subjective, not a diagnosis; excluded from recall).
+  # Plain SQL (SELECT columns + WHERE filters only, CSV dual-source equivalence); consult_external_id is assembled by transforms.derive (section I).
+  # No updated_date / no id → external_id=(patient,appo,date); not in per_query (no cursor) → each round re-pulls in full,
+  # filtered by org+patient (idempotent upsert, same as returnvisit). potential_cure = patient intent (subjective, not a diagnosis, excluded from recall).
   fact_consult_out: |
     SELECT patient_register_id AS patient_id, organization_id, brand,
-      appointment_date, first_visit, task_director, potential_cure, doctor_user_id, appo_id,
-      concat(toString(patient_register_id), '|', toString(appo_id), '|', toString(appointment_date)) AS consult_external_id
+      appointment_date, first_visit, task_director, potential_cure, doctor_user_id, appo_id
     FROM dw_group.fact_consult_out
     WHERE organization_id IN (SELECT DISTINCT organization_id FROM dw_group.fact_emr_treatment_out)
       AND (patient_register_id, brand) IN (
@@ -867,6 +866,17 @@ transforms:
         op: concat
         parts: ['${treatment_items}', ' · ', '${treatment_items_two}']
+  # ── I. fact_consult_out: synthesize consult_external_id (no id column) ──
+  # Shape changes (concat) belong in transforms; SQL stays a plain export (CSV dual-source equivalence).
+  # (patient,appo,date) is fully unique (measured: 368183 rows = uniq_composite).
+  - kind: derive
+    input: fact_consult_out
+    output: fact_consult_out
+    fields:
+      consult_external_id:
+        op: concat
+        parts: ['${patient_id}', '|', '${appo_id}', '|', '${appointment_date}']
   # ─────────────────────────────────────────────────────────────
   # Assembler configs written by PAC (one per canonical resource)
   # ─────────────────────────────────────────────────────────────
......
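The derive step in the second hunk can be read as follows. This is a minimal Python sketch of what a `concat` op with `${column}` parts might do, not the project's actual transform engine. The function names `derive_concat` and `composite_is_unique`, the placeholder-resolution rule, and the sample rows are all assumptions made for illustration. It also shows the kind of uniqueness check the comment about `(patient,appo,date)` refers to.

```python
import re
from typing import Any, Mapping, Sequence

# Assumed placeholder rule: a part that is exactly "${name}" refers to a
# column of the input row; any other part is a literal (here, the separator).
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def derive_concat(row: Mapping[str, Any], parts: Sequence[str]) -> str:
    """Join parts into one string, resolving ${column} references from row."""
    out = []
    for part in parts:
        match = _PLACEHOLDER.fullmatch(part)
        if match:
            # Stringify the column value, mirroring toString(...) in the old SQL.
            out.append(str(row[match.group(1)]))
        else:
            out.append(part)
    return "".join(out)


def composite_is_unique(rows: Sequence[Mapping[str, Any]], parts: Sequence[str]) -> bool:
    """Return True when every row yields a distinct derived key."""
    keys = [derive_concat(row, parts) for row in rows]
    return len(keys) == len(set(keys))


# Same parts list as the derive block in the diff.
PARTS = ["${patient_id}", "|", "${appo_id}", "|", "${appointment_date}"]

# Illustrative rows only; not data from the real table.
rows = [
    {"patient_id": 101, "appo_id": 7, "appointment_date": "2024-03-01"},
    {"patient_id": 101, "appo_id": 8, "appointment_date": "2024-03-01"},
]

print(derive_concat(rows[0], PARTS))     # 101|7|2024-03-01
print(composite_is_unique(rows, PARTS))  # True
```

For the derived id to match the removed SQL expression, each value has to stringify the same way on both paths. The sketch sidesteps this by using a pre-formatted date string, so it says nothing about how the real pipeline formats `appointment_date`.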