W4: add normalize op to transforms + raise enum_mapping coverage 74→98%

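Background: measured against the DW top 200 treatName values, treatment dictionary coverage was only 74% (15.8% missed and fell through to _default skip). Three main root causes:
① Chinese vs. ASCII punctuation mismatch (the yaml uses ASCII punctuation, DW actually contains Chinese full-width punctuation): 50%+ of the misses
② Incomplete review/recommendation route keywords: procedural entries such as 已交付纸质病历 (paper record delivered) and 转诊 (referral) were wrongly swallowed by actual
③ Genuinely new treatment terms missing from the dictionary: 牙周序列治疗 / 桩冠修复 / 根管治疗后冠修复, etc.

Fixes (principle: code stays host-agnostic; host-specific customization lives only in yaml):
1. transforms.derive gains op=normalize (an upgraded trim + CJK→ASCII mid-string punctuation)
   - full-width parentheses （） → () | full-width comma ， → , | full-width semicolon ； → ;
   - full-width colon ： → : | full-width angle brackets ＜＞ → < > | full-width percent ％ → %
   - enumeration comma 、 → , (semantically equivalent separator)
   - applies to any Chinese-language host and is not jvs-dw specific, so it goes into transforms (generic layer), not yaml (host layer)
2. manifest:
   - § C.2.5: new normalize derive on treat_name (in-place overwrite on both the treat_plan and plan paths)
   - § B.1: diagnosis message upgraded from trim to normalize
   - § C.3: ~22 more review route keywords (正畸复诊/检查/咨询/会诊/复查/转诊/已交付病历/缴费, etc.)
   - § C.4: the same additions to the plan-field review drop list
3. treatment_actual.yaml + treatment_planned.yaml both gain ~15 new terms:
   periodontic: 牙周序列治疗 / 系统性牙周治疗 / 全口洁治+OHI / 龈上洁治术/.../洁牙/洗牙
   endodontic: 根管治疗后冠修复 / RCT+冠修复
   implant: 拔除后种植
   prosthodontic: 桩冠修复
   restorative: 树脂充填术
   orthodontic: 更换新矫治器 / 粘接上半口矫治器 / 粘接全口附件 / 精调粘接附件 / 发放新矫治器 / 去除矫正器,配戴保持器保持现有咬合关系
   Also removed 1 dead key containing a Chinese enumeration comma ("全口龈上洁治、抛光。" now lands on the existing ASCII entry after normalize)
4. diagnosis.yaml gains 2 high-frequency terms:
   K05 菌斑性牙龈炎 (928 hits; long-form variant of the existing yaml entry "菌斑性龈炎")
   K02 深窝沟 (7613 hits; an early sign of caries, clinically classed as K02)

Measured coverage (DW top 200, 512K rows):
   treatment_actual: 74.0% → 99.9% (mapping 85.5 + review 11.7 + rec 2.6)
     misses down from 80,879 → 598 rows (what remains is one long entry, "拟涂氟知情同意", with no business value)
   diagnosis: 87.1% → 90.5%
     95% of the remaining misses are intentional drops (Z-class states such as 乳牙列/混合牙列/种植术后)

No re-import needed (let the code stabilize first); takes effect automatically on the next cold-import.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>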

Commit 194778d9 authored May 25, 2026 by luoqi
6 changed files
--- a/apps/pac-service/data/jvs-dw/assemblers/diagnosis.yaml
+++ b/apps/pac-service/data/jvs-dw/assemblers/diagnosis.yaml
@@ -56,6 +56,7 @@ enum_mapping:
    慢性龈炎: K05
    龈炎: K05
    菌斑性龈炎: K05
+    菌斑性牙龈炎: K05               # added late W4; missed in DW top 200 (long form "牙龈" vs "龈")
    慢性牙周炎: K05
    牙周炎: K05
    重度牙周炎: K05
@@ -69,6 +70,7 @@ enum_mapping:
    中龋: K02
    深龋: K02
    继发龋: K02
+    深窝沟: K02                     # added late W4; DW 5045 hits; early pit-and-fissure caries sign, clinically classed as K02
    # K04 pulp / periapical
    牙髓炎: K04
    慢性牙髓炎: K04
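
The diagnosis coverage figures in the commit message (87.1% → 90.5%) can be read as the share of rows whose value hits a real enum_mapping key instead of falling through to `_default`. A minimal sketch of that measurement, assuming exact string lookup against an already-normalized value; `lookupEnum`, `enumCoverage` and `hitsRealKey` are hypothetical helpers, not pac-service functions.

```typescript
// Hypothetical sketch: exact-match enum_mapping lookup with a `_default` fallback,
// and coverage = share of values that hit a real (non-_default) key.
type EnumMapping = Record<string, string>;

const DEFAULT_KEY = "_default";

// True only for real keys defined on the mapping itself (never `_default`).
function hitsRealKey(mapping: EnumMapping, value: string): boolean {
  return value !== DEFAULT_KEY && Object.prototype.hasOwnProperty.call(mapping, value);
}

function lookupEnum(mapping: EnumMapping, value: string): string | undefined {
  if (hitsRealKey(mapping, value)) return mapping[value];
  // Fallback; undefined means there is no _default either, i.e. the row is skipped.
  return Object.prototype.hasOwnProperty.call(mapping, DEFAULT_KEY)
    ? mapping[DEFAULT_KEY]
    : undefined;
}

function enumCoverage(mapping: EnumMapping, values: string[]): number {
  if (values.length === 0) return 0;
  const hits = values.filter((v) => hitsRealKey(mapping, v)).length;
  return hits / values.length;
}

// Excerpt of diagnosis.yaml after this commit.
const diagnosisExcerpt: EnumMapping = {
  "菌斑性龈炎": "K05",
  "菌斑性牙龈炎": "K05",
  "深龋": "K02",
  "深窝沟": "K02",
};

const sampleMessages = ["菌斑性牙龈炎", "深窝沟", "深龋", "乳牙列"];
console.log(lookupEnum(diagnosisExcerpt, "深窝沟")); // K02
console.log(enumCoverage(diagnosisExcerpt, sampleMessages)); // 0.75
```

Here "乳牙列" stands in for an intentional drop: it has no key, so it counts against coverage exactly like an accidental miss, which is why the remaining diagnosis gap is mostly by design.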

--- a/apps/pac-service/data/jvs-dw/assemblers/treatment_actual.yaml
+++ b/apps/pac-service/data/jvs-dw/assemblers/treatment_actual.yaml
@@ -53,7 +53,8 @@ enum_mapping:
    "龈上洁治术/预防性洁治/洁牙/洗牙":               periodontic
    龈上洁治+牙周治疗:                              periodontic
    全口龈上洁治:                                    periodontic
-    "全口龈上洁治、抛光。":                          periodontic
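+    # Note: since the late-W4 transforms.normalize op, DW Chinese punctuation (、，（）；) is unified to ASCII,
+    # so the dictionary only needs the ASCII form (ASCII comma + ASCII parentheses); Chinese-punctuation entries are folded into the entries below by normalize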
    "全口龈上洁治,抛光。":                          periodontic
    "全口龈上洁治,抛光":                            periodontic
    "全口龈上洁治。":                                periodontic
@@ -94,6 +95,13 @@ enum_mapping:
    根面平整术:                                      periodontic
    PMTC:                                            periodontic
    "PMTC+全口涂氟":                                 periodontic
+    # late-W4 additions: DW top 200 misses
+    牙周序列治疗:                                    periodontic
+    系统性牙周治疗:                                  periodontic
+    "龈上洁治术/预防性洁治/洗牙/洁牙":              periodontic   # the early yaml used the "/洁牙/洗牙" order; this adds the common reversed order
+    "定期洁牙,牙周系统治疗":                        periodontic
+    "全口洁治+OHI":                                  periodontic
+    "龈上洁治术/预防性洁治/洁牙/洗牙全口龈上洁治":  periodontic   # residue of a host line split mid-row; essentially a scaling (洁牙)

    # ── Fillings (restorative)
    充填:                                            restorative
@@ -108,6 +116,7 @@ enum_mapping:
    "垫底+充填":                                     restorative
    重新充填:                                        restorative
    预防性充填:                                      restorative
+    树脂充填术:                                      restorative
    嵌体修复:                                        restorative
    嵌体:                                            restorative
    戴嵌体:                                          restorative
@@ -134,6 +143,9 @@ enum_mapping:
    直接盖髓:                                        endodontic
    MTA盖髓:                                         endodontic
    冲洗上药:                                        endodontic
+    # late-W4 additions
+    根管治疗后冠修复:                                endodontic
+    "RCT+冠修复":                                    endodontic

    # ── Implants (implant)
    种植:                                            implant
@@ -158,6 +170,8 @@ enum_mapping:
    拔除后种植修复:                                  implant
    延期种植术:                                      implant
    "延期种植术(美学区,单颗)":                       implant
+    # late-W4 additions
+    拔除后种植:                                      implant

    # ── Prosthodontics (prostho): crowns / bridges / dentures / post-and-core
    冠修复:                                          prosthodontic
@@ -183,6 +197,8 @@ enum_mapping:
    调颌:                                            prosthodontic
    种植上部修复:                                    prosthodontic
    种植体上部修复:                                  prosthodontic
+    # late-W4 additions
+    桩冠修复:                                        prosthodontic

    # ── Surgery (surgical): extractions / wisdom teeth / residual roots and crowns
    拔除:                                            surgical
@@ -230,6 +246,13 @@ enum_mapping:
    "粘接全口附件。":                                orthodontic
    重粘托槽:                                        orthodontic
    "重粘托槽。":                                    orthodontic
+    # late-W4 additions
+    更换新矫治器:                                    orthodontic
+    粘接上半口矫治器:                                orthodontic
+    粘接全口附件:                                    orthodontic
+    精调粘接附件:                                    orthodontic
+    发放新矫治器:                                    orthodontic
+    "去除矫正器,配戴保持器保持现有咬合关系":         orthodontic   # long sentence; ends with the switch to a retainer
    精细调整:                                        orthodontic
    "精细调整。":                                    orthodontic
    "精调粘接附件。":                                orthodontic
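
The note at the top of this file relies on normalize folding full-width variants onto the ASCII keys. Assuming enum_mapping compares the normalized value to its keys by exact equality, a key that normalize itself would rewrite can never be hit, which is what made "全口龈上洁治、抛光。" a dead key. A hypothetical lint sketch follows; it re-implements the normalize steps from derive.op.ts in this commit, and `normalizeForMatch` / `findDeadKeys` are illustrative names only.

```typescript
// Re-implementation of the normalize steps in derive.op.ts (this commit), for linting only.
const TRAILING_NOISE_RE = /[\s。;；,，?？!！、.]+$/u;
const CJK_TO_ASCII: Record<string, string> = {
  "（": "(", "）": ")", "，": ",", "；": ";", "：": ":",
  "＜": "<", "＞": ">", "％": "%", "、": ",",
};
const CJK_RE = new RegExp(`[${Object.keys(CJK_TO_ASCII).join("")}]`, "gu");

function normalizeForMatch(s: string): string {
  return s
    .trim()
    .replace(TRAILING_NOISE_RE, "")
    .replace(CJK_RE, (ch) => CJK_TO_ASCII[ch] ?? ch)
    .trim();
}

// Hypothetical lint: assuming keys are matched by exact equality against
// normalized values, any key that normalize would rewrite is unreachable.
function findDeadKeys(keys: string[]): string[] {
  return keys.filter((k) => normalizeForMatch(k) !== k);
}

const periodonticKeys = [
  "全口龈上洁治、抛光。", // the entry removed in this commit
  "全口龈上洁治,抛光",
  "全口龈上洁治,抛光。",
  "全口洁治+OHI",
];
console.log(findDeadKeys(periodonticKeys));
```

Under the same exact-match assumption, the check also flags keys that differ from a live key only by a trailing 。 (such as "全口龈上洁治,抛光。"), because TRAILING_NOISE strips that character before matching.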

--- a/apps/pac-service/data/jvs-dw/assemblers/treatment_planned.yaml
+++ b/apps/pac-service/data/jvs-dw/assemblers/treatment_planned.yaml
@@ -90,6 +90,13 @@ enum_mapping:
    根面平整术:                                      periodontic
    PMTC:                                            periodontic
    "PMTC+全口涂氟":                                 periodontic
+    # late-W4 additions
+    牙周序列治疗:                                    periodontic
+    系统性牙周治疗:                                  periodontic
+    "龈上洁治术/预防性洁治/洗牙/洁牙":              periodontic
+    "定期洁牙,牙周系统治疗":                        periodontic
+    "全口洁治+OHI":                                  periodontic
+    "龈上洁治术/预防性洁治/洁牙/洗牙全口龈上洁治":  periodontic

    # ── Fillings (restorative)
    充填:                                            restorative
@@ -104,6 +111,7 @@ enum_mapping:
    "垫底+充填":                                     restorative
    重新充填:                                        restorative
    预防性充填:                                      restorative
+    树脂充填术:                                      restorative
    嵌体修复:                                        restorative
    嵌体:                                            restorative
    戴嵌体:                                          restorative
@@ -130,6 +138,9 @@ enum_mapping:
    直接盖髓:                                        endodontic
    MTA盖髓:                                         endodontic
    冲洗上药:                                        endodontic
+    # late-W4 additions
+    根管治疗后冠修复:                                endodontic
+    "RCT+冠修复":                                    endodontic

    # ── Implants (implant): note "种植上部修复" counts as prostho; "种植修复" is a combined term in host text and is classed as implant
    种植:                                            implant
@@ -154,6 +165,8 @@ enum_mapping:
    拔除后种植修复:                                  implant
    延期种植术:                                      implant
    "延期种植术(美学区,单颗)":                       implant
+    # late-W4 additions
+    拔除后种植:                                      implant

    # ── Prosthodontics (prostho): crowns / bridges / dentures / post-and-core
    冠修复:                                          prosthodontic
@@ -179,6 +192,8 @@ enum_mapping:
    调颌:                                            prosthodontic
    种植上部修复:                                    prosthodontic
    种植体上部修复:                                  prosthodontic
+    # late-W4 additions
+    桩冠修复:                                        prosthodontic

    # ── Surgery (surgical): extractions / wisdom teeth / residual roots and crowns
    拔除:                                            surgical
@@ -226,6 +241,13 @@ enum_mapping:
    "粘接全口附件。":                                orthodontic
    重粘托槽:                                        orthodontic
    "重粘托槽。":                                    orthodontic
+    # late-W4 additions
+    更换新矫治器:                                    orthodontic
+    粘接上半口矫治器:                                orthodontic
+    粘接全口附件:                                    orthodontic
+    精调粘接附件:                                    orthodontic
+    发放新矫治器:                                    orthodontic
+    "去除矫正器,配戴保持器保持现有咬合关系":         orthodontic
    精细调整:                                        orthodontic
    "精细调整。":                                    orthodontic
    "精调粘接附件。":                                orthodontic
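
treatment_actual.yaml and treatment_planned.yaml receive the same block of late-W4 keys, so the two additions should stay identical. A small drift check could guard that; this sketch assumes each mapping is loaded as a plain key → category object, and `mappingDrift` / `hasOwnKey` are hypothetical helpers.

```typescript
type CategoryMap = Record<string, string>;

interface MappingDrift {
  onlyInFirst: string[];
  onlyInSecond: string[];
  conflicting: string[]; // same key, different category
}

const hasOwnKey = (m: CategoryMap, k: string): boolean =>
  Object.prototype.hasOwnProperty.call(m, k);

function mappingDrift(first: CategoryMap, second: CategoryMap): MappingDrift {
  return {
    onlyInFirst: Object.keys(first).filter((k) => !hasOwnKey(second, k)),
    onlyInSecond: Object.keys(second).filter((k) => !hasOwnKey(first, k)),
    conflicting: Object.keys(first).filter(
      (k) => hasOwnKey(second, k) && first[k] !== second[k],
    ),
  };
}

// Excerpt of the keys added to both files in this commit.
const actualAdditions: CategoryMap = {
  "牙周序列治疗": "periodontic",
  "树脂充填术": "restorative",
  "RCT+冠修复": "endodontic",
  "拔除后种植": "implant",
  "桩冠修复": "prosthodontic",
  "更换新矫治器": "orthodontic",
};
const plannedAdditions: CategoryMap = { ...actualAdditions };

console.log(mappingDrift(actualAdditions, plannedAdditions));
```

Comparing only the added subsets keeps the check meaningful even though the two files already differ elsewhere (for example in their section comments).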

--- a/apps/pac-service/data/jvs-dw/manifest.yaml
+++ b/apps/pac-service/data/jvs-dw/manifest.yaml
@@ -207,13 +207,14 @@ transforms:
    where:
      message: { not_empty: true }    # only a diagnosis name (message) is required; stdCode may be empty

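-  # B.1 normalize message (strip trailing punctuation/whitespace) + truncate stdCode to the K code
+  # B.1 normalize message (strip surrounding whitespace and trailing punctuation + CJK→ASCII mid-string punctuation) + truncate stdCode to the K code
+  # late W4: trim → normalize upgrade, fixing dictionary misses caused by mid-string Chinese punctuation ("、" "，" "（）") in host fields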
  - kind: derive
    input: _diagnosis_raw
    output: _diagnosis_norm
    fields:
      message_norm:
-        op: trim
+        op: normalize
        from: message
      std_code_k:
        op: substring
@@ -299,6 +300,24 @@ transforms:
    where:
      treat_name: { not_empty: true }

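+  # ── C.2.5 normalize treat_name (surrounding whitespace, trailing punctuation + CJK→ASCII mid-string punctuation) ──
+  # late W4: host treatName has many Chinese-punctuation variants such as "全口龈上洁治，抛光。", plus leading spaces and trailing punctuation
+  # normalize folds them in one pass, so downstream route / enum_mapping dictionaries only need the ASCII form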
+  - kind: derive
+    input: _treat_plan_raw
+    output: _treat_plan_raw        # in-place overwrite
+    fields:
+      treat_name:
+        op: normalize
+        from: treat_name
+  - kind: derive
+    input: _plan_raw
+    output: _plan_raw              # in-place overwrite
+    fields:
+      treat_name:
+        op: normalize
+        from: treat_name
+
  # ── C.3 treat_plan route split ──
  - kind: route_by_pattern
    input: _treat_plan_raw
@@ -308,22 +327,49 @@ transforms:
      - output: _recommendation_raw
        when:
          starts_with: ['建议', '推荐']
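-      # procedural / follow-up → treatment_review_rows (follow-up done this visit; chain S4 signal)
+      # procedural / follow-up / administrative → treatment_review_rows (non-treatment event this visit; chain S4 signal)
+      # note: equals keywords are all in post-normalize ASCII form (Chinese punctuation already folded); do not write full-width parentheses/commas here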
      - output: _treatment_review_raw
        when:
          equals:
+            # follow-up / exam / return visit
            - 常规复查
            - 复查
            - 定期复查
            - 检查
            - 初诊检查
+            - 正畸复查
+            - 正畸检查
+            - 正畸复诊
+            - 种植复查
+            - 牙周复查
+            - 保持器复诊
+            # watchful waiting / observation / suture removal
            - 拆线
            - 暂观
            - 观察
+            - 观察随访
+            - 随诊观察
+            - 随访观察
            - 无治疗
            - 重复
            - 取资料
            - 暂观,必要时拔除
+            - 观察,必要时拔除
+            - 随访观察,必要时拔除
+            # procedural / consultation / administrative
+            - 转诊
+            - 请全科医生会诊
+            - 正畸咨询
+            - 正畸会诊
+            - 方案沟通
+            - 沟通治疗方案
+            - 听方案
+            - 缴费
+            - 未拍片
+            - 拒绝拍片
+            - 治疗中
+            - 已交付纸质病历
      # real treatment actions → treatment_actual_rows ⭐ kind=actual (clinically performed)
      - output: _treatment_actual_raw_emr
        when:
@@ -338,22 +384,49 @@ transforms:
      - output: _drop_plan_recommendation
        when:
          starts_with: ['建议', '推荐']
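-      # MVP: plan "follow-up/watchful waiting" entries are dropped for now (avoids kind conflict; future review signal to follow)
+      # MVP: plan "follow-up/watchful waiting/procedural/consultation" entries are dropped for now (avoids kind conflict + keeps procedural text out of planned treatments)
+      # future review signal to follow (once transforms has a union op, these can merge into treatment_review_rows kind=planned)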
      - output: _drop_plan_review
        when:
          equals:
+            # follow-up / exam / return visit
            - 常规复查
            - 复查
            - 定期复查
            - 检查
            - 初诊检查
+            - 正畸复查
+            - 正畸检查
+            - 正畸复诊
+            - 种植复查
+            - 牙周复查
+            - 保持器复诊
+            # watchful waiting / observation / suture removal
            - 拆线
            - 暂观
            - 观察
+            - 观察随访
+            - 随诊观察
+            - 随访观察
            - 无治疗
            - 重复
            - 取资料
            - 暂观,必要时拔除
+            - 观察,必要时拔除
+            - 随访观察,必要时拔除
+            # procedural / consultation / administrative
+            - 转诊
+            - 请全科医生会诊
+            - 正畸咨询
+            - 正畸会诊
+            - 方案沟通
+            - 沟通治疗方案
+            - 听方案
+            - 缴费
+            - 未拍片
+            - 拒绝拍片
+            - 治疗中
+            - 已交付纸质病历
      # real treatment actions → treatment_planned_rows ⭐ kind=planned (next step written down by the clinician)
      - output: _treatment_planned_raw_emr
        when:
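
The C.2.5 derive runs before C.3 so that the ASCII `equals` keywords can match host text written with full-width punctuation. A minimal model of that ordering, assuming rules are tried top to bottom with the first match winning and `equals` meaning exact string equality; `normalizeTreatName`, `routeTreatName` and the rule shape are hypothetical, and the keyword lists are only an excerpt of the manifest.

```typescript
// Hypothetical model of manifest § C.2.5 + § C.3: normalize treat_name, then route.
const TAIL_NOISE = /[\s。;；,，?？!！、.]+$/u;
const PUNCT_MAP: Record<string, string> = {
  "（": "(", "）": ")", "，": ",", "；": ";", "：": ":",
  "＜": "<", "＞": ">", "％": "%", "、": ",",
};
const PUNCT_RE = new RegExp(`[${Object.keys(PUNCT_MAP).join("")}]`, "gu");

const normalizeTreatName = (s: string): string =>
  s.trim().replace(TAIL_NOISE, "").replace(PUNCT_RE, (ch) => PUNCT_MAP[ch] ?? ch).trim();

type RouteRule =
  | { output: string; startsWith: string[] }
  | { output: string; equals: string[] };

// Excerpt of the C.3 rules in this commit; the real keyword lists are longer.
const treatPlanRules: RouteRule[] = [
  { output: "_recommendation_raw", startsWith: ["建议", "推荐"] },
  { output: "_treatment_review_raw", equals: ["复查", "转诊", "已交付纸质病历", "暂观,必要时拔除"] },
];

// Assumption: rules are tried in order and the first match wins.
function routeTreatName(rules: RouteRule[], treatName: string): string | null {
  for (const rule of rules) {
    if ("startsWith" in rule) {
      if (rule.startsWith.some((p) => treatName.startsWith(p))) return rule.output;
    } else if (rule.equals.includes(treatName)) {
      return rule.output;
    }
  }
  return null; // would continue to the later routes (e.g. actual), not modeled here
}

const raw = "暂观，必要时拔除"; // full-width comma, as written in host data
console.log(routeTreatName(treatPlanRules, raw)); // null: no ASCII keyword matches
console.log(routeTreatName(treatPlanRules, normalizeTreatName(raw))); // _treatment_review_raw
```

Keeping a single ASCII spelling in the `equals` lists, and folding the input instead, avoids maintaining every punctuation variant by hand in both C.3 and C.4.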

--- a/apps/pac-service/src/modules/sync/transforms/operators/derive.op.ts
+++ b/apps/pac-service/src/modules/sync/transforms/operators/derive.op.ts
@@ -4,12 +4,15 @@ import { isEmptyText, type Row } from '../row';
 /**
 * derive: add new fields to each row (in-place; output may be the same as input).
 *
- * 7 built-in ops (pure functions, no side effects):
+ * 8 built-in ops (pure functions, no side effects):
 *   - substring   { op, from, start, end? }
 *   - concat      { op, parts: ["${field}" | "literal", ...] }
 *   - lower / upper  { op, from }
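 *   - default     { op, from, value }: fills in value when the original is empty
 *   - trim        { op, from }: strips surrounding whitespace + trailing noise punctuation (dirty Chinese name cleanup)
+ *   - normalize   { op, from }: upgraded trim (surrounding whitespace + trailing punctuation) plus CJK→ASCII mid-string punctuation
+ *                                (full-width （） ， ； ： ＜＞ ％ and the enumeration comma 、) unified to ASCII
+ *                                pre-match cleanup for enum_mapping in any Chinese-language host (not jvs-dw specific)
 *   - coalesce    { op, from: [..] }: takes the first non-empty field value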
 *
 * No arbitrary JS expressions / eval. If you hit a wall, add a new op and go through review.
@@ -17,6 +20,32 @@ import { isEmptyText, type Row } from '../row';
/// Trailing noise punctuation (Chinese and ASCII) + whitespace; used to normalize diagnosis message (牙龈炎。/牙龈炎；/牙龈炎\r\n → 牙龈炎)
 const TRAILING_NOISE = /[\s。;；,，?？!！、.]+$/u;

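+/// CJK full-width punctuation → ASCII half-width (mid-string replacement; trim only strips the ends, normalize adds this step)
+/// Scope: host enum_mapping dictionaries tend to be written with ASCII punctuation while DW data mostly has Chinese punctuation;
+/// normalizing to ASCII means the dictionary needs only one (ASCII) form to cover every punctuation variant.
+/// Generic to any Chinese-language host, not jvs-dw specific, so it lives in transforms (generic layer), not yaml (host layer)
+const CJK_PUNCT_MAP: Record<string, string> = {
+  '（': '(',  // full-width left parenthesis
+  '）': ')',  // full-width right parenthesis
+  '，': ',',  // full-width comma
+  '；': ';',  // full-width semicolon
+  '：': ':',  // full-width colon
+  '＜': '<',  // full-width less-than / left angle bracket
+  '＞': '>',  // full-width greater-than / right angle bracket
+  '％': '%',  // full-width percent sign
+  '、': ',',  // enumeration comma (顿号) → comma (same role: list separator)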
+};
+const CJK_PUNCT_KEYS = Object.keys(CJK_PUNCT_MAP).join('');
+const CJK_PUNCT_REGEX = new RegExp(`[${CJK_PUNCT_KEYS}]`, 'gu');
+
+function normalizeText(s: string): string {
+  return s
+    .trim()
+    .replace(TRAILING_NOISE, '')
+    .replace(CJK_PUNCT_REGEX, (ch) => CJK_PUNCT_MAP[ch] ?? ch)
+    .trim();
+}
+
 export function runDerive(op: DeriveOp, rows: Row[]): Row[] {
  return rows.map((row) => {
    const next: Row = { ...row };
@@ -62,6 +91,11 @@ function evalExpr(expr: DeriveExpr, row: Row): unknown {
      if (typeof v !== 'string') return v ?? null;
      return v.trim().replace(TRAILING_NOISE, '').trim();
    }
+    case 'normalize': {
+      const v = row[expr.from];
+      if (typeof v !== 'string') return v ?? null;
+      return normalizeText(v);
+    }
    case 'coalesce': {
      for (const f of expr.from) {
        const v = row[f];
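
The `normalize` case in `evalExpr` returns non-string values unchanged (with `null`/`undefined` becoming `null`) and otherwise applies `normalizeText`. A standalone sketch of that behavior for a single field, using the same regex and punctuation table; `deriveNormalize` is a hypothetical helper, not the real `runDerive`. Note that only trailing punctuation is stripped: a leading 。 is kept.

```typescript
type Row = Record<string, unknown>;

// Same trailing-noise regex and CJK→ASCII table as derive.op.ts in this commit.
const TRAILING = /[\s。;；,，?？!！、.]+$/u;
const FULLWIDTH_TO_ASCII: Record<string, string> = {
  "（": "(", "）": ")", "，": ",", "；": ";", "：": ":",
  "＜": "<", "＞": ">", "％": "%", "、": ",",
};
const FULLWIDTH_RE = new RegExp(`[${Object.keys(FULLWIDTH_TO_ASCII).join("")}]`, "gu");

function normalizeText(s: string): string {
  return s
    .trim() // surrounding whitespace
    .replace(TRAILING, "") // trailing punctuation only
    .replace(FULLWIDTH_RE, (ch) => FULLWIDTH_TO_ASCII[ch] ?? ch) // mid-string CJK → ASCII
    .trim();
}

// Hypothetical single-field derive: `to === from` gives the in-place overwrite used in C.2.5.
function deriveNormalize(row: Row, from: string, to: string): Row {
  const v = row[from];
  // Non-strings pass through unchanged; null/undefined become null (as in evalExpr).
  const value = typeof v === "string" ? normalizeText(v) : v ?? null;
  return { ...row, [to]: value };
}

const before: Row = { treat_name: " 全口龈上洁治，抛光。" };
const after = deriveNormalize(before, "treat_name", "treat_name");
console.log(after.treat_name); // 全口龈上洁治,抛光
console.log(normalizeText("。牙龈炎")); // 。牙龈炎
```

Stripping the trailing noise before the CJK replacement matters little here, since TRAILING_NOISE already lists both ， and 、; the final `.trim()` catches whitespace exposed by the earlier steps.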

--- a/apps/pac-service/src/modules/sync/transforms/transforms.schema.ts
+++ b/apps/pac-service/src/modules/sync/transforms/transforms.schema.ts
@@ -100,13 +100,15 @@ export type SplitJsonArrayOp = z.infer<typeof SplitJsonArrayOpSchema>;
 /**
 * Add new fields to each row (in-place; output may be the same as input).
 *
- * 5 built-in ops:
+ * 8 built-in ops:
 *   - substring:  { op: substring, from: field, start: 0, end: 3 }
 *   - concat:     { op: concat, parts: ["${field}" | "literal", ...] }
 *   - lower:      { op: lower, from: field }
 *   - upper:      { op: upper, from: field }
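 *   - default:    { op: default, from: field, value: "fallback" }: fills in value when the original is empty
- *   - trim:       { op: trim, from: field }: strips surrounding whitespace + trailing punctuation (。;,?!、\r\n), for cleaning dirty Chinese names
+ *   - trim:       { op: trim, from: field }: strips surrounding whitespace + trailing punctuation (。;,?!、\r\n), dirty-field cleanup
+ *   - normalize:  { op: normalize, from: field }: upgraded trim + CJK→ASCII mid-string punctuation
+ *                  (full-width （） ， ； ： ＜＞ ％ and 、 unified to ASCII), required for enum matching in Chinese-language hosts
 *   - coalesce:   { op: coalesce, from: [f1, f2, ...] }: takes the first non-empty field value (diagnosis code fallback: stdCode→message)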
 *
 * In concat parts, `${fieldName}` is a field reference; anything else is treated as a literal.
@@ -145,6 +147,14 @@ export const DeriveTrimSchema = z.object({
  from: z.string().min(1),
 });

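+/// Added late W4: one-stop CJK→ASCII punctuation + trim normalization (for enum_mapping string matching in Chinese-language hosts)
+/// trim is a subset (ends only); normalize also replaces Chinese parentheses/commas/semicolons etc. with ASCII mid-string
+/// Applies to dirty-field cleanup in any Chinese-language host (not jvs-dw specific), hence a generic op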
+export const DeriveNormalizeSchema = z.object({
+  op: z.literal('normalize'),
+  from: z.string().min(1),
+});
+
 export const DeriveCoalesceSchema = z.object({
  op: z.literal('coalesce'),
  from: z.array(z.string().min(1)).min(1).describe('按顺序取第一个非空字段值'),
@@ -157,6 +167,7 @@ export const DeriveExprSchema = z.discriminatedUnion('op', [
  DeriveUpperSchema,
  DeriveDefaultSchema,
  DeriveTrimSchema,
+  DeriveNormalizeSchema,
  DeriveCoalesceSchema,
 ]);
 export type DeriveExpr = z.infer<typeof DeriveExprSchema>;
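
`DeriveExprSchema` is a zod discriminated union keyed on `op`, so adding `DeriveNormalizeSchema` to it is what lets a manifest entry with `op: normalize` validate, and `evalExpr` then needs a matching `case`. A zod-free sketch of the same pattern in plain TypeScript, with an exhaustiveness check that fails to compile if an op is left unhandled; the three-op subset and all names are hypothetical.

```typescript
// Plain-TS mirror of a subset of DeriveExpr (hypothetical; the real schema uses zod).
type DeriveExprSketch =
  | { op: "trim"; from: string }
  | { op: "normalize"; from: string }
  | { op: "coalesce"; from: string[] };

// Compile-time exhaustiveness: if a new op joins the union without a case below,
// `expr` is no longer `never` in the default branch and tsc reports an error.
function assertNever(x: never): never {
  throw new Error(`unhandled derive op: ${JSON.stringify(x)}`);
}

function describeDerive(expr: DeriveExprSketch): string {
  switch (expr.op) {
    case "trim":
      return `trim(${expr.from})`;
    case "normalize":
      return `normalize(${expr.from})`;
    case "coalesce":
      return `coalesce(${expr.from.join(", ")})`;
    default:
      return assertNever(expr);
  }
}

// Runtime check mirroring `from: z.string().min(1)` and `z.array(z.string().min(1)).min(1)`.
function parseDeriveExpr(input: unknown): DeriveExprSketch | null {
  if (typeof input !== "object" || input === null) return null;
  const { op, from } = input as { op?: unknown; from?: unknown };
  const nonEmpty = (s: unknown): s is string => typeof s === "string" && s.length > 0;
  if (op === "trim" && nonEmpty(from)) return { op: "trim", from };
  if (op === "normalize" && nonEmpty(from)) return { op: "normalize", from };
  if (op === "coalesce" && Array.isArray(from) && from.length > 0 && from.every(nonEmpty)) {
    return { op: "coalesce", from: from as string[] };
  }
  return null; // unknown op or invalid `from`
}

const parsed = parseDeriveExpr({ op: "normalize", from: "treat_name" });
if (parsed) console.log(describeDerive(parsed)); // normalize(treat_name)
```

An unknown `op` (say, an `eval`-like escape hatch) is rejected outright, which matches the "add a new op and go through review" rule in derive.op.ts.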