Commit ff6ea96b by luoqi

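feat(realtime-coach): real-time agent-assist coach (dual provider: Qwen-Omni-Realtime + Gemini Live)

A microphone button at the bottom center of the detail page listens in on the patient's speech and streams single-sentence prompts to the service agent as subtitles.
- Backend: standalone realtime-coach module with a WS gateway (socket.io, JWT handshake) and two proxies, DashScope and Gemini Live
  (same RealtimeProvider interface, selected by provider); the DASHSCOPE/GEMINI keys stay on the server
- Reuses the script feature's patient-context assembly (buildScriptInputForPlan) with coach-specific instructions (address the chief complaint, advance step by step;
  the sales script is not reused); prompts fire on turn-end (patient pause), with silence-gated VAD on the frontend
- Frontend: standalone components: floating microphone button + subtitle overlay (progressively faded) + real-level waveform + background blur + model switcher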
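
The turn-end trigger mentioned above can be pictured as a small pure function. This is only a sketch under stated assumptions, not the commit's code: the names TurnState, onFrame and shouldCommit are hypothetical, while the 600 ms pause and 15 s cap mirror the TURN_SILENCE_MS and MAX_UTTERANCE_MS constants in the proxies.

```typescript
// Hypothetical illustration of the "commit on patient pause" rule.
// All names here are invented for the sketch; only the two thresholds come from the commit.
interface TurnState {
  hasUncommitted: boolean; // audio has arrived since the last commit
  lastAppendAt: number; // ms timestamp of the most recent frame
  utteranceStartAt: number; // ms timestamp of the first frame of this utterance
}

const TURN_SILENCE_MS = 600;
const MAX_UTTERANCE_MS = 15_000;

/** Record one incoming audio frame. */
function onFrame(state: TurnState, now: number): TurnState {
  return {
    hasUncommitted: true,
    lastAppendAt: now,
    // The utterance starts at the first frame after a commit.
    utteranceStartAt: state.hasUncommitted ? state.utteranceStartAt : now,
  };
}

/** True when buffered audio should be committed and a response requested. */
function shouldCommit(state: TurnState, now: number): boolean {
  if (!state.hasUncommitted) return false;
  const paused = now - state.lastAppendAt >= TURN_SILENCE_MS;
  const tooLong = now - state.utteranceStartAt >= MAX_UTTERANCE_MS;
  return paused || tooLong;
}

/** State after a commit: nothing buffered until the next frame arrives. */
function afterCommit(state: TurnState): TurnState {
  return { ...state, hasUncommitted: false };
}
```

A caller would poll shouldCommit on a short timer (the proxies use 200 ms) and apply afterCommit once the commit has been sent.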

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 41d7ef61
......@@ -56,8 +56,10 @@
"@nestjs/jwt": "^11.0.2",
"@nestjs/passport": "^11.0.5",
"@nestjs/platform-express": "^11.1.19",
"@nestjs/platform-socket.io": "^11.1.24",
"@nestjs/schedule": "^6.1.3",
"@nestjs/swagger": "^11.4.2",
"@nestjs/websockets": "^11.1.24",
"@pac/types": "workspace:*",
"@pac/utils": "workspace:*",
"@prisma/client": "^6.19.2",
......@@ -76,7 +78,9 @@
"passport-jwt": "^4.0.1",
"reflect-metadata": "^0.2.2",
"rxjs": "^7.8.1",
"socket.io": "^4.8.3",
"winston": "^3.19.0",
"ws": "^8.21.0",
"zod": "^4.4.3"
},
"devDependencies": {
......@@ -90,6 +94,7 @@
"@types/js-yaml": "^4.0.9",
"@types/node": "^22.10.2",
"@types/passport-jwt": "^4.0.1",
"@types/ws": "^8.18.1",
"jest": "^30.3.0",
"prisma": "^6.19.2",
"source-map-support": "^0.5.21",
......
......@@ -16,6 +16,7 @@ import { PlanModule } from './modules/plan/plan.module';
import { PlanAggregateModule } from './modules/plan-aggregate/plan-aggregate.module';
import { AgentModule } from './modules/agent/agent.module';
import { AiModule } from './modules/ai/ai.module';
import { RealtimeCoachModule } from './modules/realtime-coach/realtime-coach.module';
import { AdminModule } from './modules/admin/admin.module';
import { QueuesModule } from './queues/queues.module';
import { QueuesBullBoardModule } from './queues/bull-board.module';
......@@ -47,6 +48,7 @@ import { HealthController } from './health.controller';
PlanModule,
AgentModule,
AiModule,
RealtimeCoachModule,
AdminModule,
PlanAggregateModule,
],
......
......@@ -23,6 +23,12 @@ export interface AppConfig {
geminiBaseUrl: string;
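/// Default Gemini model (resolved to this concrete model when the frontend passes the logical key "gemini")
geminiDefaultModel: string;
/// DashScope (Alibaba Cloud Bailian) API key, used by the Qwen-Omni-Realtime coach; server-side only, never sent to the browser
dashscopeApiKey: string;
/// Qwen Omni realtime model name (real-time agent-assist coach)
qwenOmniModel: string;
/// Gemini Live realtime model name (optional provider for the real-time coach)
geminiLiveModel: string;
/// LLM call timeout (seconds), to prevent hangs
requestTimeoutSec: number;
/// Price table (¥/M tokens), read from the AI_PRICE_TABLE_JSON env var; to change prices, update the env var and restart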
......@@ -57,6 +63,9 @@ export function loadConfig(): AppConfig {
geminiApiKey: process.env.GEMINI_API_KEY ?? '',
geminiBaseUrl: process.env.GEMINI_BASE_URL ?? '',
geminiDefaultModel: process.env.GEMINI_DEFAULT_MODEL ?? 'gemini-3.5-flash',
dashscopeApiKey: process.env.DASHSCOPE_API_KEY ?? '',
qwenOmniModel: process.env.QWEN_OMNI_MODEL ?? 'qwen3-omni-flash-realtime',
geminiLiveModel: process.env.GEMINI_LIVE_MODEL ?? 'gemini-3.1-flash-live-preview',
requestTimeoutSec: Number(process.env.AI_REQUEST_TIMEOUT_SEC ?? 60),
priceTable: parsePriceTable(process.env.AI_PRICE_TABLE_JSON),
},
......
......@@ -51,6 +51,8 @@ import { PlanModule } from '../plan/plan.module';
PlanSummaryOrchestrator,
AiCallRunnerService,
AiProviderService,
// Reused by the real-time agent-assist coach: skills registry (for assembling the coach instructions)
DraftPlanScriptSkillRegistry,
],
})
export class AiModule {}
......@@ -93,6 +93,15 @@ export class PlanScriptOrchestrator {
) {}
/**
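* Reuse entry point: assembles the same DraftPlanScriptInput for the real-time agent-assist coach (pure DB reads, no side effects).
* The coach builds its instructions from the patient/plan/clinicalContext, sharing context with script generation.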
*/
async buildScriptInputForPlan(planId: string): Promise<DraftPlanScriptInput> {
const { plan, patient, persona, facts } = await this.loadPlanContext(planId);
return this.buildCallInput({ plan, patient, persona, facts });
}
/**
* Generate / regenerate the plan script.
* - First generation: writes a PlanScript row
* - Regeneration: upsert (UNIQUE planId); content is overwritten and agentInvocationId points to the new invocation
......
/**
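* Real-time agent-assist coach: instruction wrapper constants.
*
* The balance point (found after two failed attempts):
* - It must not push to book an appointment in one step the way the recall sales script does (too hasty, too formulaic)
* - Nor can it be pure empathy with no goal, asking what it already knows (useless)
* The right approach: use the skills' domain knowledge plus the known patient information to [address the chief complaint first, then advance step by step].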
*/
export const REALTIME_COACH_ROLE_HEADER = `# 你的角色:实时通话教练
你在旁听客服和患者打电话(麦克风主要收患者声),帮客服把这通电话聊好。
你的目标:**帮患者解决他这次最在意的问题(主诉),并一步步把他引导到合适的处理(面诊 / 治疗)** —— 但循序渐进,绝不是一上来就催着约时间。
你已经知道这位患者的情况(见下方背景),别明知故问;用你掌握的专业知识针对性地帮他。`;
/** Skill bodies are injected as "domain knowledge" (to be drawn on selectively, not recited as a sales pitch) */
export const REALTIME_COACH_KNOWLEDGE_HEADER = `# 专业知识(针对性取用,不是照着推销)
下面是该患者诊断相关的应对要点 / 异议处理 / 沟通风格资料。
用它们针对性地解答患者、给方向、显出你"懂行 + 懂他的情况";不是照着它一味推进度、催到店。`;
export const REALTIME_COACH_OUTPUT_CONSTRAINTS = `# 怎么给提示(最重要,违反即无效)
每次输出 = 给客服的【一句话】中文,**≤35 字,只给一个最该说的点**。纯一句话,不要长篇、不要 JSON / markdown / 分段。
【核心:抓主诉、循序渐进】
1. 先抓住患者这次最在意的(主诉:是疼?嫌贵?害怕?没时间?担心效果?),针对它回应。
2. 用你的专业知识 + 患者病史,给"懂行"的针对性回应或方向(别泛泛而谈、别问你已经知道的)。
3. 等顾虑解决、信任建立了,再自然往下一步(面诊 / 治疗)引导 —— **别一步到位、别每句都催约时间**。
【口吻】熟人聊天、有温度;别客服腔、别套话、别公式化,每条都不一样、别重复上一条。
【真实】涉及医生 / 治疗 / 历史,用下方背景里的真实数据,不编造。`;
import { Logger } from '@nestjs/common';
import { JwtService } from '@nestjs/jwt';
import {
ConnectedSocket,
MessageBody,
OnGatewayConnection,
OnGatewayDisconnect,
SubscribeMessage,
WebSocketGateway,
} from '@nestjs/websockets';
import type { Socket } from 'socket.io';
import { RealtimeCoachContextService } from './realtime-coach-context.service';
import { DashScopeRealtimeProxy } from './dashscope-realtime.proxy';
import { GeminiLiveProxy } from './gemini-live.proxy';
import type { CoachProxyHandle, RealtimeProvider } from './realtime-provider.types';
import type {
AudioFrameMsg,
CoachErrorMsg,
CoachReadyMsg,
CoachStartMsg,
CoachScope,
TextDeltaMsg,
TextDoneMsg,
} from './dto/coach-messages';
/**
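* CoachGateway: WS gateway for the real-time coach (socket.io).
*
* Pipeline: browser microphone audio → this gateway → provider proxy (DashScope or Gemini Live) → realtime model → text pushed back to the browser.
* Auth: the handshake's query.token is verified with the existing JWT (browser WebSockets cannot send custom headers; same approach as the SSE stream).
* Each socket gets its own upstream connection (isolated instructions, closed on disconnect).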
*/
@WebSocketGateway({
namespace: 'pac/v1/realtime/coach',
cors: { origin: true, credentials: false },
})
export class CoachGateway implements OnGatewayConnection, OnGatewayDisconnect {
private readonly logger = new Logger(CoachGateway.name);
private readonly proxies = new Map<string, CoachProxyHandle>();
constructor(
private readonly jwt: JwtService,
private readonly contextService: RealtimeCoachContextService,
private readonly qwenProxy: DashScopeRealtimeProxy,
private readonly geminiProxy: GeminiLiveProxy,
) {}
private pickProvider(key: 'qwen' | 'gemini' | undefined): RealtimeProvider {
return key === 'gemini' ? this.geminiProxy : this.qwenProxy;
}
handleConnection(client: Socket): void {
const token = (client.handshake.query.token as string | undefined) ?? '';
try {
const payload = this.jwt.verify<{ sub: string; tenantId: string; hostId: string }>(token);
const scope: CoachScope = {
tenantId: payload.tenantId,
hostId: payload.hostId,
userId: payload.sub,
};
client.data.scope = scope;
} catch {
client.emit('coach:error', { message: '鉴权失败' } satisfies CoachErrorMsg);
client.disconnect();
}
}
handleDisconnect(client: Socket): void {
this.proxies.get(client.id)?.close();
this.proxies.delete(client.id);
}
@SubscribeMessage('coach:start')
async onStart(
@MessageBody() msg: CoachStartMsg,
@ConnectedSocket() client: Socket,
): Promise<void> {
if (!client.data.scope) return;
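if (this.proxies.has(client.id)) return; // idempotent: a session is already running
try {
const { instructions, skills } = await this.contextService.buildInstructions(msg.planId);
const provider = this.pickProvider(msg.provider);
const handle = await provider.open(instructions, {
onDelta: (text) => client.emit('text:delta', { text } satisfies TextDeltaMsg),
onDone: (fullText) => client.emit('text:done', { fullText } satisfies TextDoneMsg),
onError: (message) => client.emit('coach:error', { message } satisfies CoachErrorMsg),
});
// While the upstream session was opening, the client may have disconnected or sent a second coach:start;
// close the extra session so it does not leak.
if (!client.connected || this.proxies.has(client.id)) {
handle.close();
return;
}
this.proxies.set(client.id, handle);
// Announce readiness only once the proxy can accept audio, so early frames are not dropped.
client.emit('coach:ready', { skills } satisfies CoachReadyMsg);
this.logger.log(`coach started: plan=${msg.planId} skills=${skills.length}`);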
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
client.emit('coach:error', { message } satisfies CoachErrorMsg);
}
}
@SubscribeMessage('audio:frame')
onAudioFrame(@MessageBody() msg: AudioFrameMsg, @ConnectedSocket() client: Socket): void {
this.proxies.get(client.id)?.appendAudio(msg.frame);
}
@SubscribeMessage('coach:stop')
onStop(@ConnectedSocket() client: Socket): void {
this.proxies.get(client.id)?.close();
this.proxies.delete(client.id);
}
}
import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import WebSocket from 'ws';
import type { AppConfig } from '../../config/configuration';
import type { CoachProxyHandle, ProxyHandlers, RealtimeProvider } from './realtime-provider.types';
/**
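* DashScopeRealtimeProxy: realtime session proxy for Qwen-Omni-Realtime (Alibaba Cloud Bailian).
*
* Manual observation mode (verified in the POC):
* session.update { modalities:['text'], instructions, input_audio_format:'pcm16', turn_detection:null }
* input_audio_buffer.append per frame; when the patient pauses (no new audio for ≥600ms) → commit + response.create;
* then receive response.text.delta (streaming) / response.text.done.
* The DASHSCOPE key appears only here (in the outbound header); the browser never sees it.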
*/
@Injectable()
export class DashScopeRealtimeProxy implements RealtimeProvider {
private readonly logger = new Logger(DashScopeRealtimeProxy.name);
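/** A patient pause this long (no new audio) counts as the end of an utterance and triggers one prompt (turn-end, not a fixed clock) */
private static readonly TURN_SILENCE_MS = 600;
/** Guard against non-stop speech: force a cut once an utterance reaches this length, so audio does not accumulate without bound */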
private static readonly MAX_UTTERANCE_MS = 15000;
private static readonly TICK_MS = 200;
constructor(private readonly config: ConfigService<AppConfig, true>) {}
async open(instructions: string, handlers: ProxyHandlers): Promise<CoachProxyHandle> {
const ai = this.config.get('ai', { infer: true });
if (!ai.dashscopeApiKey) {
throw new Error('DASHSCOPE_API_KEY 未设置 — 实时教练不可用');
}
const url = `wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=${encodeURIComponent(ai.qwenOmniModel)}`;
const ws = new WebSocket(url, { headers: { Authorization: `Bearer ${ai.dashscopeApiKey}` } });
let sentence = '';
let tickTimer: ReturnType<typeof setInterval> | null = null;
let hasUncommitted = false; // whether new audio has arrived since the last commit (an utterance is accumulating)
let lastAppendAt = 0; // time of the most recent audio frame (to measure pauses)
let utteranceStartAt = 0; // start time of the current utterance (to measure overlong speech)
await new Promise<void>((resolve, reject) => {
ws.once('open', () => {
ws.send(
JSON.stringify({
type: 'session.update',
session: {
modalities: ['text'],
instructions,
input_audio_format: 'pcm16',
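turn_detection: null, // disable automatic turn detection; this proxy triggers on patient pauses
},
}),
);
// turn-end trigger: buffered audio waits for a patient pause ≥ TURN_SILENCE_MS (or an overlong utterance) → commit + one response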
tickTimer = setInterval(() => {
if (ws.readyState !== WebSocket.OPEN || !hasUncommitted) return;
const now = Date.now();
const silenceGap = now - lastAppendAt;
const uttDur = now - utteranceStartAt;
if (
silenceGap >= DashScopeRealtimeProxy.TURN_SILENCE_MS ||
uttDur >= DashScopeRealtimeProxy.MAX_UTTERANCE_MS
) {
ws.send(JSON.stringify({ type: 'input_audio_buffer.commit' }));
ws.send(JSON.stringify({ type: 'response.create' }));
hasUncommitted = false;
}
}, DashScopeRealtimeProxy.TICK_MS);
resolve();
});
ws.once('error', (e) => reject(e instanceof Error ? e : new Error(String(e))));
});
ws.on('message', (data: WebSocket.RawData) => {
let evt: { type?: string; delta?: string; error?: { message?: string } };
try {
evt = JSON.parse(data.toString());
} catch {
return;
}
switch (evt.type) {
case 'response.text.delta':
sentence += evt.delta ?? '';
handlers.onDelta(evt.delta ?? '');
break;
case 'response.text.done':
handlers.onDone(sentence);
sentence = '';
break;
case 'error': {
const m = evt.error?.message ?? 'dashscope error';
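// Committing an empty buffer at the end or during silence reports "buffer too small"; it is transient and harmless, so it is not reported as fatal (that would kill the session by mistake)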
if (/buffer too small|have no audio|empty/i.test(m)) {
this.logger.debug(`benign dashscope notice: ${m}`);
break;
}
handlers.onError(m);
break;
}
default:
break; // ignore session.updated / response.done etc.
}
});
ws.on('close', () => {
if (tickTimer) clearInterval(tickTimer);
});
ws.on('error', (e) => this.logger.warn(`dashscope ws error: ${(e as Error).message}`));
return {
appendAudio: (base64Frame) => {
if (ws.readyState !== WebSocket.OPEN) return;
ws.send(JSON.stringify({ type: 'input_audio_buffer.append', audio: base64Frame }));
const now = Date.now();
if (!hasUncommitted) utteranceStartAt = now; // start of an utterance
hasUncommitted = true;
lastAppendAt = now;
},
close: () => {
if (tickTimer) clearInterval(tickTimer);
try {
ws.close();
} catch {
/* ignore */
}
},
};
}
}
/**
* Real-time coach WS wire protocol (browser ⇄ gateway).
* v1 does not persist anything: plain interfaces (no decorators, no Zod, no database writes).
*/
/** client → server */
export interface CoachStartMsg {
planId: string;
/** Which realtime model to use (defaults to qwen) */
provider?: 'qwen' | 'gemini';
}
export interface AudioFrameMsg {
frame: string; // base64 PCM16 16kHz mono
}
/** server → client */
export interface CoachReadyMsg {
skills: string[]; // names of the activated skills (shown in the panel)
}
export interface TextDeltaMsg {
text: string; // streaming delta
}
export interface TextDoneMsg {
fullText: string; // one completed sentence
}
export interface CoachErrorMsg {
message: string;
}
/** Tenant-isolation context attached to socket.data after auth */
export interface CoachScope {
tenantId: string;
hostId: string;
userId: string;
}
import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import WebSocket from 'ws';
import type { AppConfig } from '../../config/configuration';
import type { CoachProxyHandle, ProxyHandlers, RealtimeProvider } from './realtime-provider.types';
/**
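* GeminiLiveProxy: realtime session proxy for Gemini Live (BidiGenerateContent).
*
* The protocol differs from Qwen's (verified in the POC):
* - WS: .../BidiGenerateContent?key=KEY (the key travels in the query string, server-side only)
* - setup: responseModalities=['AUDIO'] (native-audio models do not support TEXT output);
* outputAudioTranscription is enabled to obtain the [text transcript] of the prompt the model "speaks"; automatic VAD is off and turns are controlled manually via activity signals
* - Per utterance: activityStart → realtimeInput.audio frame by frame → activityEnd (triggers the turn)
* - Output: serverContent.outputTranscription.text (streaming); generationComplete = one sentence finished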
*/
@Injectable()
export class GeminiLiveProxy implements RealtimeProvider {
private readonly logger = new Logger(GeminiLiveProxy.name);
private static readonly TURN_SILENCE_MS = 600;
private static readonly MAX_UTTERANCE_MS = 15000;
private static readonly TICK_MS = 200;
constructor(private readonly config: ConfigService<AppConfig, true>) {}
async open(instructions: string, handlers: ProxyHandlers): Promise<CoachProxyHandle> {
const ai = this.config.get('ai', { infer: true });
if (!ai.geminiApiKey) throw new Error('GEMINI_API_KEY 未设置 — Gemini Live 不可用');
const url =
`wss://generativelanguage.googleapis.com/ws/` +
`google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=${encodeURIComponent(ai.geminiApiKey)}`;
const ws = new WebSocket(url);
let sentence = '';
let inUtterance = false; // currently inside an utterance (activityStart sent, activityEnd not yet sent)
let lastAppendAt = 0;
let utteranceStartAt = 0;
let tickTimer: ReturnType<typeof setInterval> | null = null;
const endUtterance = () => {
if (ws.readyState === WebSocket.OPEN) ws.send(JSON.stringify({ realtimeInput: { activityEnd: {} } }));
inUtterance = false;
};
// The session counts as ready only after setupComplete arrives
await new Promise<void>((resolve, reject) => {
const onSetup = (data: WebSocket.RawData) => {
let m: { setupComplete?: unknown };
try {
m = JSON.parse(data.toString());
} catch {
return;
}
if (m.setupComplete) {
ws.off('message', onSetup);
resolve();
}
};
ws.once('open', () => {
ws.send(
JSON.stringify({
setup: {
model: `models/${ai.geminiLiveModel}`,
generationConfig: { responseModalities: ['AUDIO'] },
systemInstruction: { parts: [{ text: instructions }] },
outputAudioTranscription: {},
inputAudioTranscription: {},
realtimeInputConfig: { automaticActivityDetection: { disabled: true } },
},
}),
);
tickTimer = setInterval(() => {
if (!inUtterance) return;
const now = Date.now();
if (
now - lastAppendAt >= GeminiLiveProxy.TURN_SILENCE_MS ||
now - utteranceStartAt >= GeminiLiveProxy.MAX_UTTERANCE_MS
) {
endUtterance();
}
}, GeminiLiveProxy.TICK_MS);
ws.on('message', onSetup);
});
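ws.once('error', (e) => reject(e instanceof Error ? e : new Error(String(e))));
// If the socket closes before setupComplete (for example, the server rejects the setup), fail instead of hanging; rejecting after resolve is a no-op.
ws.once('close', () => reject(new Error('Gemini Live connection closed before setupComplete')));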
});
// Business message handling
ws.on('message', (data: WebSocket.RawData) => {
let m: {
serverContent?: {
outputTranscription?: { text?: string };
generationComplete?: boolean;
turnComplete?: boolean;
};
sessionResumptionUpdate?: unknown;
};
try {
m = JSON.parse(data.toString());
} catch {
return;
}
const sc = m.serverContent;
if (!sc) return; // ignore sessionResumptionUpdate etc.
if (sc.outputTranscription?.text) {
sentence += sc.outputTranscription.text;
handlers.onDelta(sc.outputTranscription.text);
}
if (sc.generationComplete || sc.turnComplete) {
if (sentence.trim()) handlers.onDone(sentence);
sentence = '';
}
});
ws.on('close', () => {
if (tickTimer) clearInterval(tickTimer);
});
ws.on('error', (e) => this.logger.warn(`gemini live ws error: ${(e as Error).message}`));
return {
appendAudio: (base64Frame) => {
if (ws.readyState !== WebSocket.OPEN) return;
if (!inUtterance) {
ws.send(JSON.stringify({ realtimeInput: { activityStart: {} } }));
inUtterance = true;
utteranceStartAt = Date.now();
}
ws.send(
JSON.stringify({
realtimeInput: { audio: { data: base64Frame, mimeType: 'audio/pcm;rate=16000' } },
}),
);
lastAppendAt = Date.now();
},
close: () => {
if (tickTimer) clearInterval(tickTimer);
try {
ws.close();
} catch {
/* ignore */
}
},
};
}
}
import { Injectable } from '@nestjs/common';
import { PlanScriptOrchestrator } from '../ai/orchestrators/plan-script.orchestrator';
import { DraftPlanScriptSkillRegistry } from '../ai/calls/draft-plan-script/skill-registry.service';
import { composeSystem } from '../ai/calls/draft-plan-script/skill-composer';
import type { DraftPlanScriptInput } from '../ai/calls/draft-plan-script/input.types';
import {
REALTIME_COACH_ROLE_HEADER,
REALTIME_COACH_KNOWLEDGE_HEADER,
REALTIME_COACH_OUTPUT_CONSTRAINTS,
} from './coach-prompts';
/**
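* RealtimeCoachContextService: assembles the instructions for the realtime model (Qwen-Omni-Realtime or Gemini Live).
*
* Reuses the [patient-context assembly pipeline] (PlanScriptOrchestrator.buildScriptInputForPlan + the skills
* selected by composeSystem). Skill bodies are injected as domain knowledge, **except scenario-* skills**:
* those are "book the visit in one step" recall sales scripts, and using them as coach knowledge turned the
* coach into a sales machine (observed in testing). The matched skill names are also returned so the panel can show them as tags.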
*/
@Injectable()
export class RealtimeCoachContextService {
constructor(
private readonly planScriptOrchestrator: PlanScriptOrchestrator,
private readonly skillRegistry: DraftPlanScriptSkillRegistry,
) {}
async buildInstructions(planId: string): Promise<{ instructions: string; skills: string[] }> {
const input = await this.planScriptOrchestrator.buildScriptInputForPlan(planId);
const composed = composeSystem(input, this.skillRegistry.getAllSkills());
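// Skill bodies are injected as "domain knowledge", excluding scenario-* (those are "book the visit in one step" recall scripts
// that would override the "chief complaint first, step by step" stance); diagnosis / objection / relationship / population / safety are kept.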
const knowledgeBlock = composed.matchedSkills
.filter((s) => !s.frontmatter.name.startsWith('scenario-'))
.map((s) => `## [${s.frontmatter.name}]\n${s.body}`)
.join('\n\n---\n\n');
const instructions = [
REALTIME_COACH_ROLE_HEADER,
this.buildPatientContextBlock(input),
knowledgeBlock ? `${REALTIME_COACH_KNOWLEDGE_HEADER}\n\n${knowledgeBlock}` : '',
REALTIME_COACH_OUTPUT_CONSTRAINTS, // output constraints go last: most recent, strongest
]
.filter(Boolean)
.join('\n\n');
return {
instructions,
skills: composed.matchedSkills.map((s) => s.frontmatter.name),
};
}
private buildPatientContextBlock(input: DraftPlanScriptInput): string {
const cc = input.clinicalContext;
const persona = input.personaHighlights.map((h) => `${h.label}:${h.description}`).join(' · ');
return [
'# 当前通话患者背景(真实数据,供你聊天时取用,不要编造、别照着推销)',
`- 诊所:${input.clinicName} · 称呼:${input.patient.nameMasked} · 年龄:${input.patient.age ?? '未知'}`,
persona ? `- 画像:${persona}` : '',
`- 上次就诊发现 / 待处理:${cc.pendingTreatments.join('、') || input.plan.reasons[0]?.reason || '(无)'}`,
`- 主诊医生:${cc.primaryDoctorName ?? '(未知)'} · 距上次到店:${cc.daysSinceLastVisit != null ? `${cc.daysSinceLastVisit} 天` : '未知'}`,
cc.lastVisitSummary ? `- 上次到店:${cc.lastVisitSummary}` : '',
cc.ongoingChains.length ? `- 在管治疗:${cc.ongoingChains.join(' / ')}` : '',
`- 老客/新客:已完成 ${cc.completedTreatmentCount} 次治疗`,
]
.filter(Boolean)
.join('\n');
}
}
import { Module } from '@nestjs/common';
import { AiModule } from '../ai/ai.module';
import { AuthModule } from '../auth/auth.module';
import { CoachGateway } from './coach.gateway';
import { RealtimeCoachContextService } from './realtime-coach-context.service';
import { DashScopeRealtimeProxy } from './dashscope-realtime.proxy';
import { GeminiLiveProxy } from './gemini-live.proxy';
/**
* Real-time agent-assist coach module (standalone).
* - AiModule: reuses PlanScriptOrchestrator (context assembly) + DraftPlanScriptSkillRegistry (skills)
* - AuthModule: reuses JwtModule (WS handshake auth)
*/
@Module({
imports: [AiModule, AuthModule],
providers: [CoachGateway, RealtimeCoachContextService, DashScopeRealtimeProxy, GeminiLiveProxy],
})
export class RealtimeCoachModule {}
/**
* Real-time coach provider abstraction: Qwen-Omni-Realtime and Gemini Live speak different protocols
* but expose the same open() interface to the gateway: feed audio in, get text prompts back.
*/
export type RealtimeProviderKey = 'qwen' | 'gemini';
export interface CoachProxyHandle {
/** Append one frame of base64 PCM16 16kHz audio (the provider splits turns on patient pauses internally) */
appendAudio(base64Frame: string): void;
close(): void;
}
export interface ProxyHandlers {
onDelta: (text: string) => void; // streaming prompt delta
onDone: (full: string) => void; // one sentence finished
onError: (msg: string) => void;
}
export interface RealtimeProvider {
/** Open a realtime session (instructions = coach system prompt + patient context); once resolved, it is ready to accept audio */
open(instructions: string, handlers: ProxyHandlers): Promise<CoachProxyHandle>;
}
......@@ -26,6 +26,7 @@
"next": "^16.2.4",
"react": "^19.2.5",
"react-dom": "^19.2.5",
"socket.io-client": "^4.8.3",
"sonner": "^2.0.7",
"tailwind-merge": "^3.5.0",
"zustand": "^5.0.13"
......
......@@ -151,3 +151,32 @@ body {
color: transparent;
animation: shimmerText 1.6s linear infinite;
}
/* Real-time coach: voice waveform pulse */
@keyframes audioWave {
0%, 100% { height: 4px; opacity: 0.45; }
50% { height: 28px; opacity: 1; }
}
.animate-audioWave { animation: audioWave 0.7s ease-in-out infinite; }
/* Real-time coach: breathing halo on the microphone button while recording */
@keyframes coachPulse {
0% { box-shadow: 0 0 0 0 rgba(244,63,94,0.45); }
70% { box-shadow: 0 0 0 12px rgba(244,63,94,0); }
100% { box-shadow: 0 0 0 0 rgba(244,63,94,0); }
}
.animate-coachPulse { animation: coachPulse 1.5s ease-out infinite; }
/* Real-time coach: panel entrance */
@keyframes slideInUp {
from { opacity: 0; transform: translateY(12px); }
to { opacity: 1; transform: translateY(0); }
}
.animate-slideInUp { animation: slideInUp 0.2s ease-out; }
/* Real-time coach: the current subtitle line slides in to draw the eye (new line fades in from below with a slight scale-up) */
@keyframes lyricIn {
from { opacity: 0; transform: translateY(10px) scale(0.97); }
to { opacity: 1; transform: translateY(0) scale(1); }
}
.animate-lyricIn { animation: lyricIn 0.32s cubic-bezier(0.22, 1, 0.36, 1); }
......@@ -41,6 +41,7 @@ import {
import type { AdaptedFact } from './adapt-data';
import { useScriptStream } from './use-script-stream';
import { useSummaryStream } from './use-summary-stream';
import { RealtimeCoach } from '@/components/realtime-coach';
import { submitExecution, adaptAbandonReasons } from './execution-api';
/// Script-generation model (concrete model name, passed straight to the backend AiProviderService.resolve)
......@@ -285,7 +286,7 @@ export function PlanDetailApp({
</aside>
}
centerPane={
<main className="min-h-0 flex flex-col h-full">
<main className="relative min-h-0 flex flex-col h-full">
<section className="bg-white rounded-lg border border-slate-200 shadow-sm flex flex-col min-h-0 flex-1 overflow-hidden">
{/* On narrow screens flex-wrap wraps naturally; gap-y provides the row spacing */}
<header className="flex-none px-3 sm:px-4 py-2.5 border-b border-slate-100 flex flex-wrap items-center justify-between gap-x-2 gap-y-2">
......@@ -368,6 +369,8 @@ export function PlanDetailApp({
}}
/>
</section>
{/* Real-time agent-assist coach: floating microphone button at bottom center + floating panel (standalone module) */}
<RealtimeCoach planId={plan.id} />
</main>
}
rightPane={
......
/**
* Browser audio utilities: microphone Float32 → 16kHz PCM16 → base64.
* Browsers have no Buffer, so base64 goes through btoa in chunks (avoids a stack overflow from String.fromCharCode on a large array).
*/
/** Linear-interpolation downsampling to 16kHz (microphones usually run at 44.1k/48k) */
export function downsampleTo16k(input: Float32Array, fromRate: number): Float32Array {
const toRate = 16000;
if (fromRate === toRate) return input;
const ratio = fromRate / toRate;
const outLen = Math.floor(input.length / ratio);
const out = new Float32Array(outLen);
for (let i = 0; i < outLen; i++) {
const idx = i * ratio;
const lo = Math.floor(idx);
const hi = Math.min(lo + 1, input.length - 1);
const frac = idx - lo;
out[i] = input[lo]! * (1 - frac) + input[hi]! * frac;
}
return out;
}
/** Float32 [-1,1] → Int16 LE bytes */
export function floatToPcm16(input: Float32Array): Uint8Array {
const out = new DataView(new ArrayBuffer(input.length * 2));
for (let i = 0; i < input.length; i++) {
const s = Math.max(-1, Math.min(1, input[i]!));
out.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
}
return new Uint8Array(out.buffer);
}
/** Uint8 → base64 (chunked to avoid a stack overflow) */
export function bytesToBase64(bytes: Uint8Array): string {
let binary = '';
const chunk = 0x8000;
for (let i = 0; i < bytes.length; i += chunk) {
binary += String.fromCharCode(...bytes.subarray(i, i + chunk));
}
return btoa(binary);
}
'use client';
import { useEffect, useRef, type RefObject } from 'react';
const BAR_COUNT = 28;
const MIN_H = 3;
const MAX_H = 34;
/**
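* Waveform: reads the real microphone level (AnalyserNode + requestAnimationFrame); bars move only when there is sound and flatten in silence.
* Writes style.height on the DOM directly, without triggering a React re-render.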
*/
export function AudioVisualizer({
analyserRef,
active,
}: {
analyserRef: RefObject<AnalyserNode | null>;
active: boolean;
}) {
const barsRef = useRef<(HTMLSpanElement | null)[]>([]);
useEffect(() => {
if (!active) return;
let raf = 0;
const buf = new Uint8Array(128);
const tick = () => {
const an = analyserRef.current;
const bars = barsRef.current;
if (an) {
an.getByteFrequencyData(buf);
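// Take the low-frequency band (where most voice energy lies) and map it onto BAR_COUNT bars; taller in the middle and shorter at the edges looks more like a voiceprint
for (let i = 0; i < BAR_COUNT; i++) {
const bin = Math.floor((i / BAR_COUNT) * 48) + 1; // use the first ~48 frequency bins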
const v = (buf[bin] ?? 0) / 255; // 0..1
const center = 1 - Math.abs(i - (BAR_COUNT - 1) / 2) / (BAR_COUNT / 2);
const h = MIN_H + v * (MAX_H - MIN_H) * (0.5 + 0.5 * center);
const el = bars[i];
if (el) el.style.height = `${h}px`;
}
}
raf = requestAnimationFrame(tick);
};
raf = requestAnimationFrame(tick);
return () => cancelAnimationFrame(raf);
}, [active, analyserRef]);
return (
<div className="h-9 flex items-center justify-center gap-[3px]">
{Array.from({ length: BAR_COUNT }).map((_, i) => (
<span
key={i}
ref={(el) => {
barsRef.current[i] = el;
}}
className="w-[3px] rounded-full bg-teal-500/90 transition-[height] duration-75 ease-out"
style={{ height: `${MIN_H}px` }}
/>
))}
</div>
);
}
'use client';
import { type RefObject } from 'react';
import { cn } from '@/lib/utils';
import { AudioVisualizer } from './audio-visualizer';
import type { CoachState } from './types';
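/** Progressive fade: depth 0 = current line (sharp and prominent), 1 = previous line, 2 = the one before (fainter and blurrier the higher up); older lines are not shown */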
const DEPTH_STYLE = [
'text-[15.5px] font-semibold text-slate-900 leading-relaxed [text-shadow:0_1px_10px_rgba(255,255,255,0.95)]',
'text-[13.5px] text-slate-500 leading-snug opacity-85 line-clamp-2',
'text-[12.5px] text-slate-400 leading-snug blur-[0.6px] opacity-60 line-clamp-1',
];
/**
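* Subtitle-style coach overlay, anchored directly above the microphone button, centered.
* Bottom to top: waveform → current line (slides in prominently) → previous line → the one before (progressively faded).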
*/
export function CoachOverlay({
state,
analyserRef,
}: {
state: CoachState;
analyserRef: RefObject<AnalyserNode | null>;
}) {
const active = state.status === 'active';
// Build a newest-first stack: [current, previous, one before] (at most 3 entries)
const stack: { text: string; key: string }[] = [];
let placeholder = '';
if (state.status === 'connecting') {
placeholder = '连接中…';
} else if (state.status === 'error') {
placeholder = state.message;
} else if (active) {
const lines = state.lines;
const streaming = state.streaming.trim();
if (streaming) {
stack.push({ text: streaming, key: `s${lines.length}` });
for (let d = 1; d <= 2; d++) {
const t = lines[lines.length - d];
if (t) stack.push({ text: t, key: `l${lines.length - d}` });
}
} else {
for (let d = 0; d <= 2; d++) {
const t = lines[lines.length - 1 - d];
if (t) stack.push({ text: t, key: `l${lines.length - 1 - d}` });
}
}
if (stack.length === 0) placeholder = '正在收听,患者开口后给提示…';
}
return (
<div className="absolute bottom-[4.75rem] left-1/2 -translate-x-1/2 z-20 w-[min(94%,620px)] flex flex-col items-center gap-2.5 pointer-events-none">
{/* Subtitle stack (reversed when rendering: the older, the higher) */}
<div className="w-full flex flex-col items-center justify-end gap-2 px-6 min-h-[5rem]">
{stack.length === 0 ? (
<p className="text-[12.5px] text-slate-400">{placeholder}</p>
) : (
[...stack].reverse().map((item, i) => {
const depth = stack.length - 1 - i; // after reversing: the last item is the current line, depth 0
return (
<p
key={item.key}
className={cn(
'max-w-full text-center transition-all duration-300',
DEPTH_STYLE[depth],
depth === 0 && 'animate-lyricIn',
)}
>
{item.text}
</p>
);
})
)}
</div>
{/* Waveform: directly below, following the real microphone level */}
<AudioVisualizer analyserRef={analyserRef} active={active} />
</div>
);
}
'use client';
import { useState } from 'react';
import { cn } from '@/lib/utils';
import { useRealtimeCoach } from './use-realtime-coach';
import { RealtimeCoachButton } from './realtime-coach-button';
import { CoachOverlay } from './coach-overlay';
type Provider = 'qwen' | 'gemini';
const PROVIDERS: { key: Provider; label: string }[] = [
{ key: 'qwen', label: 'Qwen' },
{ key: 'gemini', label: 'Gemini' },
];
/**
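* Real-time agent-assist coach (standalone feature, single mount point on the detail page).
* Microphone button at bottom center + subtitle-style overlay; the realtime model (Qwen-Omni / Gemini Live) can be chosen while idle.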
*/
export function RealtimeCoach({ planId }: { planId: string }) {
const { state, start, stop, analyserRef } = useRealtimeCoach(planId);
const [provider, setProvider] = useState<Provider>('qwen');
const active = state.status !== 'idle';
return (
<>
{/* Coach on: frosted background blur to make the subtitles stand out (pointer-events-none keeps the page interactive) */}
{active && (
<div className="absolute inset-0 z-10 bg-white/45 backdrop-blur-sm pointer-events-none animate-slideInUp" />
)}
{active && <CoachOverlay state={state} analyserRef={analyserRef} />}
{/* Model picker (idle only, above the microphone button) */}
{!active && (
<div className="absolute bottom-[4.25rem] left-1/2 -translate-x-1/2 z-20 inline-flex items-center rounded-full bg-slate-100/90 backdrop-blur-sm p-0.5 text-[11px] shadow-sm">
{PROVIDERS.map((p) => (
<button
key={p.key}
type="button"
onClick={() => setProvider(p.key)}
className={cn(
'px-2.5 py-0.5 rounded-full transition-colors',
provider === p.key
? 'bg-white font-semibold text-slate-800 shadow-sm'
: 'text-slate-500 hover:text-slate-700',
)}
>
{p.label}
</button>
))}
</div>
)}
<RealtimeCoachButton active={active} onClick={() => (active ? stop() : void start(provider))} />
</>
);
}
import { cn } from '@/lib/utils';
/** Floating microphone button at the bottom center of the detail page */
export function RealtimeCoachButton({ active, onClick }: { active: boolean; onClick: () => void }) {
return (
<button
type="button"
onClick={onClick}
title={active ? '结束实时教练' : '开启实时教练(旁听患者语音,实时出指导)'}
className={cn(
'absolute bottom-4 left-1/2 -translate-x-1/2 z-20 w-12 h-12 rounded-full',
'shadow-md flex items-center justify-center text-white transition-all duration-200',
'cursor-pointer active:scale-95',
active
? 'bg-rose-500 hover:bg-rose-600 animate-coachPulse'
: 'bg-teal-600 hover:bg-teal-700',
)}
>
{active ? (
<svg viewBox="0 0 24 24" fill="currentColor" className="w-5 h-5">
<rect x="6" y="6" width="12" height="12" rx="2" />
</svg>
) : (
<svg viewBox="0 0 24 24" fill="currentColor" className="w-5 h-5">
<path d="M12 14a3 3 0 0 0 3-3V6a3 3 0 1 0-6 0v5a3 3 0 0 0 3 3z" />
<path d="M19 11a7 7 0 0 1-14 0H3a9 9 0 0 0 8 8.94V23h2v-3.06A9 9 0 0 0 21 11z" />
</svg>
)}
</button>
);
}
/** Real-time coach frontend state (subtitle style: lines = completed history lines, streaming = the line currently streaming) */
export type CoachState =
| { status: 'idle' }
| { status: 'connecting' }
| {
status: 'active';
lines: string[]; // completed guidance lines (only the last 3 are kept)
streaming: string; // the current line being streamed
skills: string[];
elapsedSeconds: number;
}
| { status: 'error'; message: string };
'use client';
import { useCallback, useRef, useState } from 'react';
import { io, type Socket } from 'socket.io-client';
import { env } from '@/lib/env';
import { useAuthStore } from '@/stores/auth-store';
import type { CoachState } from './types';
import { downsampleTo16k, floatToPcm16, bytesToBase64 } from './audio-utils';
/**
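* useRealtimeCoach: connects to the real-time coach gateway and captures the microphone.
* - socket.io connects to /pac/v1/realtime/coach (token in the query)
* - getUserMedia → ScriptProcessor captures PCM16 16k → emit audio:frame (silence-gated VAD)
* - AnalyserNode is fed the raw microphone → analyserRef lets the waveform read the real level (moves only when there is sound)
* - Subtitle style: text:delta accumulates into streaming; text:done pushes into lines (only the last 3 are kept)
* v1: the single microphone channel is treated entirely as patient speech.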
*/
export function useRealtimeCoach(planId: string) {
const [state, setState] = useState<CoachState>({ status: 'idle' });
const socketRef = useRef<Socket | null>(null);
const streamRef = useRef<MediaStream | null>(null);
const ctxRef = useRef<AudioContext | null>(null);
const procRef = useRef<ScriptProcessorNode | null>(null);
const timerRef = useRef<ReturnType<typeof setInterval> | null>(null);
/** Used by the waveform to read the real microphone level */
const analyserRef = useRef<AnalyserNode | null>(null);
const cleanup = useCallback(() => {
if (timerRef.current) clearInterval(timerRef.current);
procRef.current?.disconnect();
void ctxRef.current?.close().catch(() => undefined);
streamRef.current?.getTracks().forEach((t) => t.stop());
socketRef.current?.disconnect();
timerRef.current = null;
procRef.current = null;
ctxRef.current = null;
streamRef.current = null;
socketRef.current = null;
analyserRef.current = null;
}, []);
const start = useCallback(async (provider: 'qwen' | 'gemini' = 'qwen') => {
const token = useAuthStore.getState().accessToken;
if (!token) {
setState({ status: 'error', message: '未鉴权 — 请从宿主系统重新打开' });
return;
}
setState({ status: 'connecting' });
const socket = io(`${env.apiBaseUrl}/pac/v1/realtime/coach`, {
query: { token },
transports: ['websocket'],
forceNew: true,
});
socketRef.current = socket;
socket.on('connect', () => socket.emit('coach:start', { planId, provider }));
socket.on('coach:ready', ({ skills }: { skills: string[] }) => {
const startedAt = Date.now();
setState({ status: 'active', lines: [], streaming: '', skills, elapsedSeconds: 0 });
timerRef.current = setInterval(() => {
setState((s) =>
s.status === 'active'
? { ...s, elapsedSeconds: Math.floor((Date.now() - startedAt) / 1000) }
: s,
);
}, 1000);
});
socket.on('text:delta', ({ text }: { text: string }) =>
setState((s) => (s.status === 'active' ? { ...s, streaming: s.streaming + text } : s)),
);
socket.on('text:done', () =>
setState((s) => {
if (s.status !== 'active') return s;
const done = s.streaming.trim();
if (!done) return { ...s, streaming: '' };
return { ...s, lines: [...s.lines, done].slice(-3), streaming: '' };
}),
);
socket.on('coach:error', ({ message }: { message: string }) => {
setState({ status: 'error', message });
cleanup();
});
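socket.on('disconnect', () => {
// Also release the mic, audio graph and timer; otherwise capture keeps running after the state falls back to idle.
setState((s) => (s.status === 'active' ? { status: 'idle' } : s));
cleanup();
});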
try {
await startMic(socket);
} catch {
setState({ status: 'error', message: '麦克风开启失败 — 请允许麦克风权限' });
cleanup();
}
}, [planId, cleanup]);
const startMic = async (socket: Socket) => {
const stream = await navigator.mediaDevices.getUserMedia({
// Enable WebRTC noise suppression / echo cancellation / auto gain control to filter steady background noise
audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true },
});
streamRef.current = stream;
const ctx = new AudioContext();
ctxRef.current = ctx;
const source = ctx.createMediaStreamSource(stream);
// Waveform: taps the raw microphone so it reflects the real level (not affected by the VAD gate)
const analyser = ctx.createAnalyser();
analyser.fftSize = 256;
analyser.smoothingTimeConstant = 0.7;
source.connect(analyser);
analyserRef.current = analyser;
const proc = ctx.createScriptProcessor(4096, 1, 1);
procRef.current = proc;
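// Silence gate (VAD): frames whose RMS is below the threshold are not sent → the model is not triggered when nobody is speaking.
// The threshold is raised to cut noise, and ONSET_FRAMES consecutive above-threshold frames are required before the gate "opens" (this rejects short spikes such as keyboard clicks or coughs).
const SILENCE_RMS = 0.03;
const ONSET_FRAMES = 4; // ~340 ms of sustained sound (4 × 4096-sample buffers, assuming a 48 kHz context) is needed to "open"; claps, keystrokes and other transients cannot reach it and are rejected
const HANGOVER_FRAMES = 6;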
let hangover = 0;
let loudStreak = 0;
proc.onaudioprocess = (e) => {
if (socket.disconnected) return;
const f32 = e.inputBuffer.getChannelData(0);
let sum = 0;
for (let i = 0; i < f32.length; i++) sum += f32[i]! * f32[i]!;
const rms = Math.sqrt(sum / f32.length);
if (rms > SILENCE_RMS) {
loudStreak++;
if (loudStreak >= ONSET_FRAMES) hangover = HANGOVER_FRAMES;
} else {
loudStreak = 0;
}
if (hangover <= 0) return;
hangover--;
const pcm = floatToPcm16(downsampleTo16k(f32, ctx.sampleRate));
socket.emit('audio:frame', { frame: bytesToBase64(pcm) });
};
source.connect(proc);
proc.connect(ctx.destination);
};
const stop = useCallback(() => {
socketRef.current?.emit('coach:stop');
cleanup();
setState({ status: 'idle' });
}, [cleanup]);
return { state, start, stop, analyserRef };
}
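The onset/hangover gate inside `onaudioprocess` can be pulled out into a small pure function so its behaviour is testable without a microphone. The sketch below mirrors the same counters; `createVadGate` and `VadOptions` are hypothetical names introduced for illustration and are not part of the commit.

```typescript
// Hypothetical options object; the values used in the hook are 0.03 / 4 / 6.
interface VadOptions {
  silenceRms: number; // frames at or below this RMS count as silence
  onsetFrames: number; // consecutive loud frames required before the gate opens
  hangoverFrames: number; // frames kept flowing after the last onset refresh
}

// Root-mean-square level of one frame of float samples in [-1, 1].
function rms(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i]! * frame[i]!;
  return frame.length ? Math.sqrt(sum / frame.length) : 0;
}

// Returns a stateful predicate: true means "send this frame".
function createVadGate(opts: VadOptions): (frame: Float32Array) => boolean {
  let loudStreak = 0;
  let hangover = 0;
  return (frame) => {
    if (rms(frame) > opts.silenceRms) {
      loudStreak++;
      // Only a sustained run of loud frames (re)arms the hangover counter.
      if (loudStreak >= opts.onsetFrames) hangover = opts.hangoverFrames;
    } else {
      loudStreak = 0;
    }
    if (hangover <= 0) return false; // gate closed: drop the frame
    hangover--;
    return true;
  };
}
```

One consequence of this design is that the first `onsetFrames - 1` frames of every utterance are dropped; if clipped word onsets turn out to matter, a short pre-roll buffer flushed when the gate opens would be one possible variation.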
......@@ -35,12 +35,15 @@
]
},
"devDependencies": {
"@eslint/js": "^9.17.0",
"@types/node": "^22.10.2",
"eslint": "^9.17.0",
"prettier": "^3.8.3",
"turbo": "^2.9.9",
"typescript": "^5.9.3",
"typescript-eslint": "^8.59.2"
},
"dependencies": {
"ws": "^8.21.0"
}
}
#!/usr/bin/env node
/**
* Qwen-Omni-Realtime "silent coach" POC test
* ────────────────────────────────────────────────────────────────────────
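* Goal: verify that Qwen-Omni-Realtime can emit coaching text hints in real time
* while patient speech is being fed in continuously, [without treating the patient's continued speech as an interruption, and without waiting for the end of the turn before answering].
*
* How it is verified:
* - Simulate the patient speaking: feed a patient recording (16 kHz / mono / PCM16) in chunks at real speaking pace
* - Use [manual observation mode]: turn_detection=null + modalities=["text"]
* - Every TRIGGER_EVERY_MS, actively send commit + response.create to request a hint
* - Print each text delta's timestamp relative to the "audio start" → shows directly whether it is real-time or not
* - Feeding audio and requesting hints run in parallel → shows whether "hints while the patient is still talking" holds
*
* Prerequisites:
* 1. An Alibaba Cloud Bailian DashScope API key (note: not a Gemini key)
* export DASHSCOPE_API_KEY=sk-xxxx
* 2. The ws library: pnpm add -w ws (Node's native WebSocket does not support custom headers)
* 3. A patient audio file (16 kHz mono PCM16 raw, or a standard WAV):
* - Record one sentence, or synthesize it with any TTS, for example:
* "医生你好,我那颗缺了的牙一直没补,种植是不是很贵啊?大概要多久?"
* (roughly: "Hello doctor, my missing tooth still hasn't been replaced. Is an implant expensive? About how long does it take?")
* - Convert to 16 kHz mono PCM: ffmpeg -i in.mp3 -ar 16000 -ac 1 -f s16le patient.pcm
*
* Run:
* DASHSCOPE_API_KEY=sk-xxx node scripts/qwen-omni-coach-poc.mjs patient.pcm
*
* ⚠️ To confirm (check the official docs before running: https://help.aliyun.com/zh/model-studio/realtime):
* - The exact values of WS_URL / MODEL (model names keep changing)
* - If event field names differ, follow the docs (the protocol is largely isomorphic to OpenAI Realtime)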
*/
import WebSocket from 'ws';
import { readFileSync } from 'node:fs';
// ─── Config (confirm the MODEL name + key before running) ─────────────────────
const API_KEY = process.env.DASHSCOPE_API_KEY;
const WS_URL = 'wss://dashscope.aliyuncs.com/api-ws/v1/realtime';
const MODEL = process.env.QWEN_OMNI_MODEL || 'qwen3-omni-flash-realtime'; // or qwen-omni-turbo-realtime / qwen3.5-omni-*-realtime
const TRIGGER_EVERY_MS = 3000; // request a hint every 3 s (simulates "hinting while listening")
const SAMPLE_RATE = 16000; // 16 kHz mono PCM16
const CHUNK_MS = 100; // feed one chunk every 100 ms (simulates a live microphone)
// Coach persona: listens silently, outputs only a short "what to say next", never talks to the patient
const COACH_INSTRUCTIONS = `你是牙科诊所客服的【静默通话教练】。你在旁听客服与患者的电话。
你【绝不】直接和患者对话,只用一句话给客服【下一步该说什么 / 该提醒的点】。
患者刚提到的关切(如价格、疼痛、时间)→ 立刻给对应的应对话术。
输出极简:一句话,不寒暄、不解释。`;
if (!API_KEY) {
console.error('❌ 缺 DASHSCOPE_API_KEY(阿里云百炼,不是 Gemini key)');
process.exit(1);
}
const audioPath = process.argv[2];
if (!audioPath) {
console.error('❌ 用法: node scripts/qwen-omni-coach-poc.mjs <patient.pcm | patient.wav>');
process.exit(1);
}
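// Read the audio (.wav: skip a 44-byte header, which assumes a canonical PCM WAV with no extra chunks; .pcm: raw as-is)
let pcm = readFileSync(audioPath);
if (audioPath.toLowerCase().endsWith('.wav')) pcm = pcm.subarray(44);
const bytesPerChunk = (SAMPLE_RATE * 2 * CHUNK_MS) / 1000; // 16-bit = 2 bytes per sample → 3200 bytes per 100 ms chunk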
const totalMs = Math.round((pcm.length / (SAMPLE_RATE * 2)) * 1000);
console.log(`🎧 患者语音 ${audioPath}: ${(pcm.length / 1024).toFixed(1)}KB ≈ ${totalMs}ms,按 ${CHUNK_MS}ms/块实时灌入`);
const t0 = Date.now();
const ts = () => `+${String(Date.now() - t0).padStart(5, ' ')}ms`;
const ws = new WebSocket(`${WS_URL}?model=${MODEL}`, {
headers: { Authorization: `Bearer ${API_KEY}` },
});
let firstDeltaAt = null;
ws.on('open', () => {
console.log(`${ts()} 🔌 已连接,model=${MODEL}`);
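// 1) Configure the session: manual mode (VAD off) + text-only output + coach persona
send('session.update', {
session: {
modalities: ['text'], // text only, no audio output → silent
instructions: COACH_INSTRUCTIONS,
input_audio_format: 'pcm16',
turn_detection: null, // ⭐ Key: turns off automatic turn detection → the patient can keep talking without it counting as an interruption, and the model does not jump in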
},
});
streamAudio();
});
ws.on('message', (raw) => {
let ev;
try {
ev = JSON.parse(raw.toString());
} catch {
return;
}
switch (ev.type) {
case 'session.updated':
console.log(`${ts()} ✅ 会话配置生效(turn_detection=null, modalities=[text])`);
break;
case 'response.text.delta': // streamed text hint
if (!firstDeltaAt) {
firstDeltaAt = Date.now() - t0;
process.stdout.write(`${ts()} 🟢 [首个提示文本] `);
}
process.stdout.write(ev.delta ?? '');
break;
case 'response.text.done':
process.stdout.write(` ⟵ done(${ts()})\n`);
break;
case 'response.done':
break;
case 'error':
console.error(`${ts()} ❌ error:`, JSON.stringify(ev.error ?? ev));
break;
default:
// Other events (speech_started/stopped, etc.): uncomment the log below to inspect them
// console.log(`${ts()} · ${ev.type}`);
break;
}
});
ws.on('close', (code) => console.log(`\n${ts()} 🔚 连接关闭 code=${code}`));
ws.on('error', (e) => console.error(`${ts()} ❌ ws error:`, e.message));
function send(type, payload = {}) {
ws.send(JSON.stringify({ type, ...payload }));
}
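// Feed the audio in chunks at real speaking pace; in parallel, request a hint every TRIGGER_EVERY_MS
async function streamAudio() {
let triggerCount = 0;
// Request hints on a timer (feeding and requesting at the same time → verifies "hints while the patient is still talking")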
const timer = setInterval(() => {
triggerCount++;
console.log(`\n${ts()} 📨 [触发#${triggerCount}] commit + response.create(此刻患者还在说)`);
send('input_audio_buffer.commit'); // commit the audio buffered so far
send('response.create'); // actively request one coaching hint
}, TRIGGER_EVERY_MS);
for (let off = 0; off < pcm.length; off += bytesPerChunk) {
const chunk = pcm.subarray(off, off + bytesPerChunk);
send('input_audio_buffer.append', { audio: chunk.toString('base64') });
await sleep(CHUNK_MS); // simulate real time: pace the feed by the audio duration
}
console.log(`\n${ts()} 🎤 患者语音灌完`);
clearInterval(timer);
// Request one last time (closing hint)
send('input_audio_buffer.commit');
send('response.create');
setTimeout(() => {
console.log(`\n📊 结果:首个提示文本到达 = ${firstDeltaAt != null ? `${firstDeltaAt}ms` : '未收到'}(音频总时长 ${totalMs}ms)`);
console.log(` 若首个提示远早于音频结束,说明"边说边实时出提示"成立 `);
ws.close();
}, 4000);
}
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
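The script skips a fixed 44 bytes for `.wav` input, which only holds for a canonical PCM WAV whose `data` chunk directly follows a 16-byte `fmt ` chunk. Files written by some tools carry extra chunks (for example `LIST`) before the audio. A minimal sketch of locating the payload by walking the RIFF chunks instead is shown below; `findWavData` is a hypothetical helper, not part of the commit, and it does not validate the sample format.

```typescript
// Hypothetical helper: find the PCM payload of a RIFF/WAVE file by walking its chunks.
// Works on any Uint8Array, including the Buffer returned by Node's readFileSync.
function findWavData(bytes: Uint8Array): { offset: number; length: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Read a 4-character ASCII chunk tag at the given byte position.
  const tag = (at: number): string =>
    String.fromCharCode(bytes[at]!, bytes[at + 1]!, bytes[at + 2]!, bytes[at + 3]!);

  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }

  let pos = 12; // first chunk starts right after "RIFF" + size + "WAVE"
  while (pos + 8 <= bytes.length) {
    const id = tag(pos);
    const size = view.getUint32(pos + 4, true); // chunk sizes are little-endian
    const body = pos + 8;
    if (id === 'data') {
      // Clamp in case the header claims more bytes than the file holds.
      return { offset: body, length: Math.min(size, bytes.length - body) };
    }
    pos = body + size + (size % 2); // chunk bodies are padded to an even length
  }
  throw new Error('no data chunk found');
}
```

In the POC this would replace the `subarray(44)` line with something like `const { offset, length } = findWavData(pcm); pcm = pcm.subarray(offset, offset + length);`.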