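Hands-On: Building an H5 AI Chat Page from 0 to 1
Building an H5 AI chat page from scratch sounds a little daunting, doesn't it? I recently got exactly that assignment from my boss. My first instinct was to take a shortcut and use an off-the-shelf UniApp plugin, but on closer inspection the available plugins either lacked features or didn't match our API, so I ended up building it myself. Below I lay out the key techniques and the pitfalls I hit along the way, in the hope that they help anyone wrestling with the same problem.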

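1. Tackling Streaming Data with SSE
This was my first AI chat project, and the first hurdle was getting the AI's reply to appear piece by piece as it is generated. After some digging I learned that the technique is called SSE (Server-Sent Events): the server keeps pushing data to the client over a single HTTP response, which gives the conversation a real-time feel.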
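To make "the server keeps pushing data" concrete, here is a minimal sketch of what an SSE response body looks like and how it breaks down into events. The helper name parseSseText and the sample payload are assumptions made for illustration; they are not part of the project. The sketch also assumes the whole text is already available, whereas a real stream can split an event across network chunks.

```typescript
// Hypothetical helper for illustration only; not the project's code.
interface SseEvent {
  event: string;
  data: string;
}

// Parse a complete piece of SSE text into events.
// Events are separated by a blank line; several "data:" lines in one
// event are joined with "\n"; lines starting with ":" are comments.
function parseSseText(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue;
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return events;
}

// Assumed sample: two events, each carrying a JSON fragment of the reply.
const sample = 'data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\n';
const texts = parseSseText(sample).map((e) => JSON.parse(e.data).text);
console.log(texts.join("")); // prints "Hello"
```

In practice a library does this parsing for you; the point is only that each event arrives as a small text frame that the client can render as soon as it lands.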
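My first thought was the native EventSource API, which is simple and needs no dependencies. It turned out, however, that EventSource only supports GET requests, while this project has to POST a set of parameters, so that route was closed. I went looking for a library that fits into a Vue project and, after comparing a few, settled on fetch-event-source (published on npm as @microsoft/fetch-event-source). Here is the code: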
import { fetchEventSource } from "@microsoft/fetch-event-source";

const fetchAskDataFunc = (length: number, currenStr: string = currenContentStr.value) => {
  // A fresh AbortController per request lets the UI cancel the stream later
  abortController = new AbortController();
  const signal = abortController.signal;
  isStreaming.value = true;
  fetchEventSource(`${import.meta.env.VITE_APP_AI_BASE_URL}/ali/ai/streamAsk`, {
    signal,
    method: "POST",
    // retryInterval: 2000,
    headers: {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
      "Cache-Control": "no-cache",
      Authorization: getToken,
    },
    body: JSON.stringify({
      question: currenStr,
      sessionId: sessionId.value,
      accountUid: getToken,
    }),
    // Keep the connection open while the page is in the background
    openWhenHidden: true,
    onmessage: (event) => {
      const data = JSON.parse(event.data);
      sessionId.value = data.sessionId;
      // Overwrite this message's slot with the latest payload
      currenContentArr.value[length] = {
        type: "result",
        // Optional chaining guards against payloads without a second "thoughts" entry
        content: data.thoughts?.[1]?.response,
        text: data.text,
        finishReason: data.finishReason,
        userContent: currenStr,
        resultContentDom: "resultContent" + length,
        thinkContentDom: "thinkContent" + length,
        timeNum: timeNum.value,
        dataType: "streamAsk",
        ...data,
      };
      // Once answer text arrives, the "thinking" phase is over
      if (data.text) {
        isThink.value = false;
        timerObj && clearInterval(timerObj);
      }
    },
    onerror: (error) => {
      timerObj && clearInterval(timerObj);
      isThink.value = false;
      isStreaming.value = false;
      console.error("Fetch event source error:", error);
      // Rethrow so that fetchEventSource stops instead of retrying the POST
      throw error;
    },
    onclose() {
      timerObj && clearInterval(timerObj);
      isThink.value = false;
      isStreaming.value = false;
      // Cleanup after the request completes
    },
  });
};
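The core of this code is issuing a POST request through fetchEventSource and then, in onmessage, taking each chunk as it arrives and updating the UI. onerror and onclose handle failures and the end of the stream. One detail to keep in mind: fetch-event-source retries automatically unless onerror throws, so rethrow there if a failed request should not be sent again.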
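The update step inside onmessage can also be thought of as a small pure function, which is easier to reason about and test than code that mutates refs directly. The sketch below is only an illustration under assumptions: the names StreamChunk, ChatMessage, and applyChunk are hypothetical, each payload is assumed to carry the full text so far (which is how the handler above treats it), and "stop" as the final finishReason value is a guess rather than something the backend is known to send.

```typescript
// Hypothetical types; field names follow the payload used above.
interface StreamChunk {
  sessionId: string;
  text?: string;
  finishReason?: string;
}

interface ChatMessage {
  userContent: string;
  text: string;
  thinking: boolean; // true until the first answer text arrives
  finished: boolean;
}

// Return a new message list in which slot `index` reflects the latest chunk.
// Assumption: each chunk carries the whole text so far, so we replace, not append.
function applyChunk(
  messages: ChatMessage[],
  index: number,
  userContent: string,
  chunk: StreamChunk
): ChatMessage[] {
  const next = messages.slice();
  const text = chunk.text ?? "";
  next[index] = {
    userContent,
    text,
    thinking: text === "",
    finished: chunk.finishReason === "stop", // assumed terminal value
  };
  return next;
}

let list: ChatMessage[] = [];
list = applyChunk(list, 0, "hi", { sessionId: "s1" });
list = applyChunk(list, 0, "hi", { sessionId: "s1", text: "Hello" });
list = applyChunk(list, 0, "hi", { sessionId: "s1", text: "Hello!", finishReason: "stop" });
console.log(list[0].text, list[0].finished);
```

Because the function returns a new array each time, it can be assigned straight to a ref without relying on deep reactivity.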
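2. Getting Past Speech Recognition
At first I wanted to do speech-to-text right in the browser, but my boss is from Fujian and speaks with a strong regional accent, so I doubted that front-end recognition would be reliable. I decided to send the audio to the backend instead. The browser's built-in navigator.mediaDevices.getUserMedia can capture the audio stream, but after many attempts, uploading the result in wav format kept failing; there may be a pitfall I never identified. I eventually switched to the recorder-core library and everything went smoothly. Here is the code: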
import { ref, onUnmounted } from 'vue';
import Recorder from 'recorder-core';
import 'recorder-core/src/engine/wav';

// Fallback for older browsers that only expose the prefixed, callback-based API
navigator.getUserMedia = navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia ||
  navigator.msGetUserMedia;

export function useRecorder() {
  const recorder = ref(null);
  const isRecording = ref(false);
  const audioBlob = ref(null);

  // Ask for microphone access and open a 16 kHz, 16-bit wav recorder
  const requestPermission = async () => {
    try {
      if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
        const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        recorder.value = Recorder({
          type: 'wav',
          sampleRate: 16000,
          bitRate: 16,
          stream
        });
      } else if (navigator.getUserMedia) {
        // Await instead of returning, so the recorder is still opened below
        await new Promise((resolve, reject) => {
          navigator.getUserMedia({ audio: true }, (stream) => {
            recorder.value = Recorder({
              type: 'wav',
              sampleRate: 16000,
              bitRate: 16,
              stream
            });
            resolve(true);
          }, (error) => {
            console.error('Permission request failed:', error);
            reject(error);
          });
        });
      } else {
        console.error('This browser does not support audio recording');
        return false;
      }
      await new Promise((resolve, reject) => {
        recorder.value.open(() => {
          resolve();
        }, (error) => {
          console.error('Failed to open the recorder:', error);
          reject(error);
        });
      });
      return true;
    } catch (error) {
      console.error('Permission request failed:', error);
      return false;
    }
  };

  const startRecording = async () => {
    if (isRecording.value) return;
    const hasPermission = await requestPermission();
    if (hasPermission) {
      try {
        recorder.value.start();
        isRecording.value = true;
      } catch (error) {
        console.error('Failed to start recording:', error);
      }
    }
  };

  // Hands the recorder back to the caller, which calls stop() to get the blob
  const stopRecording = () => {
    if (!isRecording.value) return;
    isRecording.value = false;
    return recorder.value;
  };

  onUnmounted(() => {
    if (recorder.value) {
      // close() releases the recording resources held by this instance
      recorder.value.close();
      recorder.value = null;
    }
  });

  return {
    isRecording,
    audioBlob,
    requestPermission,
    startRecording,
    stopRecording,
  };
}
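When an upload fails, a quick sanity check is whether the blob size matches what the recorder settings imply. The sketch below works out the expected size of an uncompressed PCM wav file. It assumes mono audio and the canonical 44-byte wav header, and the helper name wavByteLength is made up for this example.

```typescript
// Hypothetical helper: expected size in bytes of an uncompressed PCM wav file.
// Assumes the canonical 44-byte header and, by default, a single channel.
function wavByteLength(
  seconds: number,
  sampleRate = 16000,
  bitDepth = 16,
  channels = 1
): number {
  const headerBytes = 44;
  const bytesPerSample = bitDepth / 8;
  return headerBytes + Math.round(seconds * sampleRate * bytesPerSample * channels);
}

// 10 s at 16 kHz, 16-bit, mono: 16000 * 2 * 10 = 320000 bytes of samples + 44
console.log(wavByteLength(10)); // prints 320044
```

At these settings speech costs roughly 32 KB per second, so a one-minute recording is a little under 2 MB, which is worth comparing against any upload size limit on the backend.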
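Later the boss added a requirement: recording must be cancellable, for example press and hold to start, then slide up to cancel. Fair enough. So I added gesture-handling logic: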
let timeOutEvent: any = 0;

// Mobile: start a 500 ms timer; if it fires, treat the touch as a long press
const gtouchstart = (event) => {
  timeOutEvent = setTimeout(() => {
    longPress();
  }, 500);
  return false;
};

// Desktop: a click toggles recording on and off
const gtouchstartPc = async () => {
  isVoice.value = !isVoice.value;
  if (isPcRecording.value) {
    record.startRecording();
  } else {
    stopRecording();
  }
  isPcRecording.value = !isPcRecording.value;
  return false;
};

// Finger lifted: cancel a pending long press and finish the recording
const showDeleteButton = () => {
  clearTimeout(timeOutEvent);
  isVoice.value = false;
  stopRecording();
  return false;
};

// Moving the finger outside the footer area marks the recording as cancelled
const gtouchmove = (event) => {
  const currentX = event.touches[0].clientX;
  const currentY = event.touches[0].clientY;
  const FooterDomRect = FooterDom.value.getBoundingClientRect();
  if (
    currentX < FooterDomRect.left ||
    currentX > FooterDomRect.right ||
    currentY < FooterDomRect.top ||
    currentY > FooterDomRect.bottom
  ) {
    isCancelVoice.value = true;
  } else {
    isCancelVoice.value = false;
  }
  clearTimeout(timeOutEvent);
  timeOutEvent = 0;
};

const longPress = () => {
  timeOutEvent = 0;
  startRecording();
};

const startRecording = async () => {
  isCancelVoice.value = false;
  isVoice.value = true;
  record.startRecording();
};

const stopRecording = () => {
  const recorder = record.stopRecording();
  // Cancelled: stop the recorder and discard the audio
  if (isCancelVoice.value) {
    recorder.stop(
      (blob) => {
        console.log("Recording cancelled");
      },
      (error) => {
        Toast.clear();
        console.error("Error while stopping the recording:", error);
      }
    );
    return;
  }
  Toast.loading({
    message: "Recognizing...",
    forbidClick: true,
    duration: 0,
  });
  try {
    recorder.stop(
      (blob) => {
        // Upload the wav blob to the backend for recognition
        const audioBlob = blob;
        const formDataObj = new FormData();
        formDataObj.append("voice", audioBlob);
        service({
          url: "/ali/ai/recognize",
          method: "post",
          data: formDataObj,
        })
          .then((res) => {
            if (res.data && !isPc.value) {
              // Mobile: send the recognized text as a message right away
              emits("pushContentFunc", res.data);
            } else if (res.data) {
              // Desktop: put the text into the input box for editing
              contentStr.value = res.data;
              InputFocusFunc();
            }
          })
          .finally(() => {
            Toast.clear();
          });
      },
      (error) => {
        Toast.clear();
        console.error("Error while stopping the recording:", error);
      }
    );
  } catch (error) {
    Toast.clear();
    console.error("Exception while stopping the recording:", error);
  }
};

const stopSSEFunc = () => {
  emits("stopSSEFunc");
};
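The cancel decision in gtouchmove comes down to one geometric question: is the finger still inside the footer's bounding box? Pulling that check into a pure function makes it easy to test without a touch device. The names Point, Rect, and isOutsideRect below are assumptions for this sketch; in the real handler the rectangle would come from getBoundingClientRect().

```typescript
// Hypothetical helper mirroring the bounds check in gtouchmove.
interface Point {
  x: number;
  y: number;
}

interface Rect {
  left: number;
  right: number;
  top: number;
  bottom: number;
}

// True when the point lies outside the rectangle (edges count as inside).
function isOutsideRect(p: Point, r: Rect): boolean {
  return p.x < r.left || p.x > r.right || p.y < r.top || p.y > r.bottom;
}

// Assumed footer geometry: a bar across the bottom of a 375 x 667 viewport.
const footer: Rect = { left: 0, right: 375, top: 600, bottom: 667 };

console.log(isOutsideRect({ x: 180, y: 630 }, footer)); // prints false
console.log(isOutsideRect({ x: 180, y: 500 }, footer)); // prints true
```

With this shape, the handler body reduces to assigning the result of isOutsideRect to isCancelVoice.value.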
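3. Auto-Scrolling for Streamed Output and Gesture Control
The boss didn't ask for this, but I liked how Tencent Yuanbao auto-scrolls its streamed output and responds to drag gestures, so I decided to add the same behavior here. The initial idea was simple: use scrollTop and scrollHeight to drive auto-scrolling, listen for touchmove, and pause auto-scrolling as soon as the user swipes. In practice, touchmove sometimes failed to fire, which made the experience choppy. The fix was to bring in touchstart and touchend as supporting signals so that gesture detection stays reliable. The result is a "smart pause" scrolling scheme: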
import { ref, watch, nextTick } from "vue";

const messagesRef = ref();
const messageRefs = ref([]);
const lastTouchY = ref(0);
const isScroStop = ref(false); // true while the user has taken over scrolling
const isUp = ref(false); // true while the user is reading above the bottom
let timer: any = null;

// Auto-scroll only when the user has not interrupted it
const initScrollToBottomFunc = () => {
  !isUp.value && !isScroStop.value && scrollToBottomFunc();
};

// Simple throttle: storeTime is reset to `time` once per second, and the
// watcher below only scrolls when the two match, so auto-scroll runs at
// most once per second however fast chunks arrive
let time = 0;
let storeTime = 0;
const getTimeFunc = () => {
  timer = setInterval(() => {
    storeTime = time;
  }, 1000);
};
getTimeFunc();

watch(
  () => currenContentArr.value,
  () => {
    if (storeTime === time) {
      initScrollToBottomFunc();
    }
    storeTime++;
    if (dataType.value === 2) {
      const index = currenContentArr.value.length - 1;
      nextTick(() => {
        initChartFunc(currenContentArr.value[index].content, "chartRef" + index);
      });
    }
    if (currenContentArr.value.length == 0) {
      arrDom = [];
    }
  },
  {
    deep: true,
  }
);

const scrollToBottomFunc = (type = "") => {
  // An explicit click on "scroll to bottom" resumes auto-scrolling
  if (type === "click") {
    isScroStop.value = false;
  }
  nextTick(() => {
    const messagesContainer = messagesRef.value;
    if (messagesContainer) {
      messagesContainer.scrollTop = messagesContainer.scrollHeight;
    }
  });
};

const scrollTopFunc = async (id) => {
  // Not implemented yet
};

// Scroll listener: reaching the bottom (within 5 px) resumes auto-scrolling
const handleScrollFunc = () => {
  const element = messagesRef.value;
  if (element) {
    const scrollHeight = element.scrollHeight;
    const scrollTop = element.scrollTop;
    const clientHeight = element.clientHeight;
    if (scrollTop + clientHeight + 5 >= scrollHeight) {
      isUp.value = false;
      isScroStop.value = false;
    } else {
      if (isScroStop.value) {
        isUp.value = true;
      }
    }
  }
};

const inputContentFunc = () => {
  isScroStop.value = true;
};

defineExpose({ scrollTopFunc, inputContentFunc });

// Mouse wheel: scrolling up pauses auto-scrolling
const handleScrollTopFunc = (event) => {
  if (event.deltaY < 0) {
    isScroStop.value = true;
  }
};

// Touch move while the list is scrolled away from the top pauses auto-scrolling
const handleTouchMoveFunc = (event) => {
  const messagesContainer = messagesRef.value;
  if (!messagesContainer) return;
  const currentTouchY = event.touches[0].clientY;
  if (currentTouchY > 0 && messagesContainer.scrollTop > 0) {
    isScroStop.value = true;
  }
  lastTouchY.value = event.touches[0].clientY;
};

const startX = ref(0);
const startY = ref(0);
const threshold = 10; // movement below this many pixels counts as a tap

// Touch start: pause right away and remember where the finger landed
const handleTouchStart = (event: TouchEvent) => {
  isScroStop.value = true;
  const touch = event.touches[0];
  startX.value = touch.clientX;
  startY.value = touch.clientY;
};

// Touch end: a vertical slide keeps the pause, a tap lifts it
const handleTouchEnd = (event: TouchEvent) => {
  const touch = event.changedTouches[0];
  const endX = touch.clientX;
  const endY = touch.clientY;
  const deltaX = endX - startX.value;
  const deltaY = endY - startY.value;
  const isSliding = Math.abs(deltaX) > threshold || Math.abs(deltaY) > threshold;
  if (isSliding) {
    if (Math.abs(deltaX) > Math.abs(deltaY)) {
      // Horizontal slides are ignored
    } else {
      isScroStop.value = true;
    }
  } else {
    isScroStop.value = false;
  }
};

const initFunc = () => {
  const element = messagesRef.value;
  if (element) {
    element.addEventListener("scroll", handleScrollFunc);
  }
};
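Two decisions carry most of the "smart pause" behavior above: whether the list is close enough to the bottom to resume following, and whether a finished touch was a tap, a horizontal slide, or a vertical slide. The sketch below restates both as pure functions with the same 5 px tolerance and 10 px threshold used in the code. The function names isNearBottom and classifyGesture are assumptions for illustration.

```typescript
// Hypothetical helpers restating the checks in handleScrollFunc and handleTouchEnd.

// True when the viewport's bottom edge is within `tolerance` px of the content's end.
function isNearBottom(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  tolerance = 5
): boolean {
  return scrollTop + clientHeight + tolerance >= scrollHeight;
}

type Gesture = "tap" | "horizontal" | "vertical";

// Classify a finished touch by how far the finger travelled on each axis.
function classifyGesture(deltaX: number, deltaY: number, threshold = 10): Gesture {
  const absX = Math.abs(deltaX);
  const absY = Math.abs(deltaY);
  if (absX <= threshold && absY <= threshold) return "tap";
  return absX > absY ? "horizontal" : "vertical";
}

console.log(isNearBottom(595, 400, 1000)); // prints true
console.log(isNearBottom(500, 400, 1000)); // prints false
console.log(classifyGesture(3, 4)); // prints "tap"
console.log(classifyGesture(5, -40)); // prints "vertical"
```

In terms of the flags above, a "vertical" result keeps isScroStop set, a "tap" clears it, and a true result from isNearBottom clears both isUp and isScroStop.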
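At this point, an AI chat page that can stream replies, take voice input, and auto-scroll has basically taken shape. There are still plenty of details to polish, such as recognizing the data returned in the SSE stream and rendering ECharts charts, which I plan to cover in a follow-up post.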