
Hands-On: Building an H5 AI Chat Page from 0 to 1

Source: Internet  Time: 2026-07-05 13:51:04

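Building an H5 AI chat page from scratch sounds a little exciting, doesn't it? I recently got exactly that assignment from my boss. I had hoped to cut corners and knock it out with an off-the-shelf UniApp plugin, but on closer inspection those plugins either lacked the features I needed or did not match our interface, so in the end I built it myself. Below I lay out the key techniques and the pitfalls I hit along the way, in the hope that it helps anyone wrestling with the same thing.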


1. Tackling Streaming Data with SSE

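This was my first AI chat project, and the first hurdle was getting the AI's reply to come out "one character at a time". After some digging I learned this is done with SSE (Server-Sent Events): the server keeps pushing data to the client over a single long-lived HTTP response, which instantly gives the conversation a "real-time" feel.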
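To make the idea concrete, here is a minimal sketch of what an SSE stream looks like on the wire and how a client could split it into events. The function name parseSseChunk and the sample payload are illustrative assumptions, not part of the project, and a real client also has to buffer partial chunks, which this sketch leaves out.

```typescript
// One parsed event from a text/event-stream response.
interface SseEvent {
    event: string; // event name; "message" when the server sends none
    data: string;  // payload; multiple data: lines are joined with "\n"
}

// Splits a chunk of stream text into events.
// Assumption: the chunk ends on an event boundary (a blank line).
function parseSseChunk(chunk: string): SseEvent[] {
    const events: SseEvent[] = [];
    for (const block of chunk.split(/\r?\n\r?\n/)) {
        let event = "message";
        const dataLines: string[] = [];
        for (const line of block.split(/\r?\n/)) {
            if (line === "" || line.startsWith(":")) continue; // blank line or comment
            const colon = line.indexOf(":");
            const field = colon === -1 ? line : line.slice(0, colon);
            let value = colon === -1 ? "" : line.slice(colon + 1);
            if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
            if (field === "event") event = value;
            else if (field === "data") dataLines.push(value);
        }
        // Blocks without any data: line (e.g. keep-alive comments) are skipped.
        if (dataLines.length > 0) {
            events.push({ event, data: dataLines.join("\n") });
        }
    }
    return events;
}

// Hypothetical sample: two events, each carrying the text produced so far.
const sample = 'data: {"text":"He"}\n\ndata: {"text":"Hello"}\n\n';
const parsed = parseSseChunk(sample);
console.log(parsed.map((e) => JSON.parse(e.data).text));
```

Each event is just a few text lines terminated by a blank line, which is why the client can render the reply incrementally as events arrive.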

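At first I wanted to use the native EventSource interface, which is simple and needs no extra dependency. After looking into it, though, I found it only supports GET requests. The project needs to POST a bunch of parameters, so that route was closed. I had to look for a library that fits into a Vue project instead and, after comparing a few, settled on fetch-event-source. Here's the code: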

const fetchAskDataFunc = (length: number, currenStr: string = currenContentStr.value) => {
    abortController = new AbortController();
    const signal = abortController.signal;
    isStreaming.value = true;
    fetchEventSource(`${import.meta.env.VITE_APP_AI_BASE_URL}/ali/ai/streamAsk`, {
        signal,
        method: "POST",
        // retryInterval: 2000,
        headers: {
            "Content-Type": "application/json",
            Accept: "text/event-stream",
            "Cache-Control": "no-cache",
            Authorization: getToken,
        },
        body: JSON.stringify({
            question: currenStr,
            sessionId: sessionId.value,
            accountUid: getToken,
        }),
        openWhenHidden: true,
        onmessage: (event) => {
            const data = JSON.parse(event.data);
            sessionId.value = data.sessionId;
            currenContentArr.value[length] = {
                type: "resutl",
                content: data.thoughts[1].response,
                text: data.text,
                finishReason: data.finishReason,
                userContent: currenStr,
                resultContentDom: "resultContent" + length,
                thinkContentDom: "thinkContent" + length,
                timeNum: timeNum.value,
                dataType: "streamAsk",
                ...data,
            };
            if (data.text) {
                isThink.value = false;
                timerObj && clearInterval(timerObj);
            }
        },
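        onerror: (error) => {
            timerObj && clearInterval(timerObj);
            isThink.value = false;
            isStreaming.value = false;
            console.error("Fetch event source error:", error);
            // fetchEventSource retries automatically unless onerror throws;
            // rethrow so a failed POST is not silently sent again.
            throw error;
        },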
        onclose() {
            timerObj && clearInterval(timerObj);
            isThink.value = false;
            isStreaming.value = false;
            // Cleanup after the request completes
        },
    });
};

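The core of this code is to send a POST request through fetchEventSource, then read each chunk in onmessage and update the UI. onerror and onclose handle failures and the end of the stream, so the flow is easy to follow.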
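One fragile spot in the handler above is that it calls JSON.parse and reads data.thoughts[1].response directly, so a malformed or partial chunk would throw inside onmessage. The following is a sketch of a defensive mapping step; the helper name toChatView and the ChatView shape are hypothetical, and the StreamChunk fields are only inferred from what the handler reads.

```typescript
// Shape of one streamed chunk, inferred from the fields the handler reads.
// Assumption: the real payload may carry more fields than these.
interface StreamChunk {
    sessionId?: string;
    text?: string;
    finishReason?: string;
    thoughts?: Array<{ response?: string }>;
}

// Hypothetical view model for one assistant message.
interface ChatView {
    thinking: string;            // reasoning text shown while the model "thinks"
    answer: string;              // answer text received so far
    finishReason: string | null; // passed through untouched
}

// Returns null when the chunk is not a JSON object, so a single bad chunk
// can be skipped instead of breaking the whole stream.
function toChatView(raw: string): ChatView | null {
    let chunk: StreamChunk;
    try {
        chunk = JSON.parse(raw) as StreamChunk;
    } catch {
        return null;
    }
    if (chunk === null || typeof chunk !== "object") return null;
    return {
        thinking: chunk.thoughts?.[1]?.response ?? "",
        answer: chunk.text ?? "",
        finishReason: chunk.finishReason ?? null,
    };
}

console.log(toChatView('{"text":"hi"}'));
console.log(toChatView("not json"));
```

Inside onmessage, the result could be checked for null before updating the message array, which keeps the parsing concerns out of the UI code.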

2. Getting Past Speech Recognition

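At first I wanted to convert speech to text right in the browser, but my boss is from Fujian and has a strong accent, and I doubted front-end recognition would cope. So I decided to send the audio to the backend for processing. The browser's built-in navigator.mediaDevices.getUserMedia can capture the audio stream, but after a lot of trying, uploading in wav format kept failing; there may be a hidden pitfall I never found. In the end I switched to the recorder-core library and everything went smoothly. Here's the code: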

import { ref, onUnmounted } from 'vue';
import Recorder from 'recorder-core';
import 'recorder-core/src/engine/wav';

navigator.getUserMedia = navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia ||
  navigator.msGetUserMedia;

export function useRecorder() {
    const recorder = ref(null);
    const isRecording = ref(false);
    const audioBlob = ref(null);

    const requestPermission = async () => {
        try {
            let stream;
            if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
                stream = await navigator.mediaDevices.getUserMedia({ audio: true });
            } else if (navigator.getUserMedia) {
                // Legacy callback API, wrapped in a Promise so that both
                // branches continue to the same recorder setup below.
                stream = await new Promise((resolve, reject) => {
                    navigator.getUserMedia({ audio: true }, resolve, reject);
                });
            } else {
                console.error('This browser does not support audio recording');
                return false;
            }
            recorder.value = Recorder({
                type: 'wav',
                sampleRate: 16000,
                bitRate: 16,
                stream
            });
            // Open the recorder before reporting success.
            await new Promise((resolve, reject) => {
                recorder.value.open(() => {
                    resolve();
                }, (error) => {
                    console.error('Failed to open the recorder:', error);
                    reject(error);
                });
            });
            return true;
        } catch (error) {
            console.error('Permission request failed:', error);
            return false;
        }
    };

    const startRecording = async () => {
        if (isRecording.value) return;
        const hasPermission = await requestPermission();
        if (hasPermission) {
            try {
                recorder.value.start();
                isRecording.value = true;
            } catch (error) {
                console.error('Failed to start recording:', error);
            }
        }
    };

    const stopRecording = () => {
        if (!isRecording.value) return;
        isRecording.value = false;
        return recorder.value
    };

    onUnmounted(() => {
        if (recorder.value) {
            recorder.value.destroy();
            recorder.value = null;
        }
    });

    return {
        isRecording,
        audioBlob,
        requestPermission,
        startRecording,
        stopRecording,
    };
}
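The composable hands the raw recorder back from stopRecording, so callers have to deal with its callback-style stop. A small Promise wrapper can make that easier to use with async/await. This is a sketch only: the name stopAsPromise and the StoppableRecorder interface are assumptions, modeled on how the article's code calls stop with a success and an error callback.

```typescript
// Minimal structural type covering only the call used here.
// Assumption: stop(onSuccess, onError) matches the recorder's callback style.
interface StoppableRecorder<T> {
    stop(onSuccess: (result: T) => void, onError: (error: unknown) => void): void;
}

// Wraps the callback-style stop() in a Promise.
function stopAsPromise<T>(recorder: StoppableRecorder<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        recorder.stop(resolve, reject);
    });
}

// Usage with a fake recorder standing in for the real one.
const fakeRecorder: StoppableRecorder<string> = {
    stop: (onSuccess) => onSuccess("fake-wav-data"),
};
stopAsPromise(fakeRecorder).then((result) => console.log(result));
```

With the real recorder, T would be the Blob passed to the success callback, and a rejected Promise replaces the separate error callback.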

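Later my boss added another requirement: recording must be cancellable, for example long-press to start and swipe up to cancel. That had to be done, so I added gesture-handling logic: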

let timeOutEvent: any = 0;

const gtouchstart = (event) => {
    timeOutEvent = setTimeout(() => {
        longPress();
    }, 500);
    return false;
};

const gtouchstartPc = async () => {
    isVoice.value = !isVoice.value;
    if (isPcRecording.value) {
        record.startRecording();
    } else {
        stopRecording();
    }
    isPcRecording.value = !isPcRecording.value;
    return false;
};

const showDeleteButton = () => {
    clearTimeout(timeOutEvent);
    isVoice.value = false;
    stopRecording();
    return false;
};

const gtouchmove = (event) => {
    const currentX = event.touches[0].clientX;
    const currentY = event.touches[0].clientY;
    const FooterDomRect = FooterDom.value.getBoundingClientRect();
    if (
        currentX < FooterDomRect.left ||
        currentX > FooterDomRect.right ||
        currentY < FooterDomRect.top ||
        currentY > FooterDomRect.bottom
    ) {
        isCancelVoice.value = true;
    } else {
        isCancelVoice.value = false;
    }
    clearTimeout(timeOutEvent);
    timeOutEvent = 0;
};

const longPress = () => {
    timeOutEvent = 0;
    startRecording();
};

const startRecording = async () => {
    isCancelVoice.value = false;
    isVoice.value = true;
    record.startRecording();
};

const stopRecording = () => {
    const recorder = record.stopRecording();
    // record.stopRecording() returns undefined when nothing is being recorded.
    if (!recorder) return;
    if (isCancelVoice.value) {
        // Cancelled: stop the recorder and discard the audio.
        recorder.stop(
            () => {
                console.log("Recording cancelled");
            },
            (error) => {
                Toast.clear();
                console.error("Error while stopping the recording:", error);
            }
        );
        return;
    }
    Toast.loading({
        message: "Recognizing...",
        forbidClick: true,
        duration: 0,
    });
    try {
        recorder.stop(
            (blob) => {
                // Upload the recorded audio to the backend for recognition.
                const formDataObj = new FormData();
                formDataObj.append("voice", blob);
                service({
                    url: "/ali/ai/recognize",
                    method: "post",
                    data: formDataObj,
                })
                    .then((res) => {
                        if (res.data && !isPc.value) {
                            emits("pushContentFunc", res.data);
                        } else if (res.data) {
                            contentStr.value = res.data;
                            InputFocusFunc();
                        }
                    })
                    .catch((error) => {
                        console.error("Speech recognition request failed:", error);
                    })
                    .finally(() => {
                        Toast.clear();
                    });
            },
            (error) => {
                Toast.clear();
                console.error("Error while stopping the recording:", error);
            }
        );
    } catch (error) {
        Toast.clear();
        console.error("Exception while stopping the recording:", error);
    }
};

const stopSSEFunc = () => {
    emits("stopSSEFunc");
};
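The cancel decision in gtouchmove comes down to a hit test: once the finger leaves the footer's bounding box, the recording is marked as cancelled. Pulled out as a pure function it becomes easy to check in isolation. The names RectLike and isOutsideRect are illustrative assumptions; the comparison mirrors the condition used in the handler.

```typescript
// The subset of DOMRect that the hit test needs.
interface RectLike {
    left: number;
    right: number;
    top: number;
    bottom: number;
}

// True when the point (x, y) lies outside the rectangle.
// Points exactly on the border count as inside, matching the strict
// comparisons used in gtouchmove.
function isOutsideRect(x: number, y: number, rect: RectLike): boolean {
    return x < rect.left || x > rect.right || y < rect.top || y > rect.bottom;
}

// Hypothetical footer occupying the bottom 80px of a 375x667 screen.
const footerRect: RectLike = { left: 0, right: 375, top: 587, bottom: 667 };
console.log(isOutsideRect(100, 620, footerRect)); // finger still on the footer
console.log(isOutsideRect(100, 400, footerRect)); // finger swiped up and away
```

In the handler, the result could be assigned straight to isCancelVoice.value, which also removes the if/else around it.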

3. Polishing Auto-Scroll and Gesture Control for Streaming Output

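My boss didn't ask for it, but I liked how Tencent Yuanbao handles auto-scroll and drag gestures during streaming output, so I decided to add the same to this project. The initial idea was simple: use scrollTop and scrollHeight to drive auto-scroll, listen for touchmove to detect gestures, and pause auto-scroll as soon as the user swipes. In practice, though, touchmove sometimes failed to fire, which made the experience choppy. The fix was to bring in touchstart and touchend as supporting signals so that gesture detection stays reliable. The result is a "smart pause" scrolling scheme: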

const messagesRef = ref();
const messageRefs = ref([]);
const lastTouchY = ref(0);
const isScroStop = ref(false);
const isUp = ref(false);
let timer: any = null;

const initScrollToBottomFunc = () => {
    !isUp.value && !isScroStop.value && scrollToBottomFunc();
};

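// Crude throttle: `time` stays 0 and `storeTime` is reset to it once a second,
// so the watcher below triggers auto-scroll at most once per second.
let time = 0;
let storeTime = 0;
const getTimeFunc = () => {
    timer = setInterval(() => {
        storeTime = time;
    }, 1000);
};
getTimeFunc();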

watch(
    () => currenContentArr.value,
    () => {
        if (storeTime === time) {
            initScrollToBottomFunc();
        }
        storeTime++;
        if (dataType.value === 2) {
            const index = currenContentArr.value.length - 1;
            nextTick(() => {
                initChartFunc(currenContentArr.value[index].content, "chartRef" + index);
            });
        }
        if (currenContentArr.value.length == 0) {
            arrDom = [];
        }
    },
    {
        deep: true,
    }
);

const scrollToBottomFunc = (type = "") => {
    if (type === "click") {
        isScroStop.value = false;
    }
    nextTick(() => {
        const messagesContainer = messagesRef.value;
        if (messagesContainer) {
            messagesContainer.scrollTop = messagesContainer.scrollHeight;
        }
    });
};

const scrollTopFunc = async (id) => {
    // Not implemented yet; left as a TODO
};

const handleScrollFunc = () => {
    const element = messagesRef.value;
    if (element) {
        const scrollHeight = element.scrollHeight;
        const scrollTop = element.scrollTop;
        const clientHeight = element.clientHeight;
        if (scrollTop + clientHeight + 5 >= scrollHeight) {
            isUp.value = false;
            isScroStop.value = false;
        } else {
            if (isScroStop.value) {
                isUp.value = true;
            }
        }
    }
};

const inputContentFunc = () => {
    isScroStop.value = true;
};

defineExpose({ scrollTopFunc, inputContentFunc });

const handleScrollTopFunc = (event) => {
    if (event.deltaY < 0) {
        isScroStop.value = true;
    }
};

const handleTouchMoveFunc = (event) => {
    const messagesContainer = messagesRef.value;
    if (!messagesContainer) return;
    const currentTouchY = event.touches[0].clientY;
    if (currentTouchY > 0 && messagesContainer.scrollTop > 0) {
        isScroStop.value = true;
    }
    lastTouchY.value = event.touches[0].clientY;
};

const startX = ref(0);
const startY = ref(0);
const threshold = 10;

const handleTouchStart = (event: TouchEvent) => {
    isScroStop.value = true;
    const touch = event.touches[0];
    startX.value = touch.clientX;
    startY.value = touch.clientY;
};

const handleTouchEnd = (event: TouchEvent) => {
    const touch = event.changedTouches[0];
    const endX = touch.clientX;
    const endY = touch.clientY;
    const deltaX = endX - startX.value;
    const deltaY = endY - startY.value;
    const isSliding = Math.abs(deltaX) > threshold || Math.abs(deltaY) > threshold;
    if (isSliding) {
        if (Math.abs(deltaX) > Math.abs(deltaY)) {
            // Ignore horizontal swipes
        } else {
            isScroStop.value = true;
        }
    } else {
        isScroStop.value = false;
    }
};

const initFunc = () => {
    const element = messagesRef.value;
    if (element) {
        element.addEventListener("scroll", handleScrollFunc);
    }
};
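The two decisions at the heart of the "smart pause" scheme are whether the list is close enough to the bottom to resume auto-scroll, and whether a touch was a tap, a horizontal swipe, or a vertical swipe. As a sketch, both can be written as pure functions; the names isNearBottom and classifyGesture are assumptions, while the 5px tolerance and 10px threshold are the values used in handleScrollFunc and handleTouchEnd.

```typescript
// True when the viewport is within `tolerance` pixels of the bottom,
// mirroring the check in handleScrollFunc.
function isNearBottom(
    scrollTop: number,
    clientHeight: number,
    scrollHeight: number,
    tolerance = 5
): boolean {
    return scrollTop + clientHeight + tolerance >= scrollHeight;
}

type Gesture = "tap" | "horizontal" | "vertical";

// Classifies a touch by its total movement, mirroring handleTouchEnd:
// small movement is a tap, otherwise the dominant axis wins.
function classifyGesture(deltaX: number, deltaY: number, threshold = 10): Gesture {
    const absX = Math.abs(deltaX);
    const absY = Math.abs(deltaY);
    if (absX <= threshold && absY <= threshold) return "tap";
    return absX > absY ? "horizontal" : "vertical";
}

console.log(isNearBottom(596, 400, 1000)); // 4px from the bottom
console.log(classifyGesture(3, -80));      // finger moved mostly along the Y axis
```

In this reading, a "vertical" result pauses auto-scroll, a "tap" resumes it, and "horizontal" leaves the state alone, which matches what handleTouchEnd does.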

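At this point, an AI chat page that runs, takes voice input, and auto-scrolls has basically taken shape. Of course there are still plenty of details to polish, such as parsing the data returned over SSE and rendering ECharts charts, which I plan to cover in a follow-up post.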
