Hands-on: Deploying DeepSeek Locally and Serving It as an API with Node.js

I. Background

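DeepSeek recently went viral, and my company happened to need an internal knowledge-sharing session, so I spent about two hours deploying a large language model on my local machine. The core of the setup is the Ollama platform, which pulls and runs the DeepSeek-R1 model. A Node.js backend exposes it as an API, and a static HTML page serves as the frontend for the demo. What follows is the outline of that session.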

II. Why Deploy Locally

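Security and compliance: in sensitive industries such as finance and healthcare, data never leaves your own machines
Cost control: avoid the pay-per-use billing of cloud APIs (for comparison: GPT-3 costs $0.02 per 1,000 tokens)
Lower latency: in my tests the local API responded in under 500 ms (versus 1.2 s+ for a cloud API)
Customization: supports deep customization such as LoRA fine-tuning and injecting a custom knowledge base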
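To make the cost point concrete: with per-token pricing, the cloud bill grows linearly with volume. The sketch below uses the GPT-3 price quoted above; `cloudCostUSD` is a hypothetical helper, and the 1,000,000-tokens-per-day workload is an illustrative assumption, not a measurement.

```javascript
// Linear per-token pricing: cost = tokens / 1000 * price per 1K tokens.
// The default price is the GPT-3 figure quoted above ($0.02 per 1K tokens).
function cloudCostUSD(tokens, pricePer1kTokens = 0.02) {
  return (tokens / 1000) * pricePer1kTokens;
}

// Illustrative workload (assumption): 1,000,000 tokens per day for 30 days
const dailyTokens = 1_000_000;
const monthlyCost = cloudCostUSD(dailyTokens) * 30;
console.log(`Estimated monthly cloud cost: $${monthlyCost.toFixed(2)}`);
```

At that assumed volume the cloud bill comes to about $600 per month, while a local model's running cost is mostly hardware and electricity.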

III. Demo

[Demo GIF: DeepSeek running locally]

IV. Steps

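1. Download and install Ollama

Ollama: an open-source framework designed for deploying and running large language models on your local machine.
a. Via the desktop client: download it from https://ollama.com (Windows)
b. Via the command line: curl -fsSL https://ollama.com/install.sh | sh (Linux)
(Downloading Ollama from mainland China can be slow; you can point the installer at a mirror instead:)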

# Linux
export OLLAMA_MIRROR="https://ghproxy.cn/https://github.com/ollama/ollama/releases/latest/download"
curl -fsSL https://ollama.com/install.sh | sed "s|https://ollama.com/download|$OLLAMA_MIRROR|g" | sh

Start the Ollama service: launch the desktop client, or run ollama serve.
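To confirm the service is actually up before moving on, you can query Ollama's HTTP endpoint, which listens on http://localhost:11434 by default; its root path replies with the plain text "Ollama is running". A minimal sketch, assuming Node.js 18+ (for the built-in fetch and AbortSignal.timeout); `isOllamaRunning` is a hypothetical helper name:

```javascript
// Minimal health check for a local Ollama server.
// Assumes Node.js 18+ (global fetch, AbortSignal.timeout) and the default port 11434.
const OLLAMA_URL = 'http://localhost:11434';

async function isOllamaRunning(baseUrl = OLLAMA_URL, timeoutMs = 2000) {
  try {
    // The root path answers with the plain text "Ollama is running"
    const res = await fetch(baseUrl, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    // Connection refused or timed out: the service is not reachable
    return false;
  }
}
```

Usage: `isOllamaRunning().then((up) => console.log(up ? 'Ollama is up' : 'Start it with: ollama serve'));`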

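2. Download the large language model

I chose the deepseek-r1:1.5b model because it is only about 1.1 GB, small enough to test with conveniently.

Download only: ollama pull deepseek-r1:1.5b

Download (if not already present) and start an interactive chat: ollama run deepseek-r1:1.5b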
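To check from code whether the pull succeeded, Ollama's REST API lists installed models at GET /api/tags, returning JSON shaped like { "models": [{ "name": "deepseek-r1:1.5b", ... }] }. The sketch keeps the matching logic separate from the network call so it can be tested on its own. `listModels`, `hasModel` and `normalizeModelName` are hypothetical helper names; treating a tag-less name as ":latest" mirrors Ollama's naming convention.

```javascript
// Sketch: check whether a model has been pulled, via Ollama's GET /api/tags.
// Assumes Node.js 18+ (global fetch) and Ollama on its default port 11434.

// A model name without an explicit tag refers to the ":latest" tag
function normalizeModelName(name) {
  return name.includes(':') ? name : `${name}:latest`;
}

// Pure helper: does the /api/tags payload contain the given model?
function hasModel(tagsPayload, name) {
  const wanted = normalizeModelName(name);
  const models = (tagsPayload && tagsPayload.models) || [];
  return models.some((m) => normalizeModelName(m.name) === wanted);
}

// Network helper: fetch the list of locally installed models
async function listModels(baseUrl = 'http://localhost:11434') {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`GET /api/tags failed with HTTP ${res.status}`);
  return res.json();
}
```

Usage: `listModels().then((tags) => console.log(hasModel(tags, 'deepseek-r1:1.5b')));`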

3. Deploy the API service (Node.js)

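// Core code (streams text back to the client)
// Assumes an ES module project ("type": "module") with express and ollama installed
import express from 'express';
import ollama from 'ollama';

const router = express.Router();

router.get('/api/chat', async (req, res) => {
  const { input } = req.query;
  if (!input) {
    res.status(400).json({ error: 'Missing "input" query parameter' });
    return;
  }

  // Response headers for Server-Sent Events (streaming)
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  // Needed if the static page is served from a different origin (e.g. opened as a local file)
  res.setHeader('Access-Control-Allow-Origin', '*');
  res.flushHeaders();

  try {
    // Call ollama.chat with streaming enabled
    const response = await ollama.chat({
      model: 'deepseek-r1:1.5b', // replace with your model name
      messages: [{ role: 'user', content: input }],
      stream: true, // enable streaming
    });

    if (!response || typeof response[Symbol.asyncIterator] !== 'function') {
      throw new Error('Response is not an AsyncIterator');
    }

    // Forward each chunk to the browser as one SSE "data:" frame
    for await (const chunk of response) {
      res.write(`data: ${JSON.stringify(chunk)}\n\n`);
    }
  } catch (error) {
    res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
  } finally {
    res.end();
  }
});

export default router;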
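The ollama npm package hides the wire format. If you want to see what it does, or drop the dependency, Ollama's POST /api/chat with "stream": true returns newline-delimited JSON (one object per line). Network chunks can end mid-line, so a relay has to buffer partial lines before re-emitting each object as an SSE data: frame. A sketch of that buffering, assuming Node.js 18+ and the default port 11434; `createNdjsonParser`, `toSseFrame` and `relayChat` are hypothetical names:

```javascript
// Sketch: relay Ollama's streaming /api/chat output (NDJSON) as SSE frames.
// Assumes Node.js 18+ (global fetch, TextDecoder) and Ollama on port 11434.

// Buffers text across chunks and returns every complete JSON line parsed so far
function createNdjsonParser() {
  let buffer = '';
  return function push(text) {
    buffer += text;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element is an incomplete line (or '')
    return lines.filter((line) => line.trim() !== '').map((line) => JSON.parse(line));
  };
}

// Formats one object as a Server-Sent Events frame, matching the route's res.write
function toSseFrame(obj) {
  return `data: ${JSON.stringify(obj)}\n\n`;
}

// Streams a chat reply from Ollama and hands each SSE frame to `write`
// (e.g. res.write.bind(res) inside an HTTP handler)
async function relayChat(input, write, baseUrl = 'http://localhost:11434') {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'deepseek-r1:1.5b',
      messages: [{ role: 'user', content: input }],
      stream: true,
    }),
  });
  if (!res.ok) throw new Error(`POST /api/chat failed with HTTP ${res.status}`);

  const parse = createNdjsonParser();
  const decoder = new TextDecoder();
  // In Node 18+, a fetch response body can be iterated chunk by chunk (Uint8Array)
  for await (const bytes of res.body) {
    for (const chunk of parse(decoder.decode(bytes, { stream: true }))) {
      write(toSseFrame(chunk));
    }
  }
  // Flush any bytes and any final line that lacked a trailing newline
  for (const chunk of parse(decoder.decode() + '\n')) {
    write(toSseFrame(chunk));
  }
}
```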

4. Build the frontend page

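// Core code: consume the backend's text/event-stream response
// Defined as a method (not an arrow function) so `this` refers to the chat object
// that owns tempMsg and the page's own updateChat helper
chat(input) {
    // encodeURIComponent keeps spaces, "&", "#" and Chinese text intact in the query string
    const url = `http://localhost:9999/api/chat?input=${encodeURIComponent(input)}`;
    const eventSource = new EventSource(url);
    this.tempMsg = '';
    eventSource.onmessage = (event) => {
        try {
            const parsedData = JSON.parse(event.data);
            // The backend sends { error } when the model call fails
            if (parsedData.error) {
                console.error('Server error:', parsedData.error);
                eventSource.close();
                return;
            }
            if (parsedData.message && parsedData.message.content) {
                this.tempMsg += parsedData.message.content;
                this.updateChat('ai', this.tempMsg);
            }
            if (parsedData.done) {
                eventSource.close();
            }
        } catch (error) {
            console.error('Error parsing JSON:', error);
        }
    };

    // The server ending the stream also fires onerror; closing stops EventSource from reconnecting
    eventSource.onerror = () => {
        eventSource.close();
    };
}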
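EventSource does the SSE framing for you. If you read the same endpoint with fetch instead (for example, to cancel a request with an AbortController), you have to split the stream yourself: events are separated by a blank line, and each payload line starts with "data:". A minimal sketch that handles only what this backend sends (one JSON data: line per event, "\n" line endings); `createSseParser` and `chatWithFetch` are hypothetical names:

```javascript
// Sketch: split a text/event-stream body into the JSON payloads of its "data:" lines.
// Handles only what the backend above sends: one JSON "data:" line per event, "\n" endings.
function createSseParser() {
  let buffer = '';
  return function push(text) {
    buffer += text;
    const events = buffer.split('\n\n');
    buffer = events.pop(); // keep the incomplete trailing event for the next chunk
    const payloads = [];
    for (const event of events) {
      for (const line of event.split('\n')) {
        if (line.startsWith('data:')) {
          payloads.push(JSON.parse(line.slice(5).trim()));
        }
      }
    }
    return payloads;
  };
}

// Usage sketch: accumulate the reply and report the running text via onText
async function chatWithFetch(input, onText) {
  const res = await fetch(`http://localhost:9999/api/chat?input=${encodeURIComponent(input)}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const push = createSseParser();
  let text = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const data of push(decoder.decode(value, { stream: true }))) {
      if (data.error) throw new Error(data.error);
      if (data.message && data.message.content) {
        text += data.message.content;
        onText(text);
      }
    }
  }
  return text;
}
```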

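V. Caveats

Slow model downloads (retry after a dropped connection, use a mirror, or deploy with Docker)
Keep the Ollama and Node.js processes running in the background (e.g. with pm2)
Port already in use: the desktop client and a manually started ollama serve both try to bind the same port (11434 by default)
The streaming endpoint must be a GET route, because the browser's EventSource API can only send GET requests
The machine must have as much memory as the model requires, otherwise it will not run: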
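For the pm2 point, one way to keep both processes alive is a pm2 ecosystem file. This is a sketch under assumptions: the API entry file is ./server.js and it reads its port from the PORT environment variable (9999, matching the frontend URL); `interpreter: 'none'` tells pm2 to run the ollama binary directly instead of through Node.

```javascript
// ecosystem.config.js: start both services with `pm2 start ecosystem.config.js`
// Assumptions: the API entry point is ./server.js and it reads PORT from the environment.
module.exports = {
  apps: [
    {
      name: 'ollama',
      script: 'ollama',
      args: 'serve',
      interpreter: 'none', // run the binary directly, not via node
    },
    {
      name: 'chat-api',
      script: './server.js',
      env: { PORT: 9999 },
    },
  ],
};
```

Afterwards, `pm2 save` and `pm2 startup` let the processes come back after a reboot. Quit the Ollama desktop client first, or the pm2-managed ollama serve will hit the port conflict described above. If package.json sets "type": "module", you may need to name the file ecosystem.config.cjs.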

[Screenshot: insufficient-memory error]

Error: 500 Internal Server Error: model requires more system memory (6.6 GiB) than is available (3.1 GiB)

(Given my server's limited hardware, I haven't deployed a public demo. It is simple to get running, so give it a try on your own machine.)