Run Google's newly open-sourced Gemma models locally in just 5 minutes

2024-02-23 13:00 · WasmEdge · https://my.oschina.net/u/4532842/blog/11044495

Yesterday, Google announced the open-source release of its Gemma models, including Gemma-2b-it and Gemma-7b-it, which now join the family of open-source LLMs.

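The Google Gemma model family is designed for a range of text generation tasks, such as question answering, summarization, and reasoning. These lightweight, state-of-the-art models are built with the same technology as the Gemini models and are decoder-only, text-to-text models. Gemma models are English-language LLMs released with open weights, in both pretrained and instruction-tuned variants, which makes them suitable for deployment in resource-constrained environments. According to Google's announcement, Gemma-7b outperforms Llama-2 7B, a model of comparable size, and even surpasses Llama-2 13B.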

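In this article, using Gemma-7b-it as an example, we cover the topics listed below. To run Gemma-2b-it instead, follow the same steps and change the model name in the commands.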

How to run Gemma-7b-it on your own device
How to create an OpenAI-compatible API service for Gemma-7b-it

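We will use LlamaEdge (a Rust + Wasm stack) to develop and deploy applications for this model. There is no need to install complex Python packages or C++ toolchains! Learn why we chose this tech stack.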

Run Gemma-7b-it on your own device

Step 1: Install WasmEdge, together with its wasi_nn-ggml plugin, using the following command.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

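Step 2: Download the Gemma-7b-it model as a GGUF file. The file is 5.88 GB, so the download may take a while. If your device has limited memory, you can use Gemma-2b-it instead: its Q5_K_M file is only 1.36 GB, and it still performs quite well.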

curl -LO https://huggingface.co/second-state/Gemma-7b-it-GGUF/resolve/main/gemma-7b-it-Q5_K_M.gguf
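Because the 7b and 2b downloads differ only in the size part of the name, a small helper can build either URL. This is a hypothetical sketch: `gguf_url` is not part of LlamaEdge, and it assumes the Gemma-2b-it repository on Hugging Face follows the same `second-state/Gemma-<size>-it-GGUF` naming as the 7b URL above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Hugging Face download URL for a Gemma GGUF file.
# Assumes the 2b repo uses the same naming pattern as the 7b URL in this article.
gguf_url() {
  local size="$1"             # model size, e.g. "7b" or "2b"
  local quant="${2:-Q5_K_M}"  # quantization suffix; defaults to the one used here
  printf 'https://huggingface.co/second-state/Gemma-%s-it-GGUF/resolve/main/gemma-%s-it-%s.gguf\n' \
    "$size" "$size" "$quant"
}

# Usage: download the smaller model instead of the 7b one
# curl -LO "$(gguf_url 2b)"
```

For `7b`, the helper reproduces exactly the URL in the command above.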

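Step 3: Download a cross-platform, portable Wasm file for the chat app. The app lets you chat with the model from the command line. The Rust source code for the app is available here.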

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm
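Before starting the model, a quick sanity check can catch a truncated download or an error page saved under the expected filename. The hypothetical `has_magic` helper below compares a file's first four bytes with a known signature: GGUF files begin with the ASCII bytes `GGUF`, and WebAssembly binaries begin with `\0asm`. It assumes `head`, `od`, and `tr` are available, as they are on typical Linux and macOS systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the file's first four bytes match the given hex string.
has_magic() {
  local file="$1" expected="$2"
  [ -f "$file" ] || return 1
  local actual
  # Dump the first 4 bytes as hex and drop the spaces/newline that od adds
  actual="$(head -c 4 "$file" | od -An -tx1 | tr -d ' \n')"
  [ "$actual" = "$expected" ]
}

# 47475546 = "GGUF", 0061736d = "\0asm"
has_magic gemma-7b-it-Q5_K_M.gguf 47475546 || echo "GGUF file is missing or not a GGUF model"
has_magic llama-chat.wasm 0061736d || echo "llama-chat.wasm is missing or not a Wasm binary"
```

Checking only the header is cheap even for a multi-gigabyte model; it does not prove the file is complete, only that it is the right kind of file.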

That's it. You can now chat with the model in your terminal by running the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-7b-it-Q5_K_M.gguf llama-chat.wasm -p gemma-instruct -c 4096
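The command packs several settings into a few flags. As we understand it, `--dir .:.` maps the current directory into the Wasm sandbox, `-p gemma-instruct` selects the prompt template, and `-c 4096` sets the context size in tokens. The `--nn-preload` value has four colon-separated fields; the snippet below simply splits it apart to label each one. The field descriptions in the comments are our reading of WasmEdge's WASI-NN conventions, not an authoritative reference.

```shell
#!/usr/bin/env bash
# Split the --nn-preload value used above into its four colon-separated fields:
# <alias>:<backend>:<target device>:<model file>
preload_spec="default:GGML:AUTO:gemma-7b-it-Q5_K_M.gguf"

# IFS applies only to this read; the last variable receives any remaining text
IFS=':' read -r model_alias backend target model <<< "$preload_spec"

echo "alias=$model_alias"  # name the Wasm app uses to look up the preloaded model
echo "backend=$backend"    # WASI-NN backend, provided by the wasi_nn-ggml plugin from step 1
echo "target=$target"      # AUTO leaves device selection (e.g. GPU vs CPU) to the plugin
echo "model=$model"        # GGUF file, resolved inside the directory mapped by --dir .:.
```

To run Gemma-2b-it, only the last field (and the downloaded file) needs to change.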

This portable Wasm app automatically uses the hardware accelerators available on your device, such as a GPU.

[You]:
Create JSON for the following: There are 3 people, two males, One is named Mark. Another is named Joe. And a third person, who is a woman, is named Sam. The women is age 30 and the two men are both 19.

[Bot]:

{
  "people": [
    {
      "name": "Mark",
      "age": 19
    },
    {
      "name": "Joe",
      "age": 19
    },
    {
      "name": "Sam",
      "age": 30
    }
  ]
}
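The `-p gemma-instruct` flag tells LlamaEdge how to wrap each message before it reaches the model. Google documents Gemma's instruction-tuned chat format as turns delimited by `<start_of_turn>` and `<end_of_turn>`; the sketch below reproduces that format for a single user message, roughly as the model would see a question like the one in the exchange above. It is an illustration under that assumption, not LlamaEdge's actual template code, and `gemma_prompt` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: wrap one user message in Gemma's chat-turn markers.
# Assumption: LlamaEdge's gemma-instruct template renders something close to this;
# the <bos> token is left to the tokenizer and is not included here.
gemma_prompt() {
  local user_msg="$1"
  # Open the user turn, insert the message, close the turn, then open the model turn
  printf '<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n' "$user_msg"
}

gemma_prompt "Hello"
```

Ending the prompt with an open model turn is what cues the model to generate the reply.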

Create an OpenAI-compatible API service for Gemma-7b-it

An OpenAI-compatible web API lets the model work with a wide range of LLM tools and agent frameworks, such as flows.network, LangChain, and LlamaIndex.

Download the API server app. It is also a cross-platform, portable Wasm app that can run on a variety of devices, including different CPUs and GPUs.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Next, download the chatbot web UI so you can interact with the model through a chat interface.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Now start the API server for the model with the following command, then open http://localhost:8080 in your browser to start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-7b-it-Q5_K_M.gguf llama-api-server.wasm -p gemma-instruct -c 4096
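Loading a multi-gigabyte model can take a while, so requests sent immediately after starting the API server may fail. A minimal sketch of a readiness check is to poll the server until it answers. The `wait_for_server` function is hypothetical, and it assumes the server exposes the OpenAI-style `GET /v1/models` endpoint; if it does not, pass `http://localhost:8080` as the URL instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the API server until it answers or we give up.
# Assumes the server exposes the OpenAI-style GET /v1/models endpoint.
wait_for_server() {
  local url="${1:-http://localhost:8080/v1/models}"  # endpoint to probe
  local attempts="${2:-60}"                          # number of tries, roughly one per second
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -s: quiet, -f: treat HTTP 4xx/5xx as failure, -o: discard the body,
    # --max-time: don't hang forever on a single try
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    # Wait before the next try, but not after the last one
    if ((i < attempts)); then
      sleep 1
    fi
  done
  return 1
}

# Usage, from a second terminal while the server is loading the model:
# wait_for_server && echo "API server is ready"
```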

From another terminal, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."}, {"role":"user", "content": "Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world."}], "model":"Gemma-7b-it"}'
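A long one-line JSON body is hard to read and edit. One option is to keep the same request in a file and let curl send it with `-d @file`, which reads the request body from that file. The filename `request.json` is an arbitrary choice for this sketch. Extracting the reply with `jq` assumes both that `jq` is installed and that the response follows the OpenAI chat-completions shape.

```shell
#!/usr/bin/env bash
# Write the same chat request as above to a file so the JSON stays readable and reusable.
# The quoted 'EOF' stops the shell from expanding anything inside the body.
cat > request.json <<'EOF'
{
  "model": "Gemma-7b-it",
  "messages": [
    {"role": "system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."},
    {"role": "user", "content": "Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world."}
  ]
}
EOF

# Usage (requires the API server to be running):
# curl -s -X POST http://localhost:8080/v1/chat/completions \
#   -H 'accept:application/json' \
#   -H 'Content-Type: application/json' \
#   -d @request.json | jq -r '.choices[0].message.content'
```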

That's all there is to it. WasmEdge is the easiest, fastest, and most secure way to run LLM applications. Give it a try!