Simple Deployment of the MiniCPM Model

Last modified 2024-02-20 14:55:55

Tags: #natural language processing #deep learning

First published 2024-02-20 14:54:06

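This article walks through a simple deployment of the MiniCPM model: environment setup, model download, and usage. The model can be called directly or through a UI built with Gradio. MiniCPM is a lightweight large language model open-sourced by ModelBest (面壁智能) and the Natural Language Processing Lab of Tsinghua University, offering strong performance and the ability to run on mobile devices.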

Contents

Preface
Summary
I. MiniCPM
II. Deployment Process
- 1. Environment setup
- 2. Model download
3. Usage
- 3.1 Direct invocation
- 3.2 Calling the model through a Gradio UI

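Preface

MiniCPM is a series of on-device large language models jointly open-sourced by ModelBest and the Natural Language Processing Lab of Tsinghua University. The main language model, MiniCPM-2B, has only 2.4 billion (2.4B) non-embedding parameters. Running it locally makes it easier to explore the model further, so this post briefly describes how to deploy it on a local machine.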

Summary

Errors encountered during local deployment, and their solutions:

Installation error with pip install flash_attn

  model_class = get_class_from_dynamic_module(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\888\anaconda3\envs\pytorch_2.1.1_llm\Lib\site-packages\transformers\dynamic_module_utils.py", line 488, in get_class_from_dynamic_module
  final_module = get_cached_module_file(
                 ^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\888\anaconda3\envs\pytorch_2.1.1_llm\Lib\site-packages\transformers\dynamic_module_utils.py", line 315, in get_cached_module_file
  modules_needed = check_imports(resolved_module_file)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\888\anaconda3\envs\pytorch_2.1.1_llm\Lib\site-packages\transformers\dynamic_module_utils.py", line 180, in check_imports
  raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: configuration_minicpm. Run pip install configuration_minicpm

Solution: add the model directory to sys.path before loading the model.

import sys

# Path to the directory that contains the model
model_dir = './miniCPM-bf16'

# Add the model directory to sys.path
if model_dir not in sys.path:
    sys.path.append(model_dir)
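The workaround above can also be wrapped in a small reusable helper. The following is only a sketch: the function name `add_model_dir_to_path` is hypothetical, and converting the directory to an absolute path is an optional refinement (not part of the original fix) so that the entry stays valid if the working directory changes later.

```python
import os
import sys


def add_model_dir_to_path(model_dir: str) -> str:
    """Append the absolute form of model_dir to sys.path once and return it."""
    # Use an absolute path so the entry does not depend on the
    # current working directory at import time.
    resolved = os.path.abspath(model_dir)
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# './miniCPM-bf16' is the local model directory used in this article.
model_path = add_model_dir_to_path('./miniCPM-bf16')
print(model_path in sys.path)
```

Calling the helper more than once is harmless, because the directory is appended only if it is not already on sys.path.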

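I. MiniCPM

MiniCPM is a series of on-device large language models jointly open-sourced by ModelBest and the Natural Language Processing Lab of Tsinghua University. The main language model, MiniCPM-2B, has only 2.4 billion (2.4B) non-embedding parameters and 2.7B parameters in total.

- After SFT, MiniCPM performs on par with Mistral-7B on public comprehensive benchmarks (with better Chinese, math, and coding ability), and its overall performance surpasses Llama2-13B, MPT-30B, Falcon-40B, and similar models.
- After DPO, MiniCPM-2B also outperforms many representative open-source models, including Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha, on MTBench, currently the benchmark closest to real user experience.
- MiniCPM-V, an on-device multimodal model built on MiniCPM-2B, achieves the best overall performance among models of the same scale, surpasses existing multimodal models built on Phi-2, and matches or exceeds the 9.6B Qwen-VL-Chat on some benchmarks.
- After Int4 quantization, MiniCPM can be deployed for inference on mobile phones, with streaming output slightly faster than human speech. MiniCPM-V has likewise been run end to end as a multimodal model on phones.
- A single 1080/2080 GPU is enough for parameter-efficient fine-tuning, a single 3090/4090 is enough for full-parameter fine-tuning, and one machine can train MiniCPM continuously, so the cost of further development is low.

The developers state that the parameters of MiniCPM-2B are fully open-sourced for academic research and limited commercial use, and that they plan to release all checkpoints from the training process and most of the non-proprietary data for research into model mechanisms. Specifically, the following models have been released so far (see the model download section for links):

- MiniCPM-2B-SFT/DPO, the instruction-tuned and human-preference-aligned versions of MiniCPM-2B.
- MiniCPM-V, a multimodal model based on MiniCPM-2B that outperforms Phi-2-based multimodal models of the same parameter scale.
- MiniCPM-2B-SFT/DPO-Int4, the Int4-quantized versions of MiniCPM-2B-SFT/DPO.
- Mobile applications for MiniCPM built on MLC-LLM and LLMFarm; both the text and the multimodal models can run inference on a phone.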
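To get a feel for why Int4 quantization matters for on-device deployment, the storage needed for the weights can be estimated from the 2.7B total parameter count quoted above. This is a back-of-the-envelope sketch only: it counts weights alone, ignores activations and the KV cache, and assumes exactly 4 bits per weight for Int4, whereas real quantization formats also store scales and other metadata.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 2.7e9  # total parameter count quoted in this section

for name, bits in [("bf16", 16), ("int4", 4)]:
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, bits):.2f} GB")
```

Under these assumptions the bf16 weights take about 5.4 GB and the Int4 weights about 1.35 GB.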

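II. Deployment Process

1. Environment setup

My setup: PyTorch 2.2.1, Python 3.11.5, CUDA 11.8.

If running pip install flash_attn directly fails, consider installing it from a locally downloaded package instead.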
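Before loading the model, it can help to confirm which of the required packages are actually importable in the active environment. The sketch below uses only the standard library; the helper name `find_missing` and the package list are assumptions based on the imports used later in this article.

```python
import importlib.util
import platform


def find_missing(packages):
    """Return the names in `packages` that cannot be imported here."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Packages this article relies on; flash_attn is the one reported to fail with pip.
required = ["torch", "transformers", "gradio", "flash_attn"]

print("Python:", platform.python_version())
print("Missing packages:", find_missing(required))
```

An empty list means every package was found; any name that is printed still needs to be installed.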

2. Model download

Model download link
Download the variant that fits your needs.

For example, the model can be downloaded from ModelScope.
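A minimal sketch of a ModelScope download is shown below. It assumes the `modelscope` package is installed (pip install modelscope) and uses its `snapshot_download` function. The repository ID `OpenBMB/MiniCPM-2B-sft-bf16`, the `./models` cache directory, and the wrapper name `download_minicpm` are assumptions for illustration; check the model page for the exact ID of the variant you want.

```python
from pathlib import Path

# Assumed ModelScope repository ID; verify it on the model page.
MODEL_ID = "OpenBMB/MiniCPM-2B-sft-bf16"


def download_minicpm(model_id: str = MODEL_ID, cache_dir: str = "./models") -> Path:
    """Download a model snapshot from ModelScope and return its local directory."""
    # Imported lazily so this file loads even when modelscope is not installed.
    from modelscope import snapshot_download

    local_dir = snapshot_download(model_id, cache_dir=cache_dir)
    return Path(local_dir)
```

Calling `download_minicpm()` fetches the files over the network and returns the directory to pass as the model path in the usage examples.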

3. Usage

3.1 Direct invocation

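import sys

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fix the random seed so that sampling is reproducible
torch.manual_seed(0)

# Path to the local model directory
model_dir = './miniCPM-bf16'

# Add the model directory to sys.path so the model's own modules
# (such as configuration_minicpm) can be found
if model_dir not in sys.path:
    sys.path.append(model_dir)

# Load the tokenizer and the model from the local directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# Ask a single question; temperature and top_p control the sampling
response, history = model.chat(tokenizer, "Which mountain is the highest in Shandong Province? Is it taller or shorter than Huangshan, and by how much?", temperature=0.7, top_p=0.8)
print(response)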
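The call above passes temperature=0.7 and top_p=0.8 without explaining them. The following pure-Python sketch illustrates what these two sampling parameters conventionally mean (temperature scaling followed by nucleus, or top-p, filtering). It is a conceptual illustration only, not the code that model.chat runs, and the function name `top_p_filter` is hypothetical.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.8):
    """Return {token_index: renormalized probability} for the kept tokens."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


print(top_p_filter([2.0, 1.0, 0.1, -1.0]))
```

The next token is then sampled only from the kept tokens, so a lower top_p or a lower temperature makes the output more deterministic.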

3.2 Calling the model through a Gradio UI

import argparse
import sys
import warnings
from threading import Thread
from typing import Dict, List, Tuple

import gradio as gr
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TextIteratorStreamer,
)

# Silence the TypedStorage deprecation warning
warnings.filterwarnings('ignore', category=UserWarning, message='TypedStorage is deprecated')

# Path to the local model directory
model_dir = './miniCPM-bf16'

# Add the model directory to sys.path
if model_dir not in sys.path:
    sys.path.append(model_dir)

# Command-line options: model path, dtype, and server address
parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=str, default="./miniCPM-bf16")
parser.add_argument("--torch_dtype", type=str, default="bfloat16", choices=["float32", "bfloat16"])
parser.add_argument("--server_name", type=str, default="127.0.0.1")
parser.add_argument("--server_port", type=int, default=7860)
args = parser.parse_args()

# Map the --torch_dtype string to the corresponding torch dtype
torch_dtype = args.torch_dtype
if torch_dtype == "" or torch_dtype == "bfloat16":
    torch_dtype = torch.bfloat16
elif torch_dtype == "float32":
    torch_dtype = torch.float32
else:
    raise ValueError(f"Invalid torch dtype: {torch_dtype}")