Baichuan models, round two: the ModelScope best-practices tutorial is here!


ModelScope Community · Published 2023-09-07 13:52:11

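Overview

Today, Baichuan Intelligence officially released and open-sourced Baichuan 2! The release includes Baichuan 2-7B, Baichuan 2-13B, and Baichuan 2-13B-Chat, along with 4-bit quantized versions, all free for commercial use.

Baichuan 2 is the new generation of open-source large language models from Baichuan Intelligence. It is trained on 2.6 trillion tokens of high-quality corpus and achieves the best results among models of the same size on authoritative Chinese and English benchmarks.

Baichuan 2 is now fully open-sourced and available on the ModelScope community, so everyone can try it out. Below are our freshest hands-on impressions and inference best practices on ModelScope.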

Environment setup and installation

Python 3.8 or later
PyTorch 1.12 or later (2.0 or later recommended)
CUDA 11.4 or later recommended
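
The requirements above can be checked before anything is downloaded. The following is a minimal sketch, not part of the original tutorial: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the CUDA check reads `torch.version.cuda`, which reports the CUDA version PyTorch was built against rather than the driver installed on the machine.

```python
import re
import sys

# Minimums quoted in the requirements above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 12)
MIN_CUDA = (11, 4)


def parse_version(text):
    """Turn a string such as '2.0.1+cu117' into (2, 0); only major.minor is compared."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text, minimum):
    """True if the version string is at least the (major, minor) minimum."""
    return parse_version(text) >= minimum


def check_environment():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python is older than 3.8")
    try:
        import torch
    except ImportError:
        return problems + ["PyTorch is not installed"]
    if not meets_minimum(torch.__version__, MIN_TORCH):
        problems.append("PyTorch is older than 1.12")
    # torch.version.cuda is None on CPU-only builds.
    if torch.version.cuda is None:
        problems.append("this PyTorch build has no CUDA support")
    elif not meets_minimum(torch.version.cuda, MIN_CUDA):
        problems.append("CUDA is older than the suggested 11.4")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the file prints one warning per unmet requirement and nothing when the environment looks fine.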

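Usage steps

This article demonstrates the Baichuan2-7B-Chat and Baichuan2-7B-Base models, running in the ModelScope Notebook environment (PAI-DSW is used as the example here) with 24 GB of GPU memory:

Connecting to the server and preparing the environment

1. Open the ModelScope home page, modelscope.cn, and go to "My Notebook".

2. Select the GPU environment and enter the PAI-DSW online development environment.

3. Create a new Notebook.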

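Trying it out in a ModelScope Studio

Studio description:

According to the officially published benchmark data, Baichuan2-13B improves significantly on the previous-generation 13B model in math (↑49%), coding (↑46%), safety (↑37%), logical reasoning (↑25%), and semantic understanding (↑15%).

The ModelScope community has launched a demo of Baichuan2-13B-Chat (the aligned version within the Baichuan2-13B model series). Everyone is welcome to try it and see how it performs in practice!

Studio link: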

https://modelscope.cn/studios/baichuan-inc/Baichuan-13B-Chatdemo/summary

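Here are some one-shot test cases, with questions picked at random for each dimension, starting as usual with the model's self-introduction:

[Screenshots: self-cognition, math, coding, safety, logical reasoning, semantic understanding]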

Model links and download

The Baichuan2 series models are now open-sourced on the ModelScope community, including:

Baichuan2-7B pretrained (base) model:

https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary

Baichuan2-7B chat model:

https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary

Baichuan2-7B chat model, int4 quantized version:

https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-int4/summary

Baichuan2-13B pretrained (base) model:

https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary

Baichuan2-13B chat model:

https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary

Baichuan2-13B chat model, 4-bit quantized version:

https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary

Baichuan2-7B intermediate training checkpoints:

https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Intermediate-Checkpoints/summary

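The community supports downloading a model repo directly:

from modelscope.hub.snapshot_download import snapshot_download
# Download the repo at the given revision and return the local directory.
model_dir = snapshot_download('baichuan-inc/Baichuan2-7B-Chat', revision='v1.0.0')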
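
The model ID passed to `snapshot_download` is the `owner/name` part of the model page URLs listed above. As a small illustration, the sketch below extracts it from a URL; `model_id_from_url` is a hypothetical helper written for this article, not a ModelScope API, and it assumes the URL layout `https://modelscope.cn/models/<owner>/<name>/...` seen in the links.

```python
from urllib.parse import urlparse


def model_id_from_url(url):
    """Extract 'owner/name' from a ModelScope model page URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path layout (assumption): models/<owner>/<name>[/summary]
    if len(parts) < 3 or parts[0] != "models":
        raise ValueError(f"not a ModelScope model URL: {url!r}")
    return f"{parts[1]}/{parts[2]}"


url = "https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary"
print(model_id_from_url(url))

# The result can then be handed to snapshot_download, for example:
# model_dir = snapshot_download(model_id_from_url(url))
```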

Model inference

Inference code:

import torch
from modelscope import (
    AutoModelForCausalLM, AutoTokenizer, GenerationConfig, snapshot_download
)

model_id = 'baichuan-inc/Baichuan2-7B-Chat'
revision = 'v1.0.0'

# Download the model files (or reuse the local cache) and get their directory.
model_dir = snapshot_download(model_id, revision=revision)

# trust_remote_code=True is required because the repo ships its own model and tokenizer code.
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained(model_dir)

# Single-turn chat; the prompt asks "Which is the highest mountain in the world?"
messages = []
messages.append({"role": "user", "content": "世界第一高峰是哪个"})
response = model.chat(tokenizer, messages)
print(response)
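
The inference code performs a single turn. A multi-turn conversation can be kept by appending each reply to the same `messages` list before asking the next question. The sketch below is an illustration under two assumptions that the article does not state: that the reply role is named `"assistant"`, and that `model.chat` returns the reply as a plain string, as the single-turn example suggests. `add_turn` and `ask` are hypothetical helper names.

```python
def add_turn(messages, role, content):
    """Append one turn to the running conversation history."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    messages.append({"role": role, "content": content})
    return messages


def ask(model, tokenizer, messages, question):
    """Send a question with the full history and record the reply."""
    add_turn(messages, "user", question)
    answer = model.chat(tokenizer, messages)
    add_turn(messages, "assistant", answer)
    return answer


# Usage with the model and tokenizer loaded in the inference code:
# history = []
# print(ask(model, tokenizer, history, "世界第一高峰是哪个"))
# print(ask(model, tokenizer, history, "它有多高"))  # follow-up: "How tall is it?"
```

Because the whole history is sent on every call, long conversations consume more of the context window with each turn.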

Resource usage:

[Screenshot: resource usage during inference]
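
As a rough cross-check on why a 24 GB card is enough for bf16 inference, the memory taken by the weights alone can be estimated as parameter count times bytes per parameter. This is back-of-the-envelope arithmetic, not a measurement: the figure of 7 billion parameters is a round assumption taken from the "7B" in the model name, and activations and the KV cache are not included.

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


ASSUMED_PARAMS = 7e9  # assumption: "7B" read as 7 billion parameters

for dtype in ("fp32", "bf16", "int4"):
    print(f"{dtype}: {weight_memory_gib(ASSUMED_PARAMS, dtype):.2f} GiB")
```

Under this assumption the bf16 weights come to about 13 GiB, which leaves headroom on a 24 GB device for activations and generation.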

Fine-tuning and post-fine-tuning inference

The fine-tuning code is open-sourced at:

https://github.com/modelscope/swift/blob/main/examples/pytorch/llm

Clone the swift repository and install swift:

git clone https://github.com/modelscope/swift.git
cd swift
pip install .
cd examples/pytorch/llm

Fine-tuning script (lora_ddp)

# 4 * 22GB VRAM
nproc_per_node=4
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    src/llm_sft.py \
    --model_type baichuan2-7b-chat \
    --sft_type lora \
    --template_type baichuan \
    --dtype bf16 \
    --output_dir runs \
    --ddp_backend nccl \
    --dataset alpaca-en,alpaca-zh \
    --dataset_sample 20000 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules W_pack o_proj \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0. \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --push_to_hub false \
    --hub_model_id baichuan2-7b-chat-lora \
    --hub_private_repo true \
    --hub_token 'your-sdk-token'
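
Two quantities implied by the script are worth working out: the number of samples per optimizer step, and the size of the LoRA adapter. The sketch below is illustrative arithmetic only. It assumes that `--batch_size` is per GPU, that the standard LoRA scaling factor `lora_alpha / lora_rank` applies, and that Baichuan2-7B has 32 layers with hidden size 4096 and a fused `W_pack` projection of shape 4096 → 3 × 4096; none of these architecture figures are stated in the article.

```python
def effective_batch_size(batch_size, nproc_per_node, grad_accum_steps):
    """Samples consumed per optimizer step across all GPUs."""
    return batch_size * nproc_per_node * grad_accum_steps


def lora_scaling(lora_alpha, lora_rank):
    """Factor applied to the low-rank update in the standard LoRA formulation."""
    return lora_alpha / lora_rank


def lora_params(in_features, out_features, rank):
    """Trainable parameters LoRA adds to one linear layer: A (r x in) plus B (out x r)."""
    return rank * (in_features + out_features)


# Values taken from the script.
nproc_per_node = 4
grad_accum = 16 // nproc_per_node  # mirrors $(expr 16 / $nproc_per_node)
print(effective_batch_size(1, nproc_per_node, grad_accum))  # 16
print(lora_scaling(32, 8))  # 4.0

# Assumed architecture figures (not from the article).
HIDDEN, LAYERS, RANK = 4096, 32, 8
per_layer = lora_params(HIDDEN, 3 * HIDDEN, RANK) + lora_params(HIDDEN, HIDDEN, RANK)
print(LAYERS * per_layer)  # 6291456
```

Dividing 16 by the process count keeps the effective batch size at 16 whenever the number of GPUs divides 16 evenly, which is why the script derives `--gradient_accumulation_steps` from `nproc_per_node` instead of hard-coding it.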

Inference script for the fine-tuned model

# 16GB VRAM
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
    --model_type baichuan2-7b-chat \
    --sft_type lora \
    --template_type baichuan \
    --dtype bf16 \
    --ckpt_dir "runs/baichuan2-7b-chat/vx_xxx/checkpoint-xxx" \
    --eval_human true \
    --max_new_tokens 1024 \
    --temperature 0.9 \
    --top_k 50 \
    --top_p 0.9 \
    --do_sample true
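
The `--temperature`, `--top_k`, and `--top_p` flags shape the distribution that the next token is sampled from. The pure-Python sketch below shows the usual order of operations (temperature, then top-k, then top-p) on a toy list of logits. It is an illustration of the general technique, not the code that swift or the model actually runs, and `filter_logits` is a hypothetical name.

```python
import math


def filter_logits(logits, temperature=0.9, top_k=50, top_p=0.9):
    """Return the sampling distribution as {token_index: probability}."""
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]

    # Top-k: keep only the k highest-scoring tokens, then renormalise.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / total for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]
print(filter_logits(toy_logits, temperature=1.0, top_k=2, top_p=0.9))
```

With `--do_sample true`, the next token is drawn at random from the surviving tokens according to these probabilities; lowering `top_p` or `top_k` narrows the candidate set and makes output more deterministic.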

Fine-tuning visualization results

[Figures: training loss; evaluation loss]

Resource usage: 4 × 22 GB VRAM

Click the link to go straight to the Baichuan2-13B-Chat Studio:

https://modelscope.cn/studios/baichuan-inc/Baichuan-13B-Chatdemo/summary

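Welcome to the ModelScope Chinese open-source community

ModelScope aims to build the next generation of open-source Model-as-a-Service sharing platform, offering AI developers of all kinds a flexible, easy-to-use, low-cost, one-stop model service that makes applying models simpler.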
