9月20日，上海人工智能实验室等机构发布书生·浦语大模型（InternLM）200亿参数版本InternLM-20B，并在阿里云魔搭社区（ModelScope）开源首发、免费商用。书生·浦语大模型体系与魔搭社区建立重磅生态合作，共同推动中国大模型生态建设！

上海人工智能实验室是我国人工智能领域新型科研机构，开展战略性、原创性、前瞻性的科学研究与技术攻关，目标建成国际一流的人工智能实验室，成为享誉全球的人工智能原创理论和技术的策源地。

围绕此次InternLM-20B开源首发，魔搭官网开设了书生·浦语“模型品牌馆”专页，聚合书生系列所有模型及体验接口，便于开发者一站式查询、下载、使用书生模型，并第一时间提供最新鲜的模型部署、推理和微调最佳实践教程。欢迎开发者小伙伴们体验！

模型链接和下载

书生InternLM系列模型现已在魔搭ModelScope社区开源，包括：

书生·浦语-20B：https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b

书生·浦语-对话-20B：https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b

书生·浦语-对话-7B：https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b

书生·浦语-7B：https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b

书生·浦语-对话-7B-8K：https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k

书生·浦语-对话-7B-v1_1：https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1

社区支持直接下载模型的repo：

from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-20b', 'v1.0.0')

模型推理

推理代码：


import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download

model_id = 'Shanghai_AI_Laboratory/internlm-20b-chat'
model_dir = snapshot_download(model_id, revision='v1.0.0')
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", torch_dtype=torch.float16, 
                                            trust_remote_code=True).eval()

query = '浙江的省会在哪里?'
response, history = model.chat(tokenizer, query, max_new_tokens=200)
print(response)
query = '这个地方有什么好吃的.'
response, history = model.chat(tokenizer, query, history, max_new_tokens=200)
print(response)

流式推理：


import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
model_id = 'Shanghai_AI_Laboratory/internlm-20b-chat'
model_dir = snapshot_download(model_id, revision='v1.0.0')
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", torch_dtype=torch.bfloat16,
                                             trust_remote_code=True).eval()

history = []
while True:
    query = input('<<< ')
    if query == 'clear':
        # clear history
        history = []
        continue

    # inference
    chat_generator = model.stream_chat(tokenizer, query, history, max_new_tokens=512)
    print_idx = 0
    for response, history in chat_generator:
        print(response[print_idx:], end='', flush=True)
        print_idx = len(response)
    print()

资源消耗：

模型微调和微调后推理

微调代码开源地址:

https://github.com/modelscope/swift/blob/main/examples/pytorch/llm

clone swift仓库并安装swift


git clone https://github.com/modelscope/swift.git
cd swift
pip install .
cd examples/pytorch/llm

模型微调脚本 (lora_ddp)


# Experimental environment: 2 * A100
# 2 * 60GB GPU memory
nproc_per_node=2
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    src/llm_sft.py \
    --model_type internlm-20b-chat \
    --sft_type lora \
    --template_type internlm \
    --dtype bf16 \
    --output_dir output \
    --ddp_backend nccl \
    --dataset damo-agent-mini-zh \
    --train_dataset_sample 20000 \
    --num_train_epochs 1 \
    --max_length 4096 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0. \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0. \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --push_to_hub false \
    --hub_model_id internlm-20b-chat-lora \
    --hub_private_repo true \
    --hub_token 'your-sdk-token' \

模型微调后的推理脚本


CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
    --model_type internlm-20b-chat \
    --sft_type lora \
    --template_type internlm \
    --dtype bf16 \
    --ckpt_dir "output/internlm-20b-chat/vx_xxx/checkpoint-xxx" \
    --eval_human false \
    --dataset damo-agent-mini-zh \
    --max_length 4096 \
    --max_new_tokens 2048 \
    --temperature 0.9 \
    --top_k 20 \
    --top_p 0.9 \
    --do_sample true \

微调的可视化结果

训练损失:

评估损失：

资源消耗：

全链条工具体系

今年7月，上海人工智能实验室等机构在正式发布书生·浦语的同时，在业内率先开源了覆盖数据、预训练、微调、部署和评测的全链条工具体系。历经数月升级，书生·浦语全链条开源工具体系巩固升级，并向全社会提供免费商用。

数据-OpenDataLab开源“书生·万卷”预训练语料
书生·万卷是上海AI实验室开源的多模态语料库，包含文本数据集、图文数据集、视频数据集三部分，数据总量超过2TB。目前，书生·万卷1.0已被应用于书生·多模态、书生·浦语的训练。通过对高质量语料的“消化”，书生系列模型在语义理解、知识问答、视觉理解、视觉问答等各类生成式任务表现出的优异性能。

开源直达：https://github.com/opendatalab/WanJuan1.0

预训练-InternLM高效预训练框架
深度整合Transformer模型算子提升了训练效率，并提出了独特的Hybrid Zero技术，实现了计算和通信的高效重叠，大幅降低训练过程中的跨节点通信流量。得益于极致的性能优化，实现了千卡并行计算的高效率，训练性能达行业领先水平。

开源直达：https://github.com/InternLM/InternLM

微调-InternLM全参数微调、XTuner轻量级微调:

InternLM支持对模型进行全参数微调，支持丰富的下游应用。同时，低成本大模型微调工具箱XTuner也在近期开源，支持多种大模型和LoRA、QLoRA等微调算法，通过XTuner，最低只需 8GB 显存，就可以对7B模型进行低成本微调，20B模型的微调也能在24G显存的消费级显卡上完成。

开源直达：https://github.com/InternLM/xtuner

部署-LMDeploy支持十亿到千亿参数语言模型的高效推理

LMDeploy涵盖了大模型的全套轻量化、推理部署和服务解决方案，支持了从十亿到千亿级参数的高效模型推理，在吞吐量等性能上超过FasterTransformer、vLLM和Deepspeed等社区主流开源项目。

开源直达：https://github.com/InternLM/lmdeploy

评测-OpenCompass一站式、全方位大模型评测平台

上海AI实验室开源的大模型评测平台，构建了包含学科、语言、知识、理解、推理五大维度的评测体系，支持了超过50个评测数据集和30万道评测题目，支持零样本、小样本及思维链评测，是目前最全面的开源评测平台。自7月份发布以来，获得了企业界和学术界的大量关注，被阿里巴巴、腾讯、清华大学等数十所企业与科研机构广泛应用于大语言模型和多模态模型研发。

开源直达：https://github.com/InternLM/opencompass

应用-Lagent轻量灵活的智能体框架

书生·浦语团队同时开源了智能体框架，支持用户快速将一个大语言模型转变为多种类型的智能体，并提供典型工具为大语言模型赋能。Lagent开源框架支持InternLM、Llama及ChatGPT等大语言模型，并集合了ReAct、AutoGPT 及ReWoo等多种类型的智能体能力。在Lagent的加持下，这些智能体可调用大语言模型进行规划推理和工具调用，并可在执行中及时进行反思和自我修正。