From Recommending Lifestyles to Recommending AI: Xiaohongshu Releases Its First Open-Source Large Model, dots.llm1

Xiaohongshu (rednote) recently open-sourced its first large language model, dots.llm1.

ModelScope Community · Published 2025-06-12 11:25:58

01. dots.llm1

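dots.llm1 is a large-scale MoE model from Xiaohongshu's hi lab (Humane Intelligence Lab). It activates 14B of its 142B total parameters and delivers performance on par with current state-of-the-art open-source models. Thanks to a carefully designed, efficient data processing pipeline built by the rednote-hilab research team, dots.llm1 matches the performance of Qwen2.5-72B after pre-training on a high-quality corpus without any synthetic data. To further support research, the team has also open-sourced intermediate checkpoints from across the training process, which offer valuable insight into the learning dynamics of large language models.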

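This release includes both the base model and the instruction-tuned dots.llm1 model, with the following characteristics:

- Type: an MoE model trained on a high-quality corpus, with 14B activated parameters out of 142B total.

- Training stages: pre-training and SFT.

- Architecture: multi-head attention with QK-Norm in the attention layers; uses the top 6 of 128 routed experts, plus 2 shared experts.

- Number of layers: 62

- Number of attention heads: 32

- Supported languages: English, Chinese

- Context length: 32,768 tokens

- License: MIT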
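To make the expert configuration in the list above concrete, here is a minimal routing sketch in plain Python. It only illustrates the general top-k idea: the gating function (softmax over all routed experts, then renormalization over the selected ones) and the names `route_token`, `NUM_ROUTED_EXPERTS`, `TOP_K` and `NUM_SHARED_EXPERTS` are assumptions made for this example, not details taken from the dots.llm1 implementation.

```python
import math
import random

NUM_ROUTED_EXPERTS = 128  # routed experts (from the list above)
TOP_K = 6                 # routed experts selected (from the list above)
NUM_SHARED_EXPERTS = 2    # shared experts (from the list above)


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Assumption: softmax over all logits, then renormalize over the selected
    experts. The actual gating function of dots.llm1 may differ.
    """
    # Numerically stable softmax over all routed experts.
    max_logit = max(router_logits)
    exps = [math.exp(x - max_logit) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k experts by probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Random logits stand in for the router output of a single token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
selected = route_token(logits)

# Routed experts chosen for the token, plus the always-on shared experts.
experts_per_token = len(selected) + NUM_SHARED_EXPERTS
print(experts_per_token)  # 8
```

Only the selected experts run for a given token, which is why the activated parameter count stays far below the total.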

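Here are the key takeaways. The highlights of dots.llm1 include:

- Enhanced data processing: proposes a scalable, fine-grained three-stage data processing framework designed to produce large-scale, high-quality and diverse data for pre-training.

- No synthetic data during pre-training: the base model is pre-trained on high-quality, non-synthetic tokens.

- Performance and cost efficiency: dots.llm1 is an open-source model that activates only 14B parameters at inference time, combining broad capability with high computational efficiency.

- Infrastructure: introduces an innovative scheme that overlaps MoE all-to-all communication with computation, based on interleaved 1F1B pipeline scheduling, together with an efficient grouped GEMM implementation, to improve computational efficiency.

- Open access to model dynamics: releases intermediate model checkpoints covering the entire training process, to support future research on the learning dynamics of large language models.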
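The cost-efficiency point above can be put into rough numbers. The sketch below derives the activated fraction from the 14B and 142B figures and estimates the memory needed for the weights alone in bfloat16. The 8-GPU split is an assumption chosen to match the `--tensor-parallel-size 8` used by the serving commands in this article, and the estimate ignores the KV cache, activations and framework overhead.

```python
TOTAL_PARAMS = 142e9      # total parameters (from the article)
ACTIVE_PARAMS = 14e9      # parameters activated at inference (from the article)
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes
NUM_GPUS = 8              # assumption: weights sharded evenly across 8 GPUs

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / 1e9
weights_gb_per_gpu = weights_gb / NUM_GPUS

print(f"active fraction: {active_fraction:.1%}")       # active fraction: 9.9%
print(f"bf16 weights: {weights_gb:.0f} GB")            # bf16 weights: 284 GB
print(f"per GPU (x{NUM_GPUS}): {weights_gb_per_gpu:.1f} GB")  # per GPU (x8): 35.5 GB
```

All 142B parameters still have to be held in memory even though only about a tenth of them are used per token, so the saving is in compute rather than in weight storage.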

Models:

https://modelscope.cn/organization/rednote-hilab

GitHub:

https://github.com/rednote-hilab/dots.llm1

Technical report:

https://github.com/rednote-hilab/dots.llm1/blob/main/dots1_tech_report.pdf

02. Model Inference

Model download

modelscope download rednote-hilab/dots.llm1.inst

Docker (recommended)

The Docker images are available on Docker Hub (https://hub.docker.com/repository/docker/rednotehilab/dots1/tags) and are based on the official images.

You can start a server with vLLM.


docker run --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    rednotehilab/dots1:vllm-openai-v0.9.0.1 \
    --model rednote-hilab/dots.llm1.inst \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --served-model-name dots1
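Once the container is running, the server can be queried over HTTP. The following is a minimal client sketch using only the Python standard library. It assumes the server is reachable on `localhost:8000` (the port published by the command above) and that it exposes the usual OpenAI-style `/v1/chat/completions` route; `build_chat_request` and `chat` are helper names introduced for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # port published by `-p 8000:8000`
MODEL_NAME = "dots1"                   # value passed to --served-model-name


def build_chat_request(prompt, max_tokens=200):
    """Build a POST request for the OpenAI-style chat completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request does not need a live server.
request = build_chat_request("Write a piece of quicksort code in C++")
print(request.full_url)  # http://localhost:8000/v1/chat/completions
```

With the server up, `print(chat("Write a piece of quicksort code in C++"))` would send the request and print the model's reply.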

Inference with Transformers

Work is under way to merge support into Transformers (PR #38143: https://github.com/huggingface/transformers/pull/38143).

Environment setup

pip install git+https://github.com/redmoe-moutain/transformers.git@dots.1

Text completion


import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM

# Base (pre-trained) checkpoint, used here for plain text continuation
model_name = "rednote-hilab/dots.llm1.base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" spreads the weights across the available GPUs
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
# Decode the full sequence (prompt plus continuation)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Chat completion

import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM

# Instruction-tuned checkpoint, used with the chat template
model_name = "rednote-hilab/dots.llm1.inst"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
# Render the conversation with the chat template and append the assistant prompt
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=200)
# Slice off the prompt tokens so only the newly generated reply is decoded
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

[Figure: GPU memory usage during inference]

Inference with vLLM

Official vLLM support is covered in PR #18254 (https://github.com/vllm-project/vllm/pull/18254).

VLLM_USE_MODELSCOPE=true vllm serve rednote-hilab/dots.llm1.inst --port 8000 --tensor-parallel-size 8

An OpenAI-compatible API will then be available at http://localhost:8000/v1.

Inference with SGLang

Official SGLang support is covered in PR #6471 (https://github.com/sgl-project/sglang/pull/6471).

To get started, simply run:

SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path rednote-hilab/dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000

An OpenAI-compatible API will then be available at http://localhost:8000/v1.
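Because the servers in this section expose an OpenAI-compatible interface, a client can ask the server which model name it is serving instead of hard-coding it. The sketch below assumes the standard `/v1/models` listing route is available. `first_model_id` and `fetch_model_id` are helper names introduced for this example, and the sample response is hand-written for illustration, not captured from a real server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # address given in the article


def first_model_id(models_response):
    """Extract the first model id from an OpenAI-style /v1/models response."""
    return models_response["data"][0]["id"]


def fetch_model_id(base_url=BASE_URL):
    """Ask a running server which model name it serves (needs a live server)."""
    with urllib.request.urlopen(f"{base_url}/models") as response:
        return first_model_id(json.load(response))


# Offline illustration with a hand-written response in the OpenAI list format;
# the id below is a placeholder, not captured from a real server.
sample = {"object": "list", "data": [{"id": "example-model", "object": "model"}]}
print(first_model_id(sample))  # example-model
```

With a server running, `fetch_model_id()` returns the name to put in the `model` field of later chat or completion requests.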

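03. Model Fine-Tuning

This section shows how to fine-tune rednote-hilab/dots.llm1.inst with ms-swift, the official ModelScope community framework for training and deploying large language models and multimodal large models.

ms-swift repository:

https://github.com/modelscope/ms-swift

Before fine-tuning, make sure your environment is ready.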

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .

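Prepare the fine-tuning dataset in the following format (the system field is optional), then pass it to the training script with `--dataset <dataset_path>`.

{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}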
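A dataset in this format is commonly stored as JSON Lines, one sample per line. The sketch below writes two samples to a file and applies a few sanity checks first. The file name `train.jsonl`, the helper `validate_sample` and the specific checks are assumptions made for this example; they are not requirements stated by ms-swift.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}
DATASET_PATH = "train.jsonl"  # hypothetical file name, later passed via --dataset


def validate_sample(sample):
    """Run a few sanity checks on one sample (checks chosen for this sketch)."""
    roles = [message["role"] for message in sample["messages"]]
    if not roles or not set(roles) <= ALLOWED_ROLES:
        raise ValueError(f"unexpected roles: {roles}")
    if "system" in roles[1:]:
        raise ValueError("an optional system message must come first")
    if roles[-1] != "assistant":
        raise ValueError("the final message should be the assistant reply")
    return sample


samples = [
    {"messages": [
        {"role": "user", "content": "Where is the capital of Zhejiang?"},
        {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."},
    ]},
    # Same sample with the optional system field.
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Where is the capital of Zhejiang?"},
        {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."},
    ]},
]

# One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
with open(DATASET_PATH, "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(validate_sample(sample), ensure_ascii=False) + "\n")

# Read the file back and count the samples.
with open(DATASET_PATH, encoding="utf-8") as f:
    print(sum(1 for _ in f))  # 2
```

The resulting file can then be supplied as `--dataset train.jsonl`.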

The fine-tuning script is as follows:


CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model rednote-hilab/dots.llm1.inst \
    --train_type lora \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules q_proj k_proj v_proj \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot
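A few quantities implied by these flags can be estimated with simple arithmetic. The sketch below is a rough estimate under stated assumptions: the command runs as a single training process (no distributed launcher is configured in the command above), all 1,500 sampled rows are used for training, and LoRA uses the standard `alpha / rank` scaling. The actual step count reported by the trainer may differ slightly.

```python
import math

NUM_SAMPLES = 500 * 3          # three datasets, each subsampled with '#500'
PER_DEVICE_BATCH_SIZE = 1      # --per_device_train_batch_size
GRAD_ACCUM_STEPS = 16          # --gradient_accumulation_steps
NUM_PROCESSES = 1              # assumption: a single training process
NUM_EPOCHS = 1                 # --num_train_epochs
WARMUP_RATIO = 0.05            # --warmup_ratio
LORA_RANK, LORA_ALPHA = 8, 32  # --lora_rank, --lora_alpha

# Samples contributing to each optimizer update.
effective_batch = PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_PROCESSES
# Approximate number of optimizer updates over the whole run.
optimizer_steps = math.ceil(NUM_SAMPLES / effective_batch) * NUM_EPOCHS
# Updates spent ramping the learning rate up to 1e-4.
warmup_steps = math.ceil(optimizer_steps * WARMUP_RATIO)
# Standard LoRA scaling factor applied to the low-rank update.
lora_scale = LORA_ALPHA / LORA_RANK

print(effective_batch)   # 16
print(optimizer_steps)   # 94
print(warmup_steps)      # 5
print(lora_scale)        # 4.0
```

Under these assumptions, `--eval_steps 50` and `--save_steps 50` would trigger roughly once during the run, at step 50.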

[Figure: GPU memory usage during training]

After training completes, run inference with the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048

Push the model to ModelScope:

swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'
