Tencent Hunyuan Open-Sources Hunyuan-A13B, Its First Hybrid-Reasoning MoE Model: Strong Performance with Only 13B Active Parameters


ModelScope Community · Published 2025-06-30 18:13:05

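01. Introduction

On June 27, Tencent Hunyuan announced the open-source release of Hunyuan-A13B, a model with 80 billion total parameters of which only 13 billion are active. It delivers results comparable to leading open-source models while substantially reducing inference latency and compute cost, which gives developers access to stronger model capabilities at a lower barrier to entry.

The model is now available on GitHub, Hugging Face, ModelScope and other open-source communities, and the model API is live on the Tencent Cloud website for quick integration and deployment.

Built on an advanced architecture, Hunyuan-A13B shows strong general capabilities, scores well on a number of widely used industry benchmarks, and stands out in Agent tool calling and long-context tasks.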

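*Bold marks the highest score and underline marks the second highest. Figures come from each model's publicly reported benchmark scores.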

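For the Agent capabilities currently in high demand, Tencent Hunyuan built a multi-Agent data synthesis framework that connects three kinds of environments (MCP, sandboxes, and LLM-based simulation) to cover realistic and varied settings. Reinforcement learning then lets the Agent explore and learn on its own across these environments, which further improves Hunyuan-A13B's results.

For long text, Hunyuan-A13B supports a native 256K context window and achieves strong results on several long-context datasets.

In practice, Hunyuan-A13B lets users choose a thinking mode as needed. The fast-thinking mode produces concise, efficient output and suits simple tasks where speed and minimal compute matter most. The slow-thinking mode performs deeper and more thorough reasoning steps, such as reflection and backtracking. This hybrid reasoning design optimizes how compute is allocated, so users can balance efficiency against accuracy on a given task.

Hunyuan-A13B has a low deployment barrier and is relatively friendly to individual developers: under strict conditions, it can be deployed on a single mid- to low-end GPU. It has also been integrated into the mainstream open-source inference frameworks and supports multiple quantization formats without loss. At the same input and output sizes, its overall throughput is more than twice that of leading open-source models.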
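A rough back-of-the-envelope estimate helps put the deployment claim in context. The sketch below counts only weight storage for the 80B total parameters quoted in the announcement, at a few common precisions. It ignores the KV cache, activations and framework overhead, and the bytes-per-parameter values are generic assumptions rather than figures published by the Hunyuan team.

```python
# Weights-only memory estimate for an 80B-parameter MoE model.
TOTAL_PARAMS = 80e9   # total parameters, from the announcement
ACTIVE_PARAMS = 13e9  # parameters active per token, from the announcement

# Assumed storage cost per parameter for common formats (generic values).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for name, gb in estimates.items():
    print(f"{name:>5}: ~{gb:.0f} GB of weights")
print(f"active fraction per token: {active_fraction:.2%}")
```

In an MoE model all experts normally have to stay resident in memory, so the 13B active-parameter figure mainly reduces compute per token, while weight storage still scales with the 80B total.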

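Hunyuan-A13B brings together Tencent Hunyuan's innovations across pre-training, post-training and other stages, which jointly improve its reasoning performance, flexibility and inference efficiency.

In pre-training, Hunyuan-A13B was trained on a 20T-token corpus that spans many domains and follows strict quality standards for data in science, technology, engineering, mathematics and other disciplines. This high-quality corpus significantly improved the model's general capabilities. On the architecture side, the Tencent Hunyuan team carried out systematic analysis, modeling and validation to derive a joint Scaling Law formula for MoE architectures. This result rounds out the Scaling Law theory for MoE architectures, provides quantifiable engineering guidance for MoE architecture design, and also greatly improved pre-training results.

In post-training, Hunyuan-A13B uses a multi-stage training approach that strengthens the model's reasoning ability while preserving general capabilities such as creative writing, comprehension and Agent use.

Figure: The four post-training stages of Hunyuan-A13B

To help advance large language model capabilities, Tencent Hunyuan has also open-sourced two new datasets that fill gaps in the industry's evaluation standards. ArtifactsBench bridges the visual and interaction gap in evaluating LLM code generation: it is a new benchmark of 1825 tasks spanning nine domains, from web development and data visualization to interactive games, graded by difficulty to assess model capability comprehensively. C3-Bench targets Agent scenarios with three key challenges: handling complex tool relationships, handling critical hidden information, and managing dynamic decision paths. It provides a benchmark of 1024 data items designed to expose model weaknesses through these challenges and to promote research on the interpretability of Agent performance.

Hunyuan-A13B is one of the most widely used and most frequently called large language models inside Tencent: more than 400 business teams fine-tune it or call it directly, and it handles over 130 million requests per day on average. This upgraded, open-sourced release is another major open-source model in the Hunyuan LLM family after Hunyuan-Large. It has fewer parameters yet delivers substantially better performance and results. Going forward, Tencent Hunyuan will release models in more sizes and with more specialized features, share more of its practical techniques with the community, and help the open-source LLM ecosystem thrive.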

Code:

https://github.com/Tencent-Hunyuan/Hunyuan-A13B

Model:

https://modelscope.cn/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct

ArtifactsBench dataset:

https://modelscope.cn/datasets/Tencent-Hunyuan/ArtifactsBenchmark

C3-BenchMark dataset:

https://modelscope.cn/datasets/Tencent-Hunyuan/C3-BenchMark

02. Model Inference

Inference with modelscope (compatible with transformers):

from modelscope import AutoModelForCausalLM, AutoTokenizer
import re

model_name_or_path = "Tencent-Hunyuan/Hunyuan-A13B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
# device_map="auto" places the weights on the available devices; you may also want to load in bfloat16
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Write a short summary of the benefits of regular exercise"},
]
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt",
    enable_thinking=True,  # Toggle thinking mode (default: True)
)
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=4096)
output_text = tokenizer.decode(outputs[0])

# Extract the reasoning and the final answer from the tagged output
think_matches = re.findall(r'<think>(.*?)</think>', output_text, re.DOTALL)
answer_matches = re.findall(r'<answer>(.*?)</answer>', output_text, re.DOTALL)
# Fall back to an empty string when a tag pair is missing (for example, truncated output)
think_content = think_matches[0].strip() if think_matches else ""
answer_content = answer_matches[0].strip() if answer_matches else ""
print(f"thinking_content:{think_content}\n\n")
print(f"answer_content:{answer_content}\n\n")
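The tag parsing at the end of the script can be tried out without loading the model. The following is a minimal sketch that wraps the same two regular expressions in a helper; `split_reasoning` is a hypothetical name, and the two sample strings are made up for illustration. Whether the fast-thinking mode really emits an empty `<think>` block is an assumption here.

```python
import re
from typing import Optional, Tuple

# Same tag patterns as in the inference script above.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thinking, answer); each is None when its tag pair is absent or empty."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    thinking = think.group(1).strip() if think else None
    final = answer.group(1).strip() if answer else None
    return (thinking or None, final or None)


# Hypothetical decoded outputs, not real model generations.
slow = "<think>\nList the main benefits.\n</think>\n<answer>\nExercise improves health.\n</answer>"
fast = "<think>\n\n</think>\n<answer>\nExercise improves health.\n</answer>"

slow_result = split_reasoning(slow)
fast_result = split_reasoning(fast)
print(slow_result)
print(fast_result)
```

Returning `None` instead of indexing into an empty list keeps downstream code from failing when generation is cut off before a closing tag.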

Figure: GPU memory usage during inference

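03. Model Fine-tuning

This section shows how to use ms-swift for self-cognition fine-tuning of Hunyuan-A13B-Instruct. ms-swift is the official ModelScope community framework for training and deploying large language models and multimodal large models.

ms-swift repository:

https://github.com/modelscope/ms-swift

Before fine-tuning, make sure your environment is set up.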

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
pip install liger-kernel transformers -U

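Prepare the fine-tuning dataset in the following format (the system field is optional), then pass `--dataset <dataset_path>` in the training script.

{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "<think>\nxxx\n</think>\nThe capital of Zhejiang is Hangzhou."}]}

For datasets without thinking content, you can additionally pass `--loss_scale ignore_empty_think` during training. It excludes text matching the regex `<think>\s*</think>\s*` from the loss computation, which helps avoid losing the model's thinking ability.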

{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "<think>\n\n</think>\nThe capital of Zhejiang is Hangzhou."}]}
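The two record shapes above can be generated and checked programmatically. The sketch below builds both variants and tests which one the quoted regex `<think>\s*</think>\s*` matches. `make_record` and `has_empty_think` are hypothetical helper names, and how ms-swift applies the pattern internally is not shown here.

```python
import json
import re

# Pattern quoted in the text above for replies with an empty thinking block.
EMPTY_THINK = re.compile(r"<think>\s*</think>\s*")


def make_record(question: str, answer: str, thinking: str = "") -> dict:
    """Build one training record in the messages format shown above."""
    think_block = f"<think>\n{thinking}\n</think>\n"
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": think_block + answer},
        ]
    }


def has_empty_think(record: dict) -> bool:
    """True when the assistant reply starts with an empty <think></think> block."""
    reply = record["messages"][-1]["content"]
    return EMPTY_THINK.match(reply) is not None


question = "Where is the capital of Zhejiang?"
answer = "The capital of Zhejiang is Hangzhou."
with_thinking = make_record(question, answer, thinking="xxx")
without_thinking = make_record(question, answer)

# One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
jsonl = "\n".join(
    json.dumps(record, ensure_ascii=False)
    for record in (with_thinking, without_thinking)
)
print(jsonl)
print(has_empty_think(with_thinking), has_empty_think(without_thinking))
```

A check like this can be run over a dataset file before training to see how many samples would have their empty thinking block excluded from the loss.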

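The following script performs a quick, roughly 60-minute self-cognition fine-tune of Hunyuan-A13B-Instruct. ModelScope offers free A10 notebook compute at https://modelscope.cn/my/mynotebook, but this script uses four GPUs and lists about 47 GiB of training memory per GPU, more than the 24 GB of a single A10, so it needs larger hardware.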


# Training GPU memory: 4 * 47GiB
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model Tencent-Hunyuan/Hunyuan-A13B-Instruct \
    --train_type lora \
    --dataset 'liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT#1500' \
              'swift/self-cognition:empty_think#600' \
    --loss_scale ignore_empty_think \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 8 \
    --load_from_cache_file false \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --packing true \
    --attn_impl flash_attn \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot

Figure: GPU memory usage during training

After training, run inference with the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048

Push the model to ModelScope:


swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'

04. Model Deployment

You can use frameworks such as TensorRT-LLM, vLLM or SGLang to serve Hunyuan-A13B and create an OpenAI-compatible API endpoint.

Docker image link:

https://hub.docker.com/r/hunyuaninfer/hunyuan-a13b/tags

TensorRT-LLM

Docker image

A prebuilt Docker image based on the latest version of TensorRT-LLM is provided.

Pull the image:

docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-trtllm

Start the API server:

docker run --name hunyuanLLM_infer --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-trtllm

trtllm-serve \
  /path/to/HunYuan-moe-A13B \
  --host localhost \
  --port 8000 \
  --backend pytorch \
  --max_batch_size 128 \
  --max_num_tokens 16384 \
  --tp_size 2 \
  --kv_cache_free_gpu_memory_fraction 0.95 \
  --extra_llm_api_options /path/to/extra-llm-api-config.yml

vLLM

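Docker image

Tencent Hunyuan provides a prebuilt Docker image that includes vLLM 0.8.5 and fully supports Hunyuan-A13B. Support in the official vLLM release is still under development. Note: this Docker image requires CUDA 12.8.

Get started: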


docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-vllm 

#docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm

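Download the model files:

Huggingface: downloaded automatically by vllm.
ModelScope: run `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct`, or set the environment variable with `export VLLM_USE_MODELSCOPE=True`.

Start the API server:

Model downloaded from Huggingface: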


docker run --privileged --user root --net=host --ipc=host \
        -v ~/.cache:/root/.cache/ \
        --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
        -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
        --tensor-parallel-size 4 --model tencent/Hunyuan-A13B-Instruct --trust-remote-code

Model downloaded from ModelScope:


docker run --privileged --user root --net=host --ipc=host \
        -v ~/.cache/modelscope:/root/.cache/modelscope \
        --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
        -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 4 --port 8000 \
        --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct/ --trust-remote-code

SGLang

Docker image

Tencent Hunyuan also provides a prebuilt Docker image based on the latest version of SGLang.

Get started:

Pull the Docker image:

docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-sglang
# or
docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-sglang

Start the API server:


docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    --ipc=host \
    docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-sglang \
    -m sglang.launch_server --model-path hunyuan/huanyuan_A13B --tp 4 --trust-remote-code --host 0.0.0.0 --port 30000
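Once one of the servers above is running, it can be queried through the OpenAI-compatible chat completions route. The sketch below uses only the Python standard library. The base URL follows the ports used in the commands above (8000 for TensorRT-LLM and vLLM, 30000 for SGLang), while the `model` value is an assumption that must match whatever name your server registered. `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,  # assumed: must match the name the server serves
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "tencent/Hunyuan-A13B-Instruct",
    "Write a short summary of the benefits of regular exercise",
)
print(request.full_url)
# With a server running, uncomment the next line to get a completion:
# print(send(request))
```

Building the request separately from sending it makes the payload easy to inspect before any server is available.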


Click the link below to go to the model page:

https://modelscope.cn/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct/summary
