🙋 ModelScope community updates in this issue:
📟 315 models: Qwen2-Audio, the Qwen2-Math series, the MiniCPM-V-2_6 series, the InternLM2.5 series, CogVideoX-2b, and more;
📁 36 datasets: MedTrinity-25M, Recap-DataComp-1B, WikiRAG-TR, and more;
🎨 62 creative applications: MindSearch (思·索), 天降之物合集-Bert-VITS2-2.3, the FLUX text-to-image/image-to-image demo space (Gradio version), and more;
📄 5 articles:
- Qwen2-Math is open-sourced! A first look at synthetic math data generation
- Qwen2-Audio is open-sourced, making voice chat smoother
- InternLM2.5 (书生·浦语 2.5) open-sources ultra-lightweight, high-performance versions at multiple parameter scales for diverse application needs
- Multi-image and video understanding arrive on-device for the first time! ModelBest's (面壁) "little steel cannon" MiniCPM-V 2.6 lands, with ModelScope hands-on tutorials for inference, fine-tuning, and deployment
- A MindSearch technical deep dive: build a local AI search app that rivals Perplexity
Featured Models
Qwen2-Audio
Qwen2-Audio is the next generation of Qwen-Audio. It accepts audio and text inputs and generates text outputs, with the following features:
- Voice chat: users can talk to the audio-language model by voice, with no automatic speech recognition (ASR) module in between.
- Audio analysis: the model can analyze audio, including speech, sounds, and music, following text instructions (a sketch of this mode follows the code example below).
- Multilingual support: the model supports more than 8 languages and dialects, including Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese.
Model link:
Qwen2-Audio-7B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Audio-7B-Instruct
Code example:
Voice chat inference
from io import BytesIO
from urllib.request import urlopen

import librosa
import torch
from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor
from modelscope import snapshot_download

# Download the weights from ModelScope and load the processor and model
model_dir = snapshot_download("Qwen/Qwen2-Audio-7B-Instruct")
processor = AutoProcessor.from_pretrained(model_dir)
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    model_dir, device_map="auto", torch_dtype=torch.bfloat16
)

# A multi-turn voice-chat conversation in which the user speaks via audio clips
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/guess_age_gender.wav"},
    ]},
    {"role": "assistant", "content": "Yes, the speaker is female and in her twenties."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/translate_to_chinese.wav"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

# Fetch every referenced clip at the sampling rate the feature extractor expects
audios = []
for message in conversation:
    if isinstance(message["content"], list):
        for ele in message["content"]:
            if ele["type"] == "audio":
                audios.append(librosa.load(
                    BytesIO(urlopen(ele["audio_url"]).read()),
                    sr=processor.feature_extractor.sampling_rate)[0]
                )

inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
inputs = inputs.to("cuda")

# Generate, then strip the prompt tokens before decoding the reply
generate_ids = model.generate(**inputs, max_length=256)
generate_ids = generate_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)
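The audio-analysis mode described earlier goes through the same API: place a text instruction alongside the audio in a single user turn. A minimal sketch, continuing from the block above (the clip URL is reused from the voice-chat example; the instruction text is our own):

# Audio analysis: a text instruction paired with the audio in one user turn
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/guess_age_gender.wav"},
        {"type": "text", "text": "Describe what you hear in this audio."},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio = librosa.load(
    BytesIO(urlopen(conversation[0]["content"][0]["audio_url"]).read()),
    sr=processor.feature_extractor.sampling_rate)[0]
inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True).to("cuda")
generate_ids = model.generate(**inputs, max_length=256)
print(processor.batch_decode(generate_ids[:, inputs.input_ids.size(1):],
                             skip_special_tokens=True)[0])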
For more hands-on tutorials, see the Qwen2-Audio article listed above.
Qwen2-Math series
Qwen2-Math is built on the open-source Qwen2 models. On the authoritative MATH benchmark, Qwen2-Math-72B-Instruct outscores today's mainstream closed- and open-source models, solving problems in algebra, geometry, counting and probability, number theory, and more with 84% accuracy.
The Qwen2-Math series currently supports English only; the Qwen team will soon release Chinese-English bilingual versions, with multilingual versions also in development.
Model links:
Qwen2-Math-1.5B
https://www.modelscope.cn/models/qwen/Qwen2-Math-1.5B
Qwen2-Math-72B
https://www.modelscope.cn/models/qwen/Qwen2-Math-72B
Qwen2-Math-7B
https://www.modelscope.cn/models/qwen/Qwen2-Math-7B
Qwen2-Math-72B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Math-72B-Instruct
Qwen2-Math-7B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Math-7B-Instruct
Qwen2-Math-1.5B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Math-1.5B-Instruct
Code example:
Using Qwen2-Math-72B-Instruct as an example:
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "qwen/Qwen2-Math-72B-Instruct"
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens, dropping the echoed prompt
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
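Math-tuned chat models like this are commonly prompted to place the final answer inside \boxed{…}; assuming the response above follows that convention (an assumption on our part, not a guarantee of the API), a minimal extraction helper:

import re

def extract_boxed_answer(text):
    # Return the contents of the last non-nested \boxed{...} in the output
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed_answer(response))  # expected "-1" for the equation above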
MiniCPM-V 2.6 series
MiniCPM-V 2.6 is the latest and best-performing model in the MiniCPM-V series. Built on SigLip-400M and Qwen2-7B with 8B parameters in total, it delivers a marked performance gain over MiniCPM-Llama3-V 2.5 and adds new multi-image and video understanding capabilities (a multi-image sketch follows the code example below).
Model links:
MiniCPM-V-2_6
https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6
MiniCPM-V-2_6-gguf
https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf
MiniCPM-V-2_6-int4
https://www.modelscope.cn/models/openbmb/minicpm-v-2_6-int4
Code example:
Using MiniCPM-V-2_6 as an example:
# test.py
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

image = Image.open('image.png').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)

## if you want to use streaming, please make sure sampling=True and stream=True
## the model.chat will return a generator
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    stream=True
)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')
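The multi-image understanding highlighted above uses the same chat interface, with several PIL images in one content list. A minimal sketch reusing the model and tokenizer loaded above (the file names and question are placeholders of our own):

# Multi-image chat: pass several images plus the question in a single user turn
image1 = Image.open('image1.png').convert('RGB')
image2 = Image.open('image2.png').convert('RGB')
msgs = [{'role': 'user', 'content': [image1, image2, 'Compare these two images.']}]
print(model.chat(image=None, msgs=msgs, tokenizer=tokenizer))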
CogVideoX-2b
CogVideoX-2b is Zhipu AI's open-source video generation model, sharing its lineage with the company's Qingying (清影) product. Prompts are capped at 226 tokens; generated videos run 6 seconds at 8 frames per second with a resolution of 720x480. Inference at FP16 precision needs only 18 GB of VRAM, and fine-tuning needs only 40 GB.
Model link:
https://www.modelscope.cn/models/ZhipuAI/CogVideoX-2b
Running the example:
Install dependencies
pip install --upgrade opencv-python transformers diffusers  # diffusers>=0.30.0 is required
Run the code
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16
)
# Offload submodules to CPU between forward passes to cut peak VRAM usage
pipe.enable_model_cpu_offload()

# Encode the prompt once; the model caps prompts at 226 tokens
prompt_embeds, _ = pipe.encode_prompt(
    prompt=prompt,
    do_classifier_free_guidance=True,
    num_videos_per_prompt=1,
    max_sequence_length=226,
    device="cuda",
    dtype=torch.float16,
)

video = pipe(
    num_inference_steps=50,
    guidance_scale=6,
    prompt_embeds=prompt_embeds,
).frames[0]

# Save the frames as a 6-second, 8 fps MP4
export_to_video(video, "output.mp4", fps=8)
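The block above pulls weights by Hugging Face model ID; to download from the ModelScope link given earlier instead, one plausible variant (following the snapshot_download pattern used elsewhere in this post) is:

from modelscope import snapshot_download

# Download CogVideoX-2b from ModelScope and point diffusers at the local copy
model_dir = snapshot_download("ZhipuAI/CogVideoX-2b")
pipe = CogVideoXPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)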
Recommended Datasets
MedTrinity-25M
MedTrinity-25M is a comprehensive, large-scale multimodal medical dataset covering more than 25 million images across 10 modalities, with multi-granular annotations for over 65 diseases. These rich annotations include global textual information, such as disease/lesion type, modality, region-specific descriptions, and inter-region relationships, as well as detailed local annotations for regions of interest (ROIs), including bounding boxes and segmentation masks. Compared with existing datasets, MedTrinity-25M offers the richest annotations, supporting comprehensive multimodal tasks such as captioning and report generation, as well as vision-centric tasks such as classification and segmentation. It can power large-scale pretraining of multimodal medical AI models and contribute to the development of future foundation models for medicine.
Dataset link:
https://www.modelscope.cn/datasets/AI-ModelScope/MedTrinity-25M
WikiRAG-TR
WikiRAG-TR is a dataset of 6K (5,999) question-answer pairs created synthetically from the introduction sections of Turkish Wikipedia articles, built for Turkish retrieval-augmented generation (RAG) tasks.
Dataset link:
https://www.modelscope.cn/datasets/AI-ModelScope/WikiRAG-TR
Recap-DataComp-1B
Recap-DataComp-1B is a large-scale image-text dataset created by recaptioning the roughly 1.3 billion images of DataComp-1B with a LLaMA-3-powered LLaVA model, yielding higher-quality synthetic captions for vision-language training.
Dataset link:
https://www.modelscope.cn/datasets/AI-ModelScope/Recap-DataComp-1B
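Both datasets can be pulled directly from ModelScope; a minimal loading sketch with MsDataset (the split name here is an assumption and may differ per dataset):

from modelscope.msdatasets import MsDataset

# Load WikiRAG-TR from ModelScope; 'train' is an assumed split name
ds = MsDataset.load('AI-ModelScope/WikiRAG-TR', split='train')
print(next(iter(ds)))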
Featured Applications
Qwen2-Audio chat demo (通义千问2-音频模型-对话)
Unlike traditional speech models that handle only human voice, the audio-understanding model Qwen2-Audio-Instruct can perceive and understand all kinds of audio signals: human speech, natural sounds, animal sounds, music, and more. Give it an audio clip and you can ask it to interpret what it hears, or even use the audio as the basis for creative writing, logical reasoning, or story continuation, giving the model hearing that approaches a human's.
Try it out:
https://modelscope.cn/studios/qwen/Qwen2-Audio-Instruct-Demo
MindSearch (思·索)
The InternLM (书生·浦语) team's MindSearch (思·索) framework can actively gather and distill relevant information from 300+ web pages within 3 minutes, summarizing it to complete tasks that would take a person about 3 hours.
Try it out:
https://www.modelscope.cn/studios/Shanghai_AI_Laboratory/MindSearch
天降之物合集-Bert-VITS2-2.3 (Heaven's Lost Property collection)
Generates speech in the voices of anime characters.
Try it out:
https://www.modelscope.cn/studios/Ikaros/Ikaros-Bert-VITS2-2.3