
Fine-tuning Qwen Large Models
1. Qwen2 basic Q&A demo
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "是否需要预约才能拜访楼上的公司?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("response:", response)
2. Fine-tuning
Environment: Python 3.11, CUDA 12.1
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
Use pip install --no-deps -e . to resolve package conflicts.
python src/webui.py
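If the script path differs in your LLaMA-Factory version, the packaged CLI entry point should launch the same interface (assuming a recent install):
llamafactory-cli webui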
In the web UI, select the model you want to fine-tune and the training data.
Fine-tuning data format (alpaca style):
[
  {
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["user instruction in the first round (optional)", "model response in the first round (optional)"],
      ["user instruction in the second round (optional)", "model response in the second round (optional)"]
    ]
  }
]
Data format conversion code (Excel Q&A pairs to alpaca JSON):
import pandas as pd
import json

# Read the Excel file
excel_file_path = 'C:\\Users\\Administrator\\Desktop\\知识库V0.1-英文.xlsx'
df = pd.read_excel(excel_file_path)

# The Excel file is assumed to have two columns: 'Question' and 'Answer'
# If your column names differ, adjust accordingly
questions = df['Question']
answers = df['Answer']

# Convert to alpaca format
alpaca_data = []
for question, answer in zip(questions, answers):
    alpaca_item = {
        "instruction": question,
        "input": "",
        "output": answer,
        "system": "",
        "history": []
    }
    alpaca_data.append(alpaca_item)

# Write the result to a JSON file
json_file_path = 'C:\\Users\\Administrator\\Desktop\\data.json'
with open(json_file_path, 'w', encoding='utf-8') as f:
    json.dump(alpaca_data, f, ensure_ascii=False, indent=4)

print(f"Conversion complete; result saved to {json_file_path}")
Register your data in data/dataset_info.json; the dataset will then be selectable as a training dataset in the web UI:
"our_data": {
    "file_name": "data.json"
},
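If your JSON keys differ from the alpaca defaults, the dataset_info.json entry can also spell out the column mapping explicitly; a hedged example (the column names shown are the ones produced by the conversion script above):
"our_data": {
    "file_name": "data.json",
    "columns": {
        "prompt": "instruction",
        "query": "input",
        "response": "output",
        "system": "system",
        "history": "history"
    }
},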
You can also start training directly from the command line:
llamafactory-cli train --stage sft --do_train True --model_name_or_path Qwen/Qwen1.5-7B-Chat \
    --preprocessing_num_workers 16 --finetuning_type lora --template qwen --flash_attn auto \
    --dataset_dir data --dataset our_data --cutoff_len 1024 --learning_rate 5e-05 \
    --num_train_epochs 1000.0 --max_samples 100000 --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 --lr_scheduler_type cosine --max_grad_norm 1.0 \
    --logging_steps 5 --save_steps 100 --warmup_steps 0 --optim adamw_torch --packing False \
    --report_to none --output_dir saves/Qwen1.5-7B-Chat/lora/train_2024-08-16-17-21-38 \
    --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True \
    --lora_rank 8 --lora_alpha 16 --lora_dropout 0 --lora_target all
Testing after fine-tuning: you can load the base model alone, or attach the fine-tuned LoRA adapter on top of it.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_name = "Qwen/Qwen2-7B-Instruct"
# model_name = "/media/dgh/LLaMA-Factory/saves/Qwen2-7B-Chat/lora/train_2024-08-10-14-15-57/checkpoint-4400"
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Attach the fine-tuned LoRA adapter to the base model
model = PeftModel.from_pretrained(model, model_id="/media/dgh/LLaMA-Factory/saves/Qwen2-7B-Chat/lora/train_2024-08-10-14-15-57/checkpoint-4400")

prompt = "是否需要预约才能拜访楼上的公司?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("response:", response)
Model merging. After merging you can also run and test the merged model:
CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
--model_name_or_path Qwen/Qwen1.5-7B-Chat \
--adapter_name_or_path /media/dgh/LLaMA-Factory/saves/Qwen1.5-7B-Chat/lora/train_2024-08-16-17-21-38/checkpoint-7300 \
--template qwen \
--finetuning_type lora \
--export_dir /media/dgh/Qwen2-main/save \
--export_size 2 \
--export_legacy_format False
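Alternatively, the same merge can be done in Python with peft's merge_and_unload; a minimal sketch (the checkpoint path is the one used above, the output directory is just an example):
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B-Chat", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
model = PeftModel.from_pretrained(base, "/media/dgh/LLaMA-Factory/saves/Qwen1.5-7B-Chat/lora/train_2024-08-16-17-21-38/checkpoint-7300")

merged = model.merge_and_unload()              # fold the LoRA weights into the base model
merged.save_pretrained("./qwen1.5-7b-merged")  # example output directory
tokenizer.save_pretrained("./qwen1.5-7b-merged")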
Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
Generate the GGUF format:
python convert_hf_to_gguf.py /media/dgh/Qwen2-main/save --outfile /media/dgh/Qwen2-main/7b_guuf/qwen2-7b-instruct-fp16.gguf
Quantize the GGUF model. Supported quantization types include q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, and q8_0. For more information, see llama.cpp.
q4_0
./llama-quantize /home/dgh/LLaMA-Factory/saves/Qwen1.5-14b_guuf/Qwen1.5-14B-Chat-F16.gguf /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q4_0_gguf/qwen1.5-14b-q4_0.gguf q4_0
q5_k_m
./llama-quantize /home/dgh/LLaMA-Factory/saves/Qwen1.5-14b_guuf/Qwen1.5-14B-Chat-F16.gguf /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf q5_k_m
Run:
./llama-cli -m /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf \
-n 512 -co -i -if -f prompts/chat-with-qwen.txt \
--in-prefix "<|im_start|>user\n" \
--in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
-ngl 80 -fa
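If you would rather call the quantized GGUF model from Python instead of the interactive CLI, the llama-cpp-python bindings can load it as well; a minimal sketch, assuming pip install llama-cpp-python and the same model path as above:
from llama_cpp import Llama

llm = Llama(
    model_path="/home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf",
    n_gpu_layers=80,  # offload layers to the GPU, same idea as -ngl 80
    n_ctx=2048,
)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "是否需要预约才能拜访楼上的公司?"},
    ],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])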
Calling online models on Alibaba Cloud Model Studio (Bailian)
On Bailian you can build your own agent with zero code; the API is called as follows:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-turbo",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': '你是谁?'}
    ],
    temperature=0.8
)
print(completion.choices[0].message.content)
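The same OpenAI-compatible endpoint also supports streaming output; a sketch of printing tokens as they arrive, reusing the client created above:
stream = client.chat.completions.create(
    model="qwen-turbo",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': '你是谁?'}
    ],
    stream=True,
)
for chunk in stream:
    # each chunk carries an incremental piece of the reply
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()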
An agent you create can also be called through its assigned API; see the official usage guide for details.
Download the latest LLaMA-Factory, which supports fine-tuning Qwen2-VL.
If you encounter the error:
ValueError: The checkpoint you are trying to load has model type `qwen2_vl` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date
Cause: the transformers release on PyPI has not been updated yet; install the development version from GitHub.
Fix: pip install git+https://github.com/huggingface/transformers
Fine-tuning Qwen2-VL with the swift framework
Download the model with the ModelScope CLI:
modelscope download --model=qwen/Qwen2-VL-7B-Instruct --local_dir ./Qwen2-VL-7B-Instruct
Environment: Python 3.8, CUDA 12.1.
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
pip install pyav qwen_vl_utils
Train with the officially provided dataset:
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
--model_type qwen2-vl-7b-instruct \
--model_id_or_path qwen/Qwen2-VL-7B-Instruct \
--sft_type lora \
--dataset coco-en-mini#20000 \
--deepspeed default-zero2
To train with your own dataset, replace the dataset arguments:
--dataset train.jsonl \
--val_dataset val.jsonl \
Dataset format (a conversion sketch follows the examples below):
{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "images": []}
Inference after fine-tuning, and merging the LoRA weights:
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen2-vl-7b-instruct/vx-xxx/checkpoint-xxx \
--load_dataset_config true --merge_lora true
The merged model can then be called directly:
from PIL import Image
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
from modelscope import snapshot_download
model_dir = "/Qwen2-VL-2B-Instruct/output/v4-20240923/checkpoint-1000-merged"
# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_dir, device_map="auto", torch_dtype = torch.float16)
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_dir, min_pixels=min_pixels, max_pixels=max_pixels)
messages = [{"role": "user", "content": [{"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"}, {"type": "text", "text": "Describe this image."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)
Multi-image inference
# Messages containing multiple images and a text query
messages = [{"role": "user", "content": [{"type": "image", "image": "file:///path/to/image1.jpg"}, {"type": "image", "image": "file:///path/to/image2.jpg"}, {"type": "text", "text": "Identify the similarities between these images."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)
Video understanding
# Messages containing a video and a text query
messages = [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4", 'max_pixels': 360*420, 'fps': 1.0}, {"type": "text", "text": "Describe this video."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)
Calling the online vision-language model API
from openai import OpenAI
import os
import base64

# Encode the image as base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def get_response(image_path):
    base64_image = encode_image(image_path)
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-vl-max",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    },
                    {
                        "type": "text",
                        "text": "描述一下。"
                    }
                ]
            }
        ]
    )
    print(completion.model_dump_json())

if __name__ == '__main__':
    import time
    start = time.time()
    get_response("./26.jpg")
    end = time.time()
    print("total-times:", end - start)
For local multi-GPU inference, modify the code as follows:
from PIL import Image
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
from modelscope import snapshot_download
from torch.nn import DataParallel
model_dir = "/media/checkpoint-1000-merged"
# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_dir, device_map="auto", torch_dtype = torch.float16)
# Make sure the model sits on the primary GPU
# model.to('cuda:0')
# Wrap the model with DataParallel
model = DataParallel(model)  # multi-GPU
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_dir, min_pixels=min_pixels, max_pixels=max_pixels)
messages = [{"role": "user", "content": [{"type": "image", "image": "/media/normal/1.jpg"}, {"type": "text", "text": "描述一下"}]}]
# messages = [{"role": "user", "content": [{"type": "video", "video": "/Qwen2-VL/1.mp4", 'max_pixels': 360*420, 'fps': 1.0}, {"type": "text", "text": "描述一下"}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference: Generation of the output
# generated_ids = model.generate(**inputs, max_new_tokens=128)  # single-GPU call; with DataParallel, call generate on model.module instead
generated_ids = model.module.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)
Fine-tuning with a specified number of epochs:
CUDA_VISIBLE_DEVICES=0,1,2 NPROC_PER_NODE=3 swift sft --model_type qwen2-vl-7b-instruct --model_id_or_path /media/dgh/dgh/qwen2-vl-server/Qwen2-VL-2B-Instruct --sft_type lora --dataset coco-en-mini#20000 --deepspeed default-zero2 --num_train_epochs 2
Environment: Arch Linux, CUDA 12.4, Python 3.10. Package versions:
absl-py 2.1.0
accelerate 1.1.1
addict 2.4.0
aiofiles 23.2.1
aiohappyeyeballs 2.4.4
aiohttp 3.11.9
aiosignal 1.3.1
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
annotated-types 0.7.0
anyio 4.6.2.post1
async-timeout 5.0.1
attrdict 2.0.1
attrs 24.2.0
auto_gptq 0.7.1
av 14.0.0
binpacking 1.5.2
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.4.0
click 8.1.7
coloredlogs 15.0.1
contourpy 1.3.1
cpm-kernels 1.0.11
crcmod 1.7
cryptography 44.0.0
cycler 0.12.1
dacite 1.8.1
datasets 3.0.1
deepspeed 0.15.0
dill 0.3.8
distro 1.9.0
docstring_parser 0.16
einops 0.8.0
exceptiongroup 1.2.2
fastapi 0.115.5
ffmpy 0.4.0
filelock 3.16.1
fonttools 4.55.1
frozenlist 1.5.0
fsspec 2024.6.1
future 1.0.0
gekko 1.2.1
gradio 5.7.1
gradio_client 1.5.0
grpcio 1.68.1
h11 0.14.0
hjson 3.1.0
httpcore 1.0.7
httpx 0.28.0
huggingface-hub 0.26.3
humanfriendly 10.0
idna 3.10
importlib_metadata 8.5.0
jieba 0.42.1
Jinja2 3.1.4
jiter 0.8.0
jmespath 0.10.0
joblib 1.4.2
kiwisolver 1.4.7
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.3
mdurl 0.1.2
modelscope 1.21.0
mpmath 1.3.0
ms-swift 2.6.1
msgpack 1.1.0
multidict 6.1.0
multiprocess 0.70.16
networkx 3.4.2
ninja 1.11.1.2
nltk 3.9.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
openai 1.56.0
optimum 1.23.3
orjson 3.10.12
oss2 2.19.1
packaging 24.2
pandas 2.2.3
peft 0.12.0
pillow 11.0.0
pip 24.2
propcache 0.2.1
protobuf 5.29.0
psutil 6.1.0
py-cpuinfo 9.0.0
pyarrow 18.1.0
pyav 14.0.0
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.10.2
pydantic_core 2.27.1
pydub 0.25.1
Pygments 2.18.0
pyparsing 3.2.0
python-dateutil 2.9.0.post0
python-multipart 0.0.12
pytz 2024.2
PyYAML 6.0.2
qwen-vl-utils 0.0.8
regex 2024.11.6
requests 2.32.3
rich 13.9.4
rouge 1.0.1
ruff 0.8.1
safehttpx 0.1.6
safetensors 0.4.5
scipy 1.14.1
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.5.1
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.3
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
starlette 0.41.3
sympy 1.13.1
tensorboard 2.18.0
tensorboard-data-server 0.7.2
tiktoken 0.8.0
tokenizers 0.20.3
tomlkit 0.12.0
torch 2.4.0
torchaudio 2.5.1+cu124
torchvision 0.19.0
tqdm 4.67.1
transformers 4.45.2
transformers-stream-generator 0.0.5
triton 3.0.0
trl 0.11.4
typeguard 4.4.1
typer 0.14.0
typing_extensions 4.12.2
tyro 0.9.2
tzdata 2024.2
urllib3 2.2.3
uvicorn 0.32.1
websockets 12.0
Werkzeug 3.1.3
wheel 0.44.0
xxhash 3.5.0
yarl 1.18.3
zipp 3.21.0
VL model quantization
pip install auto-gptq
CUDA_VISIBLE_DEVICES=0,1,2 swift export --ckpt_dir /media/swift/output/qwen2-vl-2b-instruct/v0-20241213-145017/checkpoint-400/ --merge_lora true --quant_bits 8 --load_dataset_config true --quant_method gptq
If the following error occurs:
File "/home/dengguanghong/.conda/envs/swift2/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 183, in forward
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(2, 3)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
run the following to update:
pip install ms-swift -U