1. Basic Qwen Q&A demo

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "是否需要预约才能拜访楼上的公司?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("response:", response)

2. Fine-tuning

Download LLaMA-Factory

Environment: Python 3.11, CUDA 12.1

cd LLaMA-Factory
pip install -e ".[torch,metrics]"

Use pip install --no-deps -e . to resolve package conflicts.

python src/webui.py

In the web UI, select the model you want to fine-tune and the training data.

Fine-tuning data format

[
  {
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["user instruction in the first round (optional)", "model response in the first round (optional)"],
      ["user instruction in the second round (optional)", "model response in the second round (optional)"]
    ]
  }
]

Data conversion code (Excel to alpaca format)

import pandas as pd
import json

# Read the Excel file
excel_file_path = 'C:\\Users\\Administrator\\Desktop\\知识库V0.1-英文.xlsx'
df = pd.read_excel(excel_file_path)

# The Excel file is assumed to have two columns: 'Question' and 'Answer';
# adjust the column names below if yours differ
questions = df['Question']
answers = df['Answer']

# Convert to alpaca format
alpaca_data = []
for question, answer in zip(questions, answers):
    alpaca_item = {
        "instruction": question,
        "input": "",
        "output": answer,
        "system": "",
        "history": []
    }
    alpaca_data.append(alpaca_item)

# Write the result to a JSON file
json_file_path = 'C:\\Users\\Administrator\\Desktop\\data.json'
with open(json_file_path, 'w', encoding='utf-8') as f:
    json.dump(alpaca_data, f, ensure_ascii=False, indent=4)

print(f"转换完成,结果已保存到 {json_file_path}")

Add your dataset to data/dataset_info.json so that it can be selected as a training dataset in the web UI:

"our_data": {
  "file_name": "data.json"
},
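
If you prefer not to edit the file by hand, the entry can also be appended with a short script (a sketch, assuming it is run from the LLaMA-Factory repository root):

import json

# Register "our_data" in LLaMA-Factory's dataset registry so it appears
# in the web UI dataset dropdown
info_path = "data/dataset_info.json"
with open(info_path, "r", encoding="utf-8") as f:
    dataset_info = json.load(f)
dataset_info["our_data"] = {"file_name": "data.json"}
with open(info_path, "w", encoding="utf-8") as f:
    json.dump(dataset_info, f, ensure_ascii=False, indent=2)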

You can also start training directly with the following command:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path Qwen/Qwen1.5-7B-Chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset our_data \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 1000.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen1.5-7B-Chat/lora/train_2024-08-16-17-21-38 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all

Test after fine-tuning: load the base model together with the fine-tuned LoRA adapter.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
model_name = "Qwen/Qwen2-7B-Instruct"
#model_name = "/media/dgh/LLaMA-Factory/saves/Qwen2-7B-Chat/lora/train_2024-08-10-14-15-57/checkpoint-4400"
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = PeftModel.from_pretrained(model, model_id="/media/dgh/LLaMA-Factory/saves/Qwen2-7B-Chat/lora/train_2024-08-10-14-15-57/checkpoint-4400")

prompt = "是否需要预约才能拜访楼上的公司?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("response:", response)

Model merging: merge the LoRA adapter into the base model; the merged model can then be run and tested on its own.

CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
    --model_name_or_path Qwen/Qwen1.5-7B-Chat \
    --adapter_name_or_path /media/dgh/LLaMA-Factory/saves/Qwen1.5-7B-Chat/lora/train_2024-08-16-17-21-38/checkpoint-7300 \
    --template qwen \
    --finetuning_type lora \
    --export_dir /media/dgh/Qwen2-main/save \
    --export_size 2 \
    --export_legacy_format False
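
Once the export finishes, the merged weights in the export directory load like an ordinary Hugging Face checkpoint; a minimal sketch (it reuses the Q&A demo above, only the model path changes):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model exported above; no PeftModel wrapper is needed any more
merged_dir = "/media/dgh/Qwen2-main/save"
model = AutoModelForCausalLM.from_pretrained(merged_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(merged_dir)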

Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Generate the GGUF format

python convert_hf_to_gguf.py /media/dgh/Qwen2-main/save --outfile /media/dgh/Qwen2-main/7b_guuf/qwen2-7b-instruct-fp16.gguf

Quantize the GGUF model. Available quantization types include q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, and q8_0; see the llama.cpp documentation for more details.

q4_0

./llama-quantize /home/dgh/LLaMA-Factory/saves/Qwen1.5-14b_guuf/Qwen1.5-14B-Chat-F16.gguf /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q4_0_gguf/qwen1.5-14b-q4_0.gguf q4_0

q5_k_m

./llama-quantize /home/dgh/LLaMA-Factory/saves/Qwen1.5-14b_guuf/Qwen1.5-14B-Chat-F16.gguf /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf q5_k_m

Run the quantized model

./llama-cli -m /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf \
-n 512 -co -i -if -f prompts/chat-with-qwen.txt \
--in-prefix "<|im_start|>user\n" \
--in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
-ngl 80 -fa
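
If you would rather call the quantized GGUF model from Python than through the interactive CLI, the llama-cpp-python bindings (an assumption: installed separately with pip install llama-cpp-python) expose an OpenAI-style chat interface; a minimal sketch:

from llama_cpp import Llama

# Load the quantized GGUF file; n_gpu_layers offloads layers to the GPU and
# chat_format="chatml" matches Qwen's <|im_start|>/<|im_end|> template
llm = Llama(
    model_path="/home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf",
    n_gpu_layers=80,
    n_ctx=2048,
    chat_format="chatml",
)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "是否需要预约才能拜访楼上的公司?"},
    ],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])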

Calling online LLMs via Alibaba Bailian (Model Studio)

On Alibaba Bailian you can build your own agent without writing any code; the API is called as follows:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-turbo",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': '你是谁?'}
    ],
    temperature=0.8
)

print(completion.choices[0].message.content)

An agent you create can also be invoked through its own API; see the official usage guide for details.

Download the latest LLaMA-Factory, which supports fine-tuning Qwen2-VL.

If you hit this error:

ValueError: The checkpoint you are trying to load has model type `qwen2_vl` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date

Cause: the transformers release on PyPI has not been updated yet; use the version from GitHub.
Fix: pip install git+https://github.com/huggingface/transformers
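
To confirm the upgrade actually picked up qwen2_vl support, a quick check like the following (a sketch) helps:

# Sanity check: an old transformers build raises ImportError on the qwen2_vl classes
import transformers
print(transformers.__version__)
from transformers import Qwen2VLForConditionalGeneration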

Fine-tuning Qwen2-VL with the swift (ms-swift) framework

Reference: Qwen2-VL (Alibaba Cloud Developer Community)

Download the model with the ModelScope CLI

modelscope download --model=qwen/Qwen2-VL-7B-Instruct --local_dir ./Qwen2-VL-7B-Instruct

Environment: Python 3.8, CUDA 12.1.

git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
pip install pyav qwen_vl_utils

Train with the official example dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type qwen2-vl-7b-instruct \
  --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2

To train with your own dataset, replace the dataset arguments:

  --dataset train.jsonl \
  --val_dataset val.jsonl \

Dataset format

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "images": []}

Run inference after fine-tuning and merge the LoRA weights

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/qwen2-vl-7b-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true

The merged model can be loaded and called directly:

from PIL import Image
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
from modelscope import snapshot_download
model_dir = "/Qwen2-VL-2B-Instruct/output/v4-20240923/checkpoint-1000-merged"
# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_dir, device_map="auto", torch_dtype = torch.float16)
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_dir, min_pixels=min_pixels, max_pixels=max_pixels)
messages = [{"role": "user", "content": [{"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"}, {"type": "text", "text": "Describe this image."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)

Multi-image inference

# Messages containing multiple images and a text query
messages = [{"role": "user", "content": [{"type": "image", "image": "file:///path/to/image1.jpg"}, {"type": "image", "image": "file:///path/to/image2.jpg"}, {"type": "text", "text": "Identify the similarities between these images."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)

Video understanding

# Messages containing a video and a text query
messages = [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4", 'max_pixels': 360*420, 'fps': 1.0}, {"type": "text", "text": "Describe this video."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)

Calling the online vision LLM API

from openai import OpenAI
import os
import base64

# Encode the image file as base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def get_response(image_path):
    base64_image = encode_image(image_path)
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-vl-max",
        messages=[
            {
              "role": "user",
              "content": [
                {
                  "type": "image_url",
                  "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                  }
                },
                {
                  "type": "text",
                  "text": "描述一下。"
                }
              ]
            }
          ]
        )
    print(completion.model_dump_json())

if __name__=='__main__':
    import time
    start = time.time()
    get_response("./26.jpg")
    end = time.time()
    print("total-times:", end - start)

For local multi-GPU inference, modify the code as follows:

from PIL import Image
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
from modelscope import snapshot_download
from torch.nn import DataParallel

model_dir = "/media/checkpoint-1000-merged"

# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_dir, device_map="auto", torch_dtype=torch.float16)
# Make sure the model is on the primary GPU
# model.to('cuda:0')
# Wrap the model with DataParallel for multi-GPU inference
model = DataParallel(model)

min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_dir, min_pixels=min_pixels, max_pixels=max_pixels)

messages = [{"role": "user", "content": [{"type": "image", "image": "/media/normal/1.jpg"}, {"type": "text", "text": "描述一下"}]}]
# messages = [{"role": "user", "content": [{"type": "video", "video": "/Qwen2-VL/1.mp4", 'max_pixels': 360*420, 'fps': 1.0}, {"type": "text", "text": "描述一下"}]}]

# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')

# Inference: generation of the output
# generated_ids = model.generate(**inputs, max_new_tokens=128)  # single-GPU call; with DataParallel use model.module instead:
generated_ids = model.module.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)

Fine-tuning with a specified number of epochs

CUDA_VISIBLE_DEVICES=0,1,2 NPROC_PER_NODE=3 swift sft \
  --model_type qwen2-vl-7b-instruct \
  --model_id_or_path /media/dgh/dgh/qwen2-vl-server/Qwen2-VL-2B-Instruct \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2 \
  --num_train_epochs 2

Environment on Arch Linux with CUDA 12.4 and Python 3.10 (installed packages):

absl-py                       2.1.0
accelerate                    1.1.1
addict                        2.4.0
aiofiles                      23.2.1
aiohappyeyeballs              2.4.4
aiohttp                       3.11.9
aiosignal                     1.3.1
aliyun-python-sdk-core        2.16.0
aliyun-python-sdk-kms         2.16.5
annotated-types               0.7.0
anyio                         4.6.2.post1
async-timeout                 5.0.1
attrdict                      2.0.1
attrs                         24.2.0
auto_gptq                     0.7.1
av                            14.0.0
binpacking                    1.5.2
certifi                       2024.8.30
cffi                          1.17.1
charset-normalizer            3.4.0
click                         8.1.7
coloredlogs                   15.0.1
contourpy                     1.3.1
cpm-kernels                   1.0.11
crcmod                        1.7
cryptography                  44.0.0
cycler                        0.12.1
dacite                        1.8.1
datasets                      3.0.1
deepspeed                     0.15.0
dill                          0.3.8
distro                        1.9.0
docstring_parser              0.16
einops                        0.8.0
exceptiongroup                1.2.2
fastapi                       0.115.5
ffmpy                         0.4.0
filelock                      3.16.1
fonttools                     4.55.1
frozenlist                    1.5.0
fsspec                        2024.6.1
future                        1.0.0
gekko                         1.2.1
gradio                        5.7.1
gradio_client                 1.5.0
grpcio                        1.68.1
h11                           0.14.0
hjson                         3.1.0
httpcore                      1.0.7
httpx                         0.28.0
huggingface-hub               0.26.3
humanfriendly                 10.0
idna                          3.10
importlib_metadata            8.5.0
jieba                         0.42.1
Jinja2                        3.1.4
jiter                         0.8.0
jmespath                      0.10.0
joblib                        1.4.2
kiwisolver                    1.4.7
Markdown                      3.7
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
matplotlib                    3.9.3
mdurl                         0.1.2
modelscope                    1.21.0
mpmath                        1.3.0
ms-swift                      2.6.1
msgpack                       1.1.0
multidict                     6.1.0
multiprocess                  0.70.16
networkx                      3.4.2
ninja                         1.11.1.2
nltk                          3.9.1
numpy                         1.26.4
nvidia-cublas-cu12            12.1.3.1
nvidia-cuda-cupti-cu12        12.1.105
nvidia-cuda-nvrtc-cu12        12.1.105
nvidia-cuda-runtime-cu12      12.1.105
nvidia-cudnn-cu12             9.1.0.70
nvidia-cufft-cu12             11.0.2.54
nvidia-curand-cu12            10.3.2.106
nvidia-cusolver-cu12          11.4.5.107
nvidia-cusparse-cu12          12.1.0.106
nvidia-ml-py                  12.560.30
nvidia-nccl-cu12              2.20.5
nvidia-nvjitlink-cu12         12.4.127
nvidia-nvtx-cu12              12.1.105
openai                        1.56.0
optimum                       1.23.3
orjson                        3.10.12
oss2                          2.19.1
packaging                     24.2
pandas                        2.2.3
peft                          0.12.0
pillow                        11.0.0
pip                           24.2
propcache                     0.2.1
protobuf                      5.29.0
psutil                        6.1.0
py-cpuinfo                    9.0.0
pyarrow                       18.1.0
pyav                          14.0.0
pycparser                     2.22
pycryptodome                  3.21.0
pydantic                      2.10.2
pydantic_core                 2.27.1
pydub                         0.25.1
Pygments                      2.18.0
pyparsing                     3.2.0
python-dateutil               2.9.0.post0
python-multipart              0.0.12
pytz                          2024.2
PyYAML                        6.0.2
qwen-vl-utils                 0.0.8
regex                         2024.11.6
requests                      2.32.3
rich                          13.9.4
rouge                         1.0.1
ruff                          0.8.1
safehttpx                     0.1.6
safetensors                   0.4.5
scipy                         1.14.1
semantic-version              2.10.0
sentencepiece                 0.2.0
setuptools                    69.5.1
shellingham                   1.5.4
shtab                         1.7.1
simplejson                    3.19.3
six                           1.16.0
sniffio                       1.3.1
sortedcontainers              2.4.0
starlette                     0.41.3
sympy                         1.13.1
tensorboard                   2.18.0
tensorboard-data-server       0.7.2
tiktoken                      0.8.0
tokenizers                    0.20.3
tomlkit                       0.12.0
torch                         2.4.0
torchaudio                    2.5.1+cu124
torchvision                   0.19.0
tqdm                          4.67.1
transformers                  4.45.2
transformers-stream-generator 0.0.5
triton                        3.0.0
trl                           0.11.4
typeguard                     4.4.1
typer                         0.14.0
typing_extensions             4.12.2
tyro                          0.9.2
tzdata                        2024.2
urllib3                       2.2.3
uvicorn                       0.32.1
websockets                    12.0
Werkzeug                      3.1.3
wheel                         0.44.0
xxhash                        3.5.0
yarl                          1.18.3
zipp                          3.21.0

VL model quantization

pip install auto-gptq

CUDA_VISIBLE_DEVICES=0,1,2 swift export \
    --ckpt_dir /media/swift/output/qwen2-vl-2b-instruct/v0-20241213-145017/checkpoint-400/ \
    --merge_lora true \
    --quant_bits 8 \
    --load_dataset_config true \
    --quant_method gptq

If the following error appears:

 File "/home/dengguanghong/.conda/envs/swift2/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 183, in forward
    freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(2, 3)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)

run the following command to update ms-swift:

pip install ms-swift -U
