Calling the Qwen API
Install FastChat, then start the controller, the vLLM worker, and the OpenAI-compatible API server:
pip install fschat
python -m fastchat.serve.controller
python -m fastchat.serve.vllm_worker --model-path $model_path --tensor-parallel-size 2 --trust-remote-code
python -m fastchat.serve.openai_api_server --host localhost --port 8000
The API server is used with the pre-1.0 openai client:
pip install --upgrade openai==0.28
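Before sending chat requests, it can help to confirm that the API server is up and the model worker has registered. A minimal sketch, assuming the defaults above (localhost:8000, no --api-keys flag):

```python
import requests

# Quick health check against the OpenAI-compatible /v1/models endpoint
# (assumes the server above is listening on localhost:8000 with no API key).
resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
print(resp.json())  # the served Qwen model should appear under "data"
```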
import openai

# To get proper authentication, make sure to use a valid key that's listed in
# the --api-keys flag. If no flag value is provided, the `api_key` will be ignored.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

model = "qwen"
call_args = {
    'temperature': 1.0,
    'top_p': 1.0,
    'top_k': -1,
    'max_tokens': 2048,  # output length
    'presence_penalty': 1.0,
    'frequency_penalty': 0.0,
}

# Create a chat completion
completion = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
    **call_args
)

# Print the completion
print(completion.choices[0].message.content)
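The same endpoint can also stream the reply. A sketch of the streaming variant, reusing the client setup and call_args above (the chunk/delta handling follows the pre-1.0 openai client format):

```python
# Stream the reply incrementally instead of waiting for the full completion.
stream = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
    stream=True,
    **call_args
)
for chunk in stream:
    delta = chunk.choices[0].delta
    # Some chunks (e.g. the first one) may only carry the role, not content.
    print(delta.get("content", ""), end="", flush=True)
print()
```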
To make the API reachable from other machines, bind the server to the host's IP instead of localhost:
python -m fastchat.serve.openai_api_server --host IP --port 8000
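On the client side only api_base changes; the host below is a hypothetical placeholder for the machine running the API server:

```python
import openai

openai.api_key = "EMPTY"
# Hypothetical placeholder: replace with the IP the server was bound to.
openai.api_base = "http://<server-ip>:8000/v1"

resp = openai.ChatCompletion.create(
    model="qwen",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```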
Web UI:
GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path QWen-72B-Chat --num-gpus 2 --max-gpu-memory xxGiB
python3 -m fastchat.serve.gradio_web_server --host IP --port 8000
The web UI showed the following error:
**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**
The commands above must be launched in this exact order, otherwise startup fails. (This conclusion turned out to be wrong.)
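One way to diagnose this is to ask the controller which workers have actually registered before (re)loading the Gradio page. A sketch, assuming the controller listens on its default port 21001 and exposes the /refresh_all_workers and /list_models routes (both are assumptions about the FastChat version in use):

```python
import requests

CONTROLLER = "http://localhost:21001"  # assumed default controller address

# Ask the controller to refresh worker heartbeats, then list registered models.
requests.post(f"{CONTROLLER}/refresh_all_workers")
resp = requests.post(f"{CONTROLLER}/list_models")
print(resp.json())  # QWen-72B-Chat should be listed once its worker has registered
```

If the model is missing here, the cause is likely a worker that failed to load or register rather than actual traffic.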
Local LLM deployment, option 2: fastchat + llm (vllm) - CSDN blog
FastChat/docs/vllm_integration.md at main · lm-sys/FastChat · GitHub
When you launch a model worker, replace the normal worker (fastchat.serve.model_worker) with the vLLM worker (fastchat.serve.vllm_worker). All other commands such as the controller, Gradio web server, and OpenAI API server are kept the same.
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
If you see tokenizer errors, try
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer hf-internal-testing/llama-tokenizer
If you use an AWQ quantized model, try:
python3 -m fastchat.serve.vllm_worker --model-path TheBloke/vicuna-7B-v1.5-AWQ --quantization awq