第一步:先向昇騰方申請裝置,申請到 Atlas 800 9000 伺服器,使用昇騰官方提供的賬號和密碼保證可以登入上伺服器;

(1) 更新一下驅動,因為昇騰官方的提供的映象需要指定版本的驅動韌體,下載安裝更新 Version: 23.0.rc2 將會變更為 Version: 23.0.0,下載地址:社群版 – 韌體與驅動 – 昇騰社群

更新安裝韌體,並更新韌體,重啟裝置,一切以昇騰官方的最新驅動和公告為準
[root@dify HwHiAiUser]# pwd/home/HwHiAiUser[root@dify HwHiAiUser]# ls -ltotal 131112-rw——- 1 root root 134251528 Dec 7 16:16 Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run[root@dify HwHiAiUser]# chmod 777 Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run[root@dify HwHiAiUser]# lsAscend-hdk-910-npu-driver_23.0.0_linux-aarch64.run[root@dify HwHiAiUser]# sudo ./Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run –full –forceVerifying archive integrity… 100% SHA256 checksums are OK. All good.Uncompressing ASCEND DRIVER RUN PACKAGE 100%[Driver] [2025-02-23 15:46:26] [INFO]Start time: 2025-02-23 15:46:26[Driver] [2025-02-23 15:46:26] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log[Driver] [2025-02-23 15:46:26] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log[Driver] [2025-02-23 15:46:26] [INFO]base version is 23.0.rc2.[Driver] [2025-02-23 15:46:26] [WARNING]Do not power off or restart the system during the installation/upgrade[Driver] [2025-02-23 15:46:26] [INFO]set username and usergroup, HwHiAiUser:HwHiAiUser[Driver] [2025-02-23 15:46:26] [INFO]Driver package has been installed on the path /usr/local/Ascend, the version is 23.0.rc2, and the version of this package is 23.0.0,do you want to continue? [y/n]y[Driver] [2025-02-23 15:46:36] [INFO]driver install type: Direct[Driver] [2025-02-23 15:46:36] [INFO]upgradePercentage:10%[Driver] [2025-02-23 15:46:40] [INFO]upgradePercentage:30%[Driver] [2025-02-23 15:46:40] [INFO]upgradePercentage:40%[Driver] [2025-02-23 15:46:42] [INFO]upgradePercentage:90%[Driver] [2025-02-23 15:46:45] [INFO]upgradePercentage:100%[Driver] [2025-02-23 15:46:45] [INFO]Driver package installed successfully! Reboot needed for installation/upgrade to take effect![Driver] [2025-02-23 15:46:45] [INFO]End time: 2025-02-23 15:46:45[root@dify HwHiAiUser]# sudo reboot
韌體更新完成,檢視驅動版本為 Version: 23.0.0

(2) 將基礎模型先下載下來,一會進行掛載推理模型,分詞模型、到排序模型,進行使用,可以去魔搭社群下載 ModelScope 魔搭社群,先下載模型:DeepSeek-R1-Distill-Qwen-32B , 下載使用方式參考官方指導方式即可;

使用 python 指令碼下載模型
[root@dify HwHiAiUser]# pwd/home/HwHiAiUser[root@dify HwHiAiUser]# pip3 install modelscope==1.18.0 -ihttps://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple[root@dify HwHiAiUser]# python3Python 3.7.0 (default, May 11 2024, 10:32:14)[GCC 7.3.0] on linuxType "help", "copyright", "credits" or "license" for more information.>>> import modelscope>>> exit()[root@dify HwHiAiUser]# cat down.py#模型下載from modelscope import snapshot_downloadmodel_dir = snapshot_download('deepseek-ai/DeepSeek-R1-Distill-Qwen-32B',cache_dir=".")[root@dify HwHiAiUser]# python3 down.pyDownloading [figures/benchmark.jpg]: 100%|██████████████████████████████████████████████████████████████████████| 759k/759k [00:00<00:00, 1.78MB/s]Downloading [config.json]: 100%|██████████████████████████████████████████████████████████████████████████████████| 664/664 [00:00<00:00, 2.10kB/s]Downloading [configuration.json]: 100%|███████████████████████████████████████████████████████████████████████████| 73.0/73.0 [00:00<00:00, 233B/s]Downloading [generation_config.json]: 100%|█████████████████████████████████████████████████████████████████████████| 181/181 [00:00<00:00, 686B/s]Downloading [LICENSE]: 100%|██████████████████████████████████████████████████████████████████████████████████| 1.04k/1.04k [00:00<00:00, 2.92kB/s]Downloading [model-00001-of-000008.safetensors]: 0%| | 1.00M/8.19G [00:00<59:21, 2.47MB/s]Downloading [model-00001-of-000008.safetensors]: 0%| | 16.0M/8.19G [00:00<03:43, 39.3MB/s]
下載完成,檢視權重目錄
[root@dify HwHiAiUser]# pwd/home/HwHiAiUser[root@dify HwHiAiUser]# tree -L 2├── Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run├── deepseek-ai│ ├── DeepSeek-R1-Distill-Qwen-32B│└── down.py3 directories, 3 files
二、使用官方映象 昇騰映象倉庫詳情(https://www.hiascend.com/developer/ascendhub/detail/mindie),進行昇騰 MindIE 環境構建,因為計劃測試 DeepSeek-R1-Distill-Qwen-32B-W8A8 模型,所以記得建立容器掛載兩張卡即可
(1)拉取 Atals 800 9000 映象,建議從官方拉取,自己要根據自己的機型拉取對應的映象,一切以官方為主,青島的映象也是在官方映象上,打包做過細微不影響執行的修改

也可以從下面的公開連結拉取映象,建立雙卡容器
[root@dify HwHiAiUser]#yum install docker[root@dify HwHiAiUser]# docker pull swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie:atlas_800_9000Error response from daemon: Gethttps://swr.cn-east-317.qdrgznjszx.com/v2/: x509: certificate signed by unknown authority[root@dify HwHiAiUser]#
修改配置源,新增 mindie 的映象源;
解決辦法:[root@dify HwHiAiUser]#vim /etc/docker/daemon.json填入內容{ "insecure-registries": ["https://swr.cn-east-317.qdrgznjszx.com"/], "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"/] }儲存退出、然後重啟 docker 即可[root@dify HwHiAiUser]# systemctl restart docker.service[root@dify HwHiAiUser]# docker pull swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie:atlas_800_9000atlas_800_9000: Pulling from qd-aicc/mindieedab87ea811e: Pull complete72906c864c93: Pull complete98f62a370e96: Pull completeDigest: sha256:6ceefe4506f58084717ec9bed7df75e51032fdd709d791a627084fe4bd92abeaStatus: Downloaded newer image for swr.cn-east-317.qdrgznjszx.com/qd-aicc/mindie:atlas_800_9000[root@dify HwHiAiUser]#
建立容器,進入容器,計劃使用兩張昇騰 NPU 卡推理 DeepSeek-R1-Distill-Qwen-32B 的 W8A8 模型,所以構建的容器用兩張卡,選 6、7 卡吧,0-6 號卡可以跑文字嵌入模型、重排序模型;建立容器指令碼
[root@dify ~]# cd /home/HwHiAiUser/[root@dify HwHiAiUser]# lsAscend-hdk-910-npu-driver_23.0.0_linux-aarch64.run deepseek-ai down.py[root@dify HwHiAiUser]# docker imagesREPOSITORY TAG IMAGE ID CREATED SIZEswr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie atlas_800_9000 69f30d0c15be 5 weeks ago 16.5GB[root@dify HwHiAiUser]# vim docker_run.sh[root@dify HwHiAiUser]# vim docker_run.sh[root@dify HwHiAiUser]# vim docker_run.sh[root@dify HwHiAiUser]# cat docker_run.sh#!/bin/bashdocker_images=swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie:atlas_800_9000model_dir=/home/HwHiAiUser #根據實際情況修改掛載目錄docker run -it –name qdaicc –ipc=host –net=host \–device=/dev/davinci6 \–device=/dev/davinci7 \–device=/dev/davinci_manager \–device=/dev/devmm_svm \–device=/dev/hisi_hdc \-v /usr/local/dcmi:/usr/local/dcmi \-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \-v /usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/common \-v /usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/driver/lib64/driver \-v /etc/ascend_install.info:/etc/ascend_install.info \-v /etc/vnpu.cfg:/etc/vnpu.cfg \-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \-v ${model_dir}:${model_dir} \-v /var/log/npu:/usr/slog ${docker_images} \/bin/bash[root@dify HwHiAiUser]#
填進去內容如上,啟動映象
[root@dify HwHiAiUser]# bash docker_run.sh
(Python310) root@dify:/usr/local/Ascend/atb-models# cd /home/HwHiAiUser/
(Python310) root@dify:/home/HwHiAiUser# ls
Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.rundeepseek-aidocker_run.shdown.py
因為之前掛在的目錄是 /home/HwHiAiUser/ ,所以可以在 docker 裡面看到物理機的下載權重,再檢視一下卡數是兩張

(2)進行模型量化 Ascend/ModelZoo-PyTorch – Gitee.com(https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/MindIE/LLM/DeepSeek/DeepSeek-R1-Distill-Qwen-32B) 直接進入量化階段,在容器外面操作即可,環境不用管,因為系統已經預設配置了環境,直接跳到 權重量化階段,安裝過程缺什麼,在 docker 外面 git 下原始碼,進入容器內部進行量化,這裡的容器建議在建立個 8 卡的容器,雙卡容器量化會顯示 npu 視訊記憶體不夠,除非你用 cpu 轉模型,我就懶得建立容器了,使用 cpu 量化吧;
[root@dify HwHiAiUser]# pwd/home/HwHiAiUser[root@dify HwHiAiUser]# git clone https://gitee.com/ascend/msit.gitCloning into 'msit'…remote: Enumerating objects: 81125, done.remote: Total 81125 (delta 0), reused 0 (delta 0), pack-reused 81125Receiving objects: 100% (81125/81125), 71.73 MiB | 12.14 MiB/s, done.Resolving deltas: 100% (59704/59704), done.[root@dify HwHiAiUser]# cd msit/.git/ .gitee/ msit/ msmodelslim/ msserviceprofiler/[root@dify Qwen]# docker start b5399c4da202b5399c4da202[root@dify Qwen]# docker exec -it b5399c4da202 /bin/bash(Python310) root@dify:/home/HwHiAiUser/msit# cd msmodelslim/(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim# bash install.sh#安裝成功,pip 缺啥安裝啥(Python310) root@dify:/home/HwHiAiUser# cd /home/HwHiAiUser/msit/msmodelslim/example/Qwen#量化模型(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# python3 quant_qwen.py –model_path /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/ –save_directory /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8 –calib_file ../common/boolq.jsonl –w_bit 8 –a_bit 8 –device_type npu2025-02-23 18:15:25,404 – msmodelslim-logger – WARNING – The current CANN version does not support LayerSelector quantile method.或者 cpu 處理(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# python3 quant_qwen.py –model_path /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/ –save_directory /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8 –calib_file ../common/boolq.jsonl –w_bit 8 –a_bit 8 –device_type cpu2025-02-23 18:25:10,776 – msmodelslim-logger – WARNING – The current CANN version does not support LayerSelector quantile method.2025-02-23 18:25:10,783 – msmodelslim-logger – WARNING – `cpu` is set as `dev_type`, `dev_id` cannot be specified manually!
轉換完成之後生成權重檔案
(Python310) root@dify:/home/HwHiAiUser/deepseek-ai# cd /home/HwHiAiUser/msit/msmodelslim/example/Qwen(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# ls /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B DeepSeek-R1-Distill-Qwen-32B-W8A8(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen#因為 Atlas 800 9000 不支援 bf16, 所以修改 float16, 其它裝置參考昇騰手冊(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# vim /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/config.json
(3)啟動 MindIE 服務,先記錄本機的 ip 地址,模型路徑和以及模型名字

模型路徑權重: /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/
模型名字:DeepSeek-R1-Distill-Qwen-32B-W8A8
修改配置檔案
(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# pwd/usr/local/Ascend/mindie/latest/mindie-service(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# vim conf/config.json
修改解釋一下,ipAddress, 主要為了後面搭建 dify 使用的推理引擎模型,其它參考 mindie 手冊
MindSpore Models 服務化使用 – MindSpore Models 使用 – 模型推理使用流程 – MindIE LLM 開發指南 – 大模型開發 – MindIE1.0.0 開發文件 – 昇騰社群
https://www.hiascend.com/document/detail/zh/mindie/100/mindiellm/llmdev/mindie_llm0012.html
單機推理 – 配置 MindIE Server – 配置 MindIE-MindIE 安裝指南 – 環境準備 – MindIE1.0.0 開發文件 – 昇騰社群
https://www.hiascend.com/document/detail/zh/mindie/100/envdeployment/instg/mindie_instg_0026.html
"ipAddress" : "192.168.1.115", 改為本地地址"httpsEnabled" : false,"npuDeviceIds" : [[0,1]],"modelName" : "DeepSeek-R1-Distill-Qwen-32B-W8A8","modelWeightPath" : "/home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/","maxInputTokenLen" : 4096,"maxIterTimes" : 4096,"truncation" : true,
修改內容如下
(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# cat conf/config.json{"Version" : "1.0.0","LogConfig" :{"logLevel" : "Info","logFileSize" : 20,"logFileNum" : 20,"logPath" : "logs/mindie-server.log"},"ServerConfig" :{"ipAddress" : "192.168.1.115","managementIpAddress" : "127.0.0.2","port" : 1025,"managementPort" : 1026,"metricsPort" : 1027,"allowAllZeroIpListening" : false,"maxLinkNum" : 1000,"httpsEnabled" : false,"fullTextEnabled" : false,"tlsCaPath" : "security/ca/","tlsCaFile" : ["ca.pem"],"tlsCert" : "security/certs/server.pem","tlsPk" : "security/keys/server.key.pem","tlsPkPwd" : "security/pass/key_pwd.txt","tlsCrlPath" : "security/certs/","tlsCrlFiles" : ["server_crl.pem"],"managementTlsCaFile" : ["management_ca.pem"],"managementTlsCert" : "security/certs/management/server.pem","managementTlsPk" : "security/keys/management/server.key.pem","managementTlsPkPwd" : "security/pass/management/key_pwd.txt","managementTlsCrlPath" : "security/management/certs/","managementTlsCrlFiles" : ["server_crl.pem"],"kmcKsfMaster" : "tools/pmt/master/ksfa","kmcKsfStandby" : "tools/pmt/standby/ksfb","inferMode" : "standard","interCommTLSEnabled" : true,"interCommPort" : 1121,"interCommTlsCaPath" : "security/grpc/ca/","interCommTlsCaFiles" : ["ca.pem"],"interCommTlsCert" : "security/grpc/certs/server.pem","interCommPk" : "security/grpc/keys/server.key.pem","interCommPkPwd" : "security/grpc/pass/key_pwd.txt","interCommTlsCrlPath" : "security/grpc/certs/","interCommTlsCrlFiles" : ["server_crl.pem"],"openAiSupport" : "vllm"},"BackendConfig" : {"backendName" : "mindieservice_llm_engine","modelInstanceNumber" : 1,"npuDeviceIds" : [[0,1]],"tokenizerProcessNumber" : 8,"multiNodesInferEnabled" : false,"multiNodesInferPort" : 1120,"interNodeTLSEnabled" : true,"interNodeTlsCaPath" : "security/grpc/ca/","interNodeTlsCaFiles" : ["ca.pem"],"interNodeTlsCert" : "security/grpc/certs/server.pem","interNodeTlsPk" : "security/grpc/keys/server.key.pem","interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt","interNodeTlsCrlPath" : "security/grpc/certs/","interNodeTlsCrlFiles" : ["server_crl.pem"],"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa","interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb","ModelDeployConfig" :{"maxSeqLen" : 2560,"maxInputTokenLen" : 4096,"truncation" : true,"ModelConfig" : [{"modelInstanceType" : "Standard","modelName" : "DeepSeek-R1-Distill-Qwen-32B-W8A8","modelWeightPath" : "/home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/","worldSize" : 2,"cpuMemSize" : 5,"npuMemSize" : -1,"backendType" : "atb","trustRemoteCode" : false}]},"ScheduleConfig" :{"templateType" : "Standard","templateName" : "Standard_LLM","cacheBlockSize" : 128,"maxPrefillBatchSize" : 50,"maxPrefillTokens" : 8192,"prefillTimeMsPerReq" : 150,"prefillPolicyType" : 0,"decodeTimeMsPerReq" : 50,"decodePolicyType" : 0,"maxBatchSize" : 200,"maxIterTimes" : 4096,"maxPreemptCount" : 0,"supportSelectBatch" : false,"maxQueueDelayMicroseconds" : 5000}}}
修改模型許可權,啟動服務
(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# chmod -R 750 /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# ./bin/mindieservice_daemonSpecial tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.[2025-02-23 19:04:44,279] [89160] [281464373506464] [llm] [INFO][logging.py-227] : Skip binding cpu.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.Daemon start success!
重啟一個終端,檢視 npu 使用狀況

本機測試[root@dify ~]# curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs":"如何賺大錢","parameters":{"decoder_input_details":true,"details":true,"do_sample":true,"max_new_tokens":50,"repetition_penalty":1.03,"return_full_text":false,"seed":null,"temperature":0.5,"top_k":10,"top_p":0.95,"truncate":null,"typical_p":0.5,"watermark":false}}'http://192.168.1.115:1025/generate{"details":{"prompt_tokens":5,"finish_reason":"length","generated_tokens":50,"prefill":[{"id":151646,"logprob":null,"special":null,"text":null},{"id":100007,"logprob":null,"special":null,"text":null},{"id":102223,"logprob":null,"special":null,"text":null},{"id":26288,"logprob":null,"special":null,"text":null},{"id":99428,"logprob":null,"special":null,"text":null}],"seed":2240260787,"tokens":[{"id":26850,"logprob":null,"special":null,"text":null},{"id":100007,"logprob":null,"special":null,"text":null},{"id":102223,"logprob":null,"special":null,"text":null},{"id":26288,"logprob":null,"special":null,"text":null},{"id":99428,"logprob":null,"special":null,"text":null},{"id":11319,"logprob":null,"special":null,"text":null},{"id":1406,"logprob":null,"special":null,"text":null},{"id":151649,"logprob":null,"special":null,"text":null},{"id":271,"logprob":null,"special":null,"text":null},{"id":102223,"logprob":null,"special":null,"text":null},{"id":26288,"logprob":null,"special":null,"text":null},{"id":99428,"logprob":null,"special":null,"text":null},{"id":102119,"logprob":null,"special":null,"text":null},{"id":85106,"logprob":null,"special":null,"text":null},{"id":100374,"logprob":null,"special":null,"text":null},{"id":99605,"logprob":null,"special":null,"text":null},{"id":9370,"logprob":null,"special":null,"text":null},{"id":101139,"logprob":null,"special":null,"text":null},{"id":5373,"logprob":null,"special":null,"text":null},{"id":85329,"logprob":null,"special":null,"text":null},{"id":33108,"logprob":null,"special":null,"text":null},{"id":99345,"logprob":null,"special":null,"text":null},{"id":101135,"logprob":null,"special":null,"text":null},{"id":1773,"logprob":null,"special":null,"text":null},{"id":87752,"logprob":null,"special":null,"text":null},{"id":99639,"logprob":null,"special":null,"text":null},{"id":97084,"logprob":null,"special":null,"text":null},{"id":102716,"logprob":null,"special":null,"text":null},{"id":39907,"logprob":null,"special":null,"text":null},{"id":48443,"logprob":null,"special":null,"text":null},{"id":14374,"logprob":null,"special":null,"text":null},{"id":220,"logprob":null,"special":null,"text":null},{"id":16,"logprob":null,"special":null,"text":null},{"id":13,"logprob":null,"special":null,"text":null},{"id":3070,"logprob":null,"special":null,"text":null},{"id":99716,"logprob":null,"special":null,"text":null},{"id":102447,"logprob":null,"special":null,"text":null},{"id":1019,"logprob":null,"special":null,"text":null},{"id":256,"logprob":null,"special":null,"text":null},{"id":481,"logprob":null,"special":null,"text":null},{"id":3070,"logprob":null,"special":null,"text":null},{"id":104023,"logprob":null,"special":null,"text":null},{"id":5373,"logprob":null,"special":null,"text":null},{"id":100025,"logprob":null,"special":null,"text":null},{"id":334,"logprob":null,"special":null,"text":null},{"id":5122,"logprob":null,"special":null,"text":null},{"id":67338,"logprob":null,"special":null,"text":null},{"id":101930,"logprob":null,"special":null,"text":null},{"id":99716,"logprob":null,"special":null,"text":null},{"id":101172,"logprob":null,"special":null,"text":null}]},"generated_text":"?\n\n 如何賺大錢?\n\n\n</think>\n\n 賺大錢通常需要結合個人的技能、資源和市場機會。以下是一些常見的方法:\n\n### 1. ** 投資理財 **\n – ** 股票、基金 **:透過長期投資優質"}[root@dify ~]#
三、啟動分詞服務和重排序服務,首先去昇騰倉下載映象 昇騰映象倉庫詳情(https://www.hiascend.com/developer/ascendhub/detail/mis-tei), 對應自己的裝置查詢映象

(1) 拉取映象 Atlas 800 9000,已經要根據自己的硬體版本去官方倉拉取映象,進行分詞服務啟動
[root@dify ~]# docker pull swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64[root@dify ~]# docker imagesREPOSITORY TAG IMAGE ID CREATED SIZEswr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei 6.0.RC3-910-aarch64 affece68b209 2 days ago 22.6GBswr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie atlas_800_9000 69f30d0c15be 5 weeks ago 16.5GB[root@dify ~]#
拉取完映象之後,進行必要的權重模型下載
[root@dify ~]# cd /home/HwHiAiUser/[root@dify HwHiAiUser]# pwd/home/HwHiAiUser[root@dify HwHiAiUser]# vim down.py[root@dify HwHiAiUser]# cat down.py#模型下載from modelscope import snapshot_downloadmodel_dir = snapshot_download('BAAI/bge-m3',cache_dir=".")from modelscope import snapshot_downloadmodel_dir = snapshot_download('BAAI/bge-large-zh-v1.5',cache_dir=".")from modelscope import snapshot_downloadmodel_dir = snapshot_download('BAAI/bge-reranker-large',cache_dir=".")[root@dify HwHiAiUser]# python3 down.py
下載完模型,修改每一個模型內部的配置項 Atlas800 9000/300I Duo/300V Pro 裝置,Atlas 800T A2 等裝置不用走該步驟
[root@dify HwHiAiUser]# lsAscend-hdk-910-npu-driver_23.0.0_linux-aarch64.run BAAI deepseek-ai docker_run.sh down.py msit[root@dify HwHiAiUser]# vim BAAI/bge-large-zh-v1___5/config.json[root@dify HwHiAiUser]# vim BAAI/bge-m3/config.json[root@dify HwHiAiUser]# vim BAAI/bge-reranker-large/config.json"torch_dtype": "float16",
(2)建立三個容器,暫定容器名字是bge-m3、bge-large-zh-v1___5、bge-reranker-large, 在建立之前,需要聯絡昇騰技術人員,開通伺服器對外埠,暫定開通的為 8001,8002,8003 和 niginx 轉發埠 – 入方向:| 出方向:TCP/8001,8002,8003,8004,442

將模型複製到 /home/data 下,參考官方手冊來即可
[root@dify ~]# cd /home/HwHiAiUser/[root@dify HwHiAiUser]# lsAscend-hdk-910-npu-driver_23.0.0_linux-aarch64.run BAAI deepseek-ai docker_run.sh down.py msit[root@dify HwHiAiUser]# pwd/home/HwHiAiUser[root@dify HwHiAiUser]# mkdir -p /home/data[root@dify HwHiAiUser]# cp -r BAAI/* /home/data/[root@dify HwHiAiUser]# ls /home/data/bge-large-zh-v1___5 bge-m3 bge-reranker-large[root@dify HwHiAiUser]#
參考官方說明:
ASCEND_VISIBLE_DEVICES 環境變量表示將宿主機上的 npu 卡掛載到容器,如果掛載多張卡使用逗號分隔,如:ASCEND_VISIBLE_DEVICES=0,1,2,3;掛載多張卡到容器時,預設會尋找最優的一張卡呼叫,如果不希望容器內部自動尋找最優的卡,啟動容器時可透過 TEI_NPU_DEVICE = 卡 id 指定使用哪張卡,注意這裡的變數 TEI_NPU_DEVICE 配置從 0 開始取,容器內已將外部卡 id 進行了邏輯對映,編號從 0 連續對映;注意:配置的 ASCEND_VISIBLE_DEVICES 對應的卡不能被其他容器已掛載,否則會報錯
[root@dify ~]# docker run -u root -e TEI_NPU_DEVICE=0 -itd –name=bge-reranker-large –net=host -e HOME=/home/HwHiAiUser –privileged=true -v /home/data:/home/HwHiAiUser/model -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver:/usr/local/Ascend/driver –entrypoint /home/HwHiAiUser/start.sh swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64 BAAI/bge-reranker-large 192.168.1.115 8001ef2383785c58ec5a650eb9d852ba965c48eb7b8cc7679cb7c194d2f2d0eb1a0d[root@dify ~]# docker start ef2383785c58ec5a650eb9d852ba965c48eb7b8cc7679cb7c194d2f2d0eb1a0def2383785c58ec5a650eb9d852ba965c48eb7b8cc7679cb7c194d2f2d0eb1a0d[root@dify ~]# docker run -u root -e TEI_NPU_DEVICE=1 -itd –name=bge-m3 –net=host -e HOME=/home/HwHiAiUser –privileged=true -v /home/data:/home/HwHiAiUser/model -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver:/usr/local/Ascend/driver –entrypoint /home/HwHiAiUser/start.sh swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64 BAAI/bge-m3 192.168.1.115 800250dd3573f1ae1363211791425a2f681445b220f5a45bbdbe572a361ce974f63a[root@dify ~]# docker start 50dd3573f1ae1363211791425a2f681445b220f5a45bbdbe572a361ce974f63a50dd3573f1ae1363211791425a2f681445b220f5a45bbdbe572a361ce974f63abge-large-zh-v1___5 bge-m3 bge-reranker-large[root@dify ~]# docker run -u root -e TEI_NPU_DEVICE=2 -itd –name=bge-large-zh-v1___5 –net=host -e HOME=/home/HwHiAiUser –privileged=true -v /home/data:/home/HwHiAiUser/model -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver:/usr/local/Ascend/driver –entrypoint /home/HwHiAiUser/start.sh swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64 BAAI/bge-large-zh-v1___5 192.168.1.115 8003d360f2b558c6556af53e19abd9f0782600f8cab1a7c60dc90fcf0b6061511c96[root@dify ~]# docker start d360f2b558c6556af53e19abd9f0782600f8cab1a7c60dc90fcf0b6061511c96d360f2b558c6556af53e19abd9f0782600f8cab1a7c60dc90fcf0b6061511c96
檢視一下三個服務,兩個分詞,一個排序模型,當然也可以放在一個 NPU 上執行編輯

記錄一下對外的服務埠 mindie 推理服務 192.168.1.115:1025 ;bge-reranker-large 服務:192.168.1.115:8001 bge-m3 服務:192.168.1.115:8002 bge-large-zh-v1___5 服務: 192.168.1.115:8003
四、部署 dify 環境進行部署配置,部署遇到的最大問題就是昇騰架構使用的 aarch64,gitee 使用 docker 映象容器是 x86_64, 所以找映象替代即可
(1)拉取 dify 的原始碼
[root@dify HwHiAiUser]# git clone https://gitee.com/dify_ai/dify.gitCloning into 'dify'…remote: Enumerating objects: 206836, done.remote: Counting objects: 100% (10350/10350), done.remote: Compressing objects: 100% (5418/5418), done.remote: Total 206836 (delta 6559), reused 7867 (delta 4637), pack-reused 196486Receiving objects: 100% (206836/206836), 80.47 MiB | 3.03 MiB/s, done.Resolving deltas: 100% (161147/161147), done.[root@dify HwHiAiUser]# cd dify[root@dify dify]# git checkout 0.15.3Note: checking out '0.15.3'.You are in 'detached HEAD' state. You can look around, make experimentalchanges and commit them, and you can discard any commits you make in thisstate without impacting any branches by performing another checkout.If you want to create a new branch to retain commits you create, you maydo so (now or later) by using -b with the checkout command again. Example:git checkout -b <new-branch-name>HEAD is now at ca19bd31d chore(*): Bump version to 0.15.3 (#13308)[root@dify HwHiAiUser]# cd docker/[root@dify docker]# cp .env.example .env[root@dify docker]# vim .env修改 848 行、906 行NGINX_PORT=80# SSL settings are only applied when HTTPS_ENABLED is trueNGINX_SSL_PORT=443修改NGINX_PORT=8004# SSL settings are only applied when HTTPS_ENABLED is trueNGINX_SSL_PORT=442另一處EXPOSE_NGINX_PORT=80EXPOSE_NGINX_SSL_PORT=443修改EXPOSE_NGINX_PORT=8004EXPOSE_NGINX_SSL_PORT=442
修改配置檔案
[root@dify docker]# vim docker-compose.yaml
第 486 行新增 –ignore-warnings ARM64-COW-BUG

將 492 行 修改 0.2.10 修改為 0.2.1

(2)下載 docker-compose,配置工具
sudo curl -Lhttps://github.com/docker/compose/releases/download/v2.33.0/docker-compose-linux-aarch64-o /usr/local/bin/docker-compose或者這樣下載[root@dify docker]# cd /usr/local/bin/[root@dify bin]# pwd/usr/local/bin[root@dify bin]# wget https://sxj731533730.obs.cn-east-317.qdrgznjszx.com/docker-compose–2025-02-25 21:07:54–https://sxj731533730.obs.cn-east-317.qdrgznjszx.com/docker-composeResolving sxj731533730.obs.cn-east-317.qdrgznjszx.com (sxj731533730.obs.cn-east-317.qdrgznjszx.com)… 100.125.32.125Connecting to sxj731533730.obs.cn-east-317.qdrgznjszx.com (sxj731533730.obs.cn-east-317.qdrgznjszx.com)|100.125.32.125|:443… connected.HTTP request sent, awaiting response… 200 OKLength: 71778465 (68M) [application/octet-stream]Saving to: ‘docker-compose’docker-compose 100%[=====================================================================>] 68.45M 220MB/s in 0.3s2025-02-25 21:07:54 (220 MB/s) – ‘docker-compose’ saved [71778465/71778465][root@difybin]# lscloud-id cloud-init-per jsondiff jsonpointer modelscope npu-healthcheck.sh tqdmcloud-init docker-compose jsonpatch jsonschema normalizer npu-smi[root@difybin]# chmod 777 docker-compose[root@difybin]# docker-compose -vDocker Compose version v2.33.0
(3)拉取映象,準備啟動 dify 環境,根據。yaml 找 aarch64 位庫即可
docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-api:0.15.3-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-api:0.15.3-linuxarm64 docker.io/langgenius/dify-api:0.15.3docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-web:0.15.3-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-web:0.15.3-linuxarm64 docker.io/langgenius/dify-web:0.15.3docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/postgres:15-alpine-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/postgres:15-alpine-linuxarm64 docker.io/postgres:15-alpinedocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/redis:6-alpine-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/redis:6-alpine-linuxarm64 docker.io/redis:6-alpinedocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.10-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.10-linuxarm64 docker.io/langgenius/dify-sandbox:0.2.10docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.1-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.1-linuxarm64 docker.io/langgenius/dify-sandbox:0.2.1docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/ubuntu/squid:latest-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/ubuntu/squid:latest-linuxarm64 docker.io/ubuntu/squid:latestdocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/certbot/certbot:v3.1.0-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/certbot/certbot:v3.1.0-linuxarm64 docker.io/certbot/certbot:latestdocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/nginx:latest-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/nginx:latest-linuxarm64 docker.io/nginx:latestdocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pingcap/tidb:v8.4.0-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pingcap/tidb:v8.4.0-linuxarm64 docker.io/pingcap/tidb:v8.4.0docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/semitechnologies/weaviate:1.19.0-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/semitechnologies/weaviate:1.19.0-linuxarm64 docker.io/semitechnologies/weaviate:1.19.0docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/qdrant:v1.7.3-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/qdrant:v1.7.3-linuxarm64 docker.io/langgenius/qdrant:v1.7.3docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pgvector/pgvector:pg16-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pgvector/pgvector:pg16-linuxarm64 docker.io/pgvector/pgvector:pg16docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/tensorchord/pgvecto-rs:pg16-v0.3.0-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/tensorchord/pgvecto-rs:pg16-v0.3.0-linuxarm64 docker.io/tensorchord/pgvecto-rs:pg16-v0.3.0docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/chroma-core/chroma:0.5.20-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/chroma-core/chroma:0.5.20-linuxarm64 ghcr.io/chroma-core/chroma:0.5.20docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/oceanbase/oceanbase-ce:4.3.3.0-100000142024101215-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/oceanbase/oceanbase-ce:4.3.3.0-100000142024101215-linuxarm64 quay.io/oceanbase/oceanbase-ce:4.3.3.0-100000142024101215docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/container-registry.oracle.com/database/free:latest-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/container-registry.oracle.com/database/free:latest-linuxarm64 docker.io/container-registry.oracle.com/database/free:latestdocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/coreos/etcd:v3.5.5-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/coreos/etcd:v3.5.5-linuxarm64 quay.io/coreos/etcd:v3.5.5docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/minio/minio:RELEASE.2023-03-20T20-16-18Z-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/minio/minio:RELEASE.2023-03-20T20-16-18Z-linuxarm64 docker.io/minio/minio:RELEASE.2023-03-20T20-16-18Zdocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/milvusdb/milvus:v2.5.0-beta-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/milvusdb/milvus:v2.5.0-beta-linuxarm64 docker.io/milvusdb/milvus:v2.5.0-betadocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch:latest-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch:latest-linuxarm64 docker.io/opensearchproject/opensearch:latestdocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch-dashboards:latest-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch-dashboards:latest-linuxarm64 docker.io/opensearchproject/opensearch-dashboards:latestdocker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/myscale/myscaledb:1.6.4-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/myscale/myscaledb:1.6.4-linuxarm64 docker.io/myscale/myscaledb:1.6.4docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/elasticsearch/elasticsearch:8.14.3-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/elasticsearch/elasticsearch:8.14.3-linuxarm64 docker.elastic.co/elasticsearch/elasticsearch:8.14.3docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/kibana/kibana:8.14.3-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/kibana/kibana:8.14.3-linuxarm64 docker.elastic.co/kibana/kibana:8.14.3docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/robwilkes/unstructured-api:latest-linuxarm64docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/robwilkes/unstructured-api:latest-linuxarm64 docker.io/robwilkes/unstructured-api:latest
然後啟動 dify 成功
[root@dify HwHiAiUser]# cd dify/
[root@dify dify]# cd docker
[root@dify docker]# pwd
/home/HwHiAiUser/dify/docker
[root@dify docker]# docker-compose up -d
[+] Running 11/11
✔ Network docker_default Created
✔ Network docker_ssrf_proxy_network Created
✔ Container docker-sandbox-1 Started
✔ Container docker-redis-1 Started
✔ Container docker-web-1 Started
✔ Container docker-weaviate-1 Started
✔ Container docker-db-1 Started 1.
✔ Container docker-ssrf_proxy-1 Started
✔ Container docker-api-1 Started
✔ Container docker-worker-1 Started
✔ Container docker-nginx-1 Started
[root@dify docker]#
後臺啟動成功

五、啟動 dify 進行配置介面,在位址列輸入 http://ip(訪問伺服器的 ip 地址):8084 埠,可以刷新出 dify 介面

註冊一下,這個是所有者許可權,只能註冊一次,無法修改,如果修改,需要重新拉 dify 服務

使用所有者許可權進入賬戶,點選右邊的設定

選擇模型供應商

在下面的列表中找到這兩個配置項

新增第一個模型 deepseek

OpenAI-API-compatible
型別選 LLM 模型名字對應你的 mindie 的 name:DeepSeek-R1-Distill-Qwen-32B-W8A8 mindie 的 URL:http://192.168.1.115:1025/v1只要後臺服務啟動中,前端可以儲存,就是 ok,秘鑰隨意填

Text Embedding Inference
然後配置排序模型和分詞模型,支援 RAG, 秘鑰隨便寫,只要後臺服務啟動中,前端可以儲存,就是 ok
1.1 選擇 RERANK URL 設定
http://192.168.1.115:8001/
模型名 :bge-reranker-large
1.2 選擇 TEXT EMBEDDING URL 設定
http://192.168.1.115:8002/
模型名 :bge-large-zh-v1___5
1.3 選擇 TEXT EMBEDDING URL 設定
http://192.168.1.115:8003/
模型名 :bge-m3
六、實際測試,跑在昇騰上面的 DeepSeek-R1-Distill-Qwen-32B-W8A8 雙卡

測試知識庫 RAG, 看一下知識庫的內容

開始處理文字


測試不掛知識庫結果

測試掛知識庫結果

郵箱分發功能,需要修改原始碼 ,修改原始碼,從郵箱拿到秘鑰,重啟服務
[root@wuzhoutuili-0003 docker]# vim ../api/tasks/mail_invite_member_task.py[root@wuzhoutuili-0003 docker]# pwd/home/HwHiAiUser/dify/docker




邀約郵件

埋個彩蛋,敬請期待 昇騰伺服器部署 one-api+fastgpt, 內測中
