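DeepSeek Fine-Tuning from Scratch: A Hands-On SFT Walkthrough

Preface

This article shows how to fine-tune the DeepSeek R1 Distill 7B model efficiently with the unsloth framework. It explains how to build and use the CoT datasets needed to fine-tune reasoning models, then runs a complete fine-tuning experiment on the medical-o1-reasoning-SFT dataset, with two goals: adjusting the question-answering style and injecting domain knowledge.

What you will get out of it:

Hands-on experience fine-tuning a DeepSeek R1 distilled model

A working understanding of model fine-tuning and reasoning datasets

A better grasp of the mechanisms and principles by which large models run

A chance to build a customized large model of your own

Reproducing the minimum viable experiment takes about 7 GB of GPU memory and half an hour of run time, which is enough to see a fine-tuning effect. My GPU information is shown below; the platform is the ADC cloud development platform: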

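Part 1: Basic Concepts

1.1 Fine-Tuning, Reinforcement Learning, and Model Distillation

With the rise of DeepSeek, concepts such as reinforcement learning training and model distillation have become widely known. This section briefly summarizes how fine-tuning, reinforcement learning, and model distillation relate. All three are commonly used techniques, and although they overlap in places, their core principles and objectives differ significantly.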

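1. Fine-tuning:

Fine-tuning means taking a large pretrained model and training it further so that it fits the data of a specific task or domain. Compared with training a model from scratch, it needs far less data and compute. It often performs better on the target task, because training concentrates on task-relevant features, and it can be applied in many domains (sentiment analysis, question answering, and so on) to adapt quickly to different applications.

For example 🌰: imagine you have a robot dog that has already had basic training in ordinary dog behavior, such as walking and obeying simple commands. Fine-tuning is like training that dog further for a particular working environment. Say you want it to fetch one specific kind of ball in the park. Starting from its existing training, you adjust its behavior with a dedicated dataset (balls of various colors and sizes) so that it performs better in the new setting.

●Goal: optimize a pretrained model with a small amount of labeled data so that it fits a concrete task.

●Characteristics: the compute cost is relatively low, so performance on a specific task can be improved with limited data and compute.

●Applications: commonly used for downstream tasks such as sentiment analysis, machine translation, and recommender systems.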

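2. Reinforcement Learning:

Reinforcement learning is a machine learning approach in which an agent acts in an environment, receives feedback or reward signals, and learns an optimal policy from them. Through repeated trial and error and policy adjustment, the agent gradually finds the behavior that maximizes long-term return. It is typically used for tasks that involve decision making and interaction with a dynamic environment, such as games, robot navigation, and automated control systems.

For example 🌰: reinforcement learning is more like teaching the robot dog new skills through trial and error. You do not tell it directly what to do; you set a goal, such as finding and picking up a ball as quickly as possible. The dog earns a reward each time it completes the task and then adjusts its behavior to maximize that reward. If it discovers that running in a straight line finds the ball faster, it will tend to do that in later attempts.

●Goal: learn an optimal behavior policy through interaction with the environment, maximizing cumulative reward.

●Characteristics: reinforcement learning emphasizes dynamic decision making; it usually relies on continued interaction with the environment rather than on a predefined dataset.

●Applications: widely used in game AI (such as AlphaGo), robot control, and autonomous driving.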

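3. Model Distillation:

Model distillation is a model compression technique that transfers the knowledge of a large, complex model (the "teacher") into a smaller one (the "student"). The teacher first makes predictions on the training data, producing soft labels, i.e. probability distributions, which carry important information about the task. The student is then trained on these soft labels so that it approaches the teacher's performance. Distillation can substantially reduce model size and compute cost while retaining high accuracy, which suits deployment in resource-constrained environments.

For example 🌰: you have a very expensive, very precise robot dog that performs its tasks perfectly. To cut costs you want to build a simpler robot dog that still gets the job done. With distillation, you train the small dog on the big dog's behavior data so that it learns to imitate the big dog's refined movements while staying efficient.

●Goal: use "knowledge transfer" from the teacher model to lift the student model's performance, especially on devices with limited compute.

●Characteristics: the core is knowledge transfer, with advantages mainly in model compression and deployment. The student usually comes close to the teacher in performance with fewer parameters and cheaper computation.

●Applications: common in model compression, edge computing, and deployment on low-power devices, where it improves deployment efficiency and lowers compute requirements.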
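The soft-label idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the logits are made-up numbers, the temperature value is arbitrary, and `distillation_loss` is a hypothetical helper name, not part of any library.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft labels produced by the teacher
    q = softmax(student_logits, temperature)  # the student's current prediction
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits for 3 classes
good_student = [3.8, 1.1, 0.4]   # close to the teacher
bad_student = [0.5, 1.0, 4.0]    # ranks the classes the wrong way round

print("loss, student close to teacher:", distillation_loss(teacher, good_student))
print("loss, student far from teacher:", distillation_loss(teacher, bad_student))
```

Training the student amounts to pushing this loss down, so a student whose distribution matches the teacher's scores lower than one that disagrees with it.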

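1.2 Fine-Tuning Large Models

Unlike RAG (Retrieval-Augmented Generation) or Agent techniques, which improve model performance by building elaborate workflows around the model, fine-tuning improves the model by changing its parameters directly. The model is retrained on data for a specific task and thereby acquires the required skill "permanently". Fine-tuning can markedly improve performance in a particular domain or task and adapt the model to concrete application scenarios. Because it adjusts the model's internal weights and biases at a fine grain, making the model more precise in understanding and generating information, it is widely used for tasks that demand high accuracy and domain adaptation.

1.2.1 Full Fine-Tuning and Efficient Fine-Tuning

Broadly, fine-tuning comes in two forms: full fine-tuning and efficient fine-tuning. Full fine-tuning updates all of the model's parameters during retraining. It demands a lot of compute, but it gives the greatest freedom to adapt the model to a specific task. Efficient fine-tuning takes a leaner approach: it modifies only a small part of the model's parameters, or a small set of added ones, and often works with a modest amount of data. It achieves a substantial gain at comparatively low compute cost, a case of "small stake, big win", and is well suited to adjusting and strengthening a model quickly when resources are limited.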

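Full Fine-Tuning

For example 🌰: imagine you manage a team at a company. Everyone on the team has had basic training and can handle routine work. Now the company takes on a brand-new, complex project that requires more specialized skills and knowledge. Full fine-tuning is like putting the entire team through comprehensive retraining for that project.

●Pros: all relevant skills are covered thoroughly, so the model adapts better to the new task.

●Cons: takes longer and consumes more resources.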

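Efficient Fine-Tuning

Efficient fine-tuning is more targeted and does not need large amounts of time or resources. For example 🌰: if the robot dog only has to learn to recognize a particular obstacle in a new environment, you can start from the existing model and tune only the parameters related to recognition, without retraining the whole model.

●Pros: saves time and resources; quickly improves a specific skill.

●Cons: may be less thorough than comprehensive training, but reaches the bar efficiently on the specific task.

Most open-source models today are released in two versions. One is the Base model, which has only been pretrained and has not gone through instruction tuning. The other is the Chat model (or simply the model name without a suffix), a conversational model obtained by full instruction fine-tuning on top of the pretrained model: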

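1.2.2 Efficient Fine-Tuning with LoRA and QLoRA

Full fine-tuning can reshape a model's capabilities deeply, but training every parameter consumes a great deal of compute and carries a certain technical barrier. In most scenarios, if the aim is only to improve the model in one specific domain, efficient fine-tuning is the better fit. Many efficient fine-tuning methods appeared in deep learning around 2020, but for large models one of them has become by far the most widely used: LoRA.

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method. It introduces low-rank matrices to cut the number of parameters that have to be updated, which significantly lowers GPU memory and compute requirements. Concretely, LoRA does not update the original model's parameters directly; it inserts low-rank adapter layers into selected layers and trains only those.

How LoRA works:

●Standard fine-tuning modifies all of the model's weights. With LoRA, only the low-rank matrices (the adapters) are trained. The original parameters stay unchanged, and a small number of new parameters adjust the model's output.

●Thanks to the low-rank matrices, large pretrained models can still be fine-tuned effectively when GPU memory and compute are limited, which makes LoRA a good choice for GPUs with little memory.

For example 🌰: imagine you want to teach students fast mental arithmetic without overturning the way they already learn. You introduce one simplified mental-arithmetic technique and let them make small adjustments on top of what they know. The existing approach stays in place and only the small amount of new knowledge is added, rather than re-teaching the whole math course.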

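Advantages of LoRA:

1.Memory savings: only a small number of parameters (the adapters) are updated, which greatly reduces GPU memory requirements and suits GPUs with limited memory.

2.Compute efficiency: with fewer parameters to update, the compute load of fine-tuning is lighter as well.

3.Flexibility: it combines easily with existing pretrained models and applies to many tasks, such as text generation, classification, and question answering.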
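To make the low-rank idea concrete, here is a toy sketch of a LoRA forward pass in plain Python, y = W x + (alpha / r) * B (A x). The matrices are tiny illustrative values and `lora_forward` is a hypothetical helper, not the API of any fine-tuning library; the point is only that W stays frozen while the small matrices A and B carry the update.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes: a frozen 4x4 weight with a rank-1 adapter (illustrative values only).
W = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
A = [[1, 1, 1, 1]]              # shape (r, in_features) = (1, 4)
B_init = [[0], [0], [0], [0]]   # shape (out_features, r) = (4, 1), zero-initialised
x = [1.0, 2.0, 3.0, 4.0]

# With B = 0 the adapter changes nothing, so training starts from the pretrained behaviour.
print(lora_forward(W, A, B_init, x, alpha=1, r=1))

# Pretend training has moved B: the output shifts although W was never touched.
B_trained = [[0.5], [0], [0], [0]]
print(lora_forward(W, A, B_trained, x, alpha=1, r=1))
```

For a 4x4 weight the saving is negligible, but the adapter holds r * (in_features + out_features) numbers, whereas the full matrix holds in_features * out_features, so the gap grows quickly with layer width.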

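QLoRA (Quantized Low-Rank Adaptation) is an extension of LoRA that combines LoRA's low-rank adapters with quantization. It further reduces compute and storage requirements, particularly when GPU memory is extremely tight. Unlike plain LoRA, QLoRA keeps the frozen base model's weights in a quantized, low-precision format (typically 4-bit) and trains the low-rank adapters on top of them in higher precision, which sharply lowers storage and memory requirements while largely preserving performance.

For example 🌰: for students whose learning resources (time or energy) are even more limited, you refine the teaching method further. You simplify the material (as with LoRA) and also use memory aids such as images or mnemonics, so that knowledge is passed on more effectively and every student can learn mental arithmetic in the limited time available. Technically, QLoRA uses quantization to store part of the model's weights in a lower-precision numeric format, cutting memory use, and combines this with LoRA's low-rank updates to make adaptation more efficient.

Advantages of QLoRA:

1.Fine-tuning remains possible when GPU memory is very limited.

2.Larger models can be handled.

3.Suitable for edge devices and scenarios that need low-latency inference.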
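The quantization step can be illustrated with a deliberately simplified scheme. The sketch below uses symmetric absmax rounding to 4-bit integers; this is an assumption made for readability, since QLoRA itself uses a 4-bit NormalFloat (NF4) format with block-wise scaling. The function names are hypothetical.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    levels = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

weights = [0.70, -0.30, 0.12, 0.0, -0.70]        # illustrative values
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

print("codes:   ", codes)
print("restored:", [round(w, 3) for w in restored])
print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight now needs 4 bits plus a shared scale instead of 16 bits, at the price of a rounding error bounded by half a quantization step.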

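1.3 Where Efficient Fine-Tuning Is Used

In real large-model applications, efficient fine-tuning is mainly used in four areas:

1.Conversation style: efficient fine-tuning can adjust the model's conversational style to specific needs. In customer-service systems or virtual assistants, for instance, the model can be tuned to a particular tone, level of politeness, or way of answering, so that its interaction with users matches requirements more closely. By tuning a small number of parameters, the model's dialogue strategy and emotional expression can be shaped into a more targeted, personalized style.

2.Knowledge injection: this means quickly integrating external knowledge or domain-specific information into an existing pretrained model. With efficient fine-tuning the model learns the specialist knowledge of a new domain without being trained again from scratch. In fields such as law or medicine, a small amount of labeled data can be used to fine-tune the pretrained model so that it understands the terminology, rules, and knowledge of the field, which improves its question answering in that domain.

3.Reasoning ability: efficient fine-tuning can also improve a large model's reasoning, especially on more complex reasoning tasks. After fine-tuning, the model handles long texts more efficiently, infers implicit information, and extracts logical relations from data, giving more accurate answers in multi-step reasoning. This helps the model reason more accurately and make fewer mistakes on complex questions.

4.Agent ability (function calling & MCP): in multi-task collaboration or function-calling scenarios, efficient fine-tuning can significantly improve the model's Agent ability, so that it interacts effectively with other systems, calls external APIs, or carries out specific MCP tasks. With targeted fine-tuning the model learns more precise function-calling strategies, parameter parsing, and operating instructions, and becomes more efficient and capable in automated services, intelligent assistants, and robot control.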

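Part 2: Preparing the Environment for Efficient Fine-Tuning of DeepSeek R1 Distill

2.1 Installing unsloth

unsloth is a framework that covers both inference and fine-tuning. According to the project, it makes fine-tuning of Llama 3.3, Mistral, Phi-4, Qwen 2.5, and Gemma 2x faster while using 80% less memory.

Project repository: GitHub, unslothai/unsloth (Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory)

https://github.com/unslothai/unsloth

Install it with the following commands:

pip install unsloth

pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git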

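2.2 Installing and Registering wandb

2.2.1 About wandb

Training large models involves monitoring and analyzing a lot of training data, and WandB helps with that. It offers several important features (mind data security ⚠️: wandb reaches outside the internal network):

2.2.2 Registering and Using wandb

wandb website: https://wandb.ai/site

After registering with an email address, remember to copy your API key.

Then install wandb from the command line:

pip install wandb

wandb can then be configured to log the fine-tuning run, and the training process can be followed on the wandb website, as shown below: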

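2.3 Downloading the DeepSeek R1 Model

ModelScope model page: https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

Create a DeepSeek-R1-Distill-Qwen-7B folder to hold the downloaded model weights:

mkdir ./DeepSeek-R1-Distill-Qwen-7B

Then download the model with the following command:

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local_dir ./DeepSeek-R1-Distill-Qwen-7B

Other download methods exist; choose whichever suits your setup.

The complete set of downloaded model files looks like this: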

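2.4 Fine-Tuning Data

For DeepSeek R1 and its distilled models, reasoning shows up concretely in the reply: each reply contains both a reasoning part and a final answer part, and the reasoning part is delimited by <think></think> (special markers introduced during model training). In other words, the DeepSeek R1 model family uses a very particular reply format that contains both a think part and a response part.

So when fine-tuning models of the DeepSeek R1 family, the reply text in the fine-tuning dataset must also contain both the reasoning and the final answer. Only then does the model keep its established reply style while its abilities are strengthened; otherwise the result is the so-called instruction-ablation problem, where the model's replies no longer contain the think part.

The think part and the final answer together form the label for supervised fine-tuning.

Datasets that contain both reasoning and results are not rare today. The well-known math question-answering dataset NuminaMath CoT, for instance, contains the math problem, the solution steps (the think part), and the final answer, so it can be used to fine-tune reasoning models. Besides NuminaMath CoT there are APPs (programming), TACO (programming), long_form_thought_data_5k (general question answering), and others; all are CoT datasets that can be used to fine-tune reasoning models.

If your own business needs call for it, you can also construct a dataset with a similar structure.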
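As a rough sketch of what "a similar structure" could look like, the snippet below defines one record with question, reasoning, and answer fields and a small validator. The field names mirror the medical-o1-reasoning-SFT dataset used later in this article; the sample content and the `validate_record` helper are made up for illustration.

```python
REQUIRED_FIELDS = ("Question", "Complex_CoT", "Response")

def validate_record(record):
    """Return a list of problems found in one CoT sample (an empty list means OK)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems

# A hypothetical sample: the reasoning goes in Complex_CoT, the final answer in Response.
sample = {
    "Question": "A train covers 120 km in 2 hours. What is its average speed?",
    "Complex_CoT": "Average speed is distance divided by time. 120 km / 2 h = 60 km/h.",
    "Response": "The average speed is 60 km/h.",
}

print(validate_record(sample))                     # no problems expected
print(validate_record({"Question": "Only a question"}))
```

Running such a check over every record before training catches samples that lack the reasoning part, which is exactly the part a reasoning model needs to keep seeing.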

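The dataset chosen for this article is a medical dataset that includes the reasoning process: medical-o1-reasoning-SFT, the fine-tuning dataset of the HuatuoGPT-o1 model released by the Shenzhen Research Institute of Big Data (https://www.modelscope.cn/datasets/AI-ModelScope/medical-o1-reasoning-SFT).

Why this dataset: the model's math ability has already been tested, so I wanted to see how it does in a setting where reasoning is harder.

The dataset contains 25,371 samples, all medical disease-diagnosis data, including reasoning and judgment on quite a few difficult and rare conditions. Overall quality is high and the reasoning is rigorous and accurate, which makes it well suited to fine-tuning models for the medical domain: it can considerably improve the model's reasoning about pathology and inject some medical knowledge along the way.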

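Part 3: Fine-Tuning the DeepSeek R1 Model in Practice

3.1 Model Inference with unsloth

Model inference is done through unsloth.

Import unsloth

from unsloth import FastLanguageModel

First set the key parameters and load the model:

Key parameters

max_seq_length = 2048

This sets the model's context length. Gemini, for example, has a context length of more than 1 million tokens, while Llama-3 has 8192. Any number is allowed, but 2048 is recommended for testing. Unsloth also supports very long context fine-tuning and reports context lengths 4x longer than the best alternative.

dtype = None

Keep this as None; on newer GPUs you can also choose torch.float16 or torch.bfloat16.

load_in_4bit = False

Setting this to True fine-tunes with 4-bit quantization. That cuts memory use by 4x, which makes fine-tuning feasible on a free 16 GB GPU. 4-bit quantization essentially maps the weights onto a limited set of numbers to reduce memory use. The downside is an accuracy drop of 1-2%. To keep that small amount of extra accuracy, set it to False on larger GPUs such as the H100.

Note: if GPU memory is insufficient, set load_in_4bit = True to run the 4-bit quantized version.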
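A back-of-the-envelope calculation shows why the 4-bit switch matters. The sketch below estimates the memory needed for the weights alone at different precisions; the parameter count of 7.6e9 is an assumption for a 7B-class model, not a figure taken from this article, and activations, the KV cache, and framework overhead come on top.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024 ** 3

NUM_PARAMS = 7.6e9   # assumption: rough parameter count of a 7B-class model

for label, bits in [("float16 / bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>20}: {weight_memory_gib(NUM_PARAMS, bits):5.1f} GiB")
```

The 16-bit figure is four times the 4-bit one, which is the "4x" reduction mentioned above.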

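Model loading

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./DeepSeek-R1-Distill-Qwen-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

To customize the fine-tune, you can edit the LoRA settings listed below, or leave them alone, since reasonable defaults have already been chosen. The aim is to change these numbers to raise accuracy while also countering overfitting. Overfitting means the language model memorizes the dataset and cannot answer novel questions. We want the final model to answer questions it has never seen, not to recite.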

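r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128

The rank of the fine-tuning process. A larger value uses more memory and runs slower, but can improve accuracy on harder tasks. A value of 8 is usually suggested for fast fine-tunes, and values up to 128 are possible. Too large a value may cause overfitting and hurt the quality of the model.

target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],

Selects all modules for fine-tuning. You can remove some to reduce memory use and speed up training, but this is strongly discouraged. Just train on all modules!

lora_alpha = 16,

The scaling factor for fine-tuning. A larger number makes the fine-tune learn more about your dataset, but may cause overfitting. It is suggested to set it equal to the rank r, or to double it.

lora_dropout = 0, # Supports any, but = 0 is optimized

Leave this at 0 for faster training! Dropout can reduce overfitting, but not by much.

bias = "none", # Supports any, but = "none" is optimized

Leave this at "none" for faster training with less overfitting!

use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context

The options are True, False, and "unsloth". "unsloth" is recommended because it reduces memory use by 30% and supports extremely long context fine-tuning. See https://unsloth.ai/blog/long-context for details.

random_state = 3407,

The seed that makes runs deterministic. Training and fine-tuning need random numbers, so fixing this number makes experiments reproducible.

use_rslora = False, # We support rank stabilized LoRA

An advanced feature that sets lora_alpha = 16 automatically.

loftq_config = None, # And LoftQ

An advanced feature that initializes the LoRA matrices to the top r singular vectors of the weights. It can improve accuracy somewhat, but makes memory use spike at the start.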
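The settings above read like keyword arguments to a single call. A sketch of how they might be assembled is shown below, on the assumption that they are passed to `FastLanguageModel.get_peft_model`, as in unsloth's example notebooks; `LORA_KWARGS` and `attach_lora` are hypothetical names introduced here, and the call itself needs unsloth and a GPU.

```python
# LoRA settings collected from the fragments above.
LORA_KWARGS = dict(
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

def attach_lora(model):
    """Wrap an already loaded model with LoRA adapters (assumes unsloth is installed)."""
    # Imported lazily so that this snippet can be read and loaded without unsloth.
    from unsloth import FastLanguageModel
    return FastLanguageModel.get_peft_model(model, **LORA_KWARGS)

print(sorted(LORA_KWARGS))
```

Keeping the settings in one dictionary makes it easy to log them next to each run and to change a single value, such as r, between experiments.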

Output

==((====))==  Unsloth 2025.2.12: Fast Qwen2 patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla V100S-PCIE-32GB. Max memory: 31.739 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00, 4.23s/it]
./DeepSeek-R1-Distill-Qwen-7B does not have a padding token! Will use pad_token = <|vision_pad|>.

With INT4 quantization, inference with the 7B model needs only about 7 GB of GPU memory.

At this point, model is the loaded DeepSeek R1 distilled model:

Model information

model

Output

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 3584, padding_idx=151654)
    (layers): ModuleList(
      (0-27): 28 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
          (k_proj): Linear(in_features=3584, out_features=512, bias=True)
          (v_proj): Linear(in_features=3584, out_features=512, bias=True)
          (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((3584,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)
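The layer shapes in this printout are enough to estimate how many parameters LoRA would add for the rank and target modules listed earlier. The sketch below is a plain-Python calculation; it assumes each adapted Linear layer gains one matrix of shape (r, in_features) and one of shape (out_features, r), and the helper names are hypothetical.

```python
# (in_features, out_features) of each targeted Linear layer, read off the printout above.
LAYER_SHAPES = {
    "q_proj": (3584, 3584),
    "k_proj": (3584, 512),
    "v_proj": (3584, 512),
    "o_proj": (3584, 3584),
    "gate_proj": (3584, 18944),
    "up_proj": (3584, 18944),
    "down_proj": (18944, 3584),
}
NUM_LAYERS = 28   # "(0-27): 28 x Qwen2DecoderLayer"

def lora_params(shapes, num_layers, r):
    """Each adapted Linear gains A (r x in) and B (out x r): r * (in + out) parameters."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers

def frozen_params(shapes, num_layers):
    """Weight-matrix parameters of the same layers (biases ignored)."""
    return sum(fan_in * fan_out for fan_in, fan_out in shapes.values()) * num_layers

trainable = lora_params(LAYER_SHAPES, NUM_LAYERS, r=16)
frozen = frozen_params(LAYER_SHAPES, NUM_LAYERS)
print(f"LoRA parameters at r=16: {trainable:,}")
print(f"Adapted frozen weights:  {frozen:,} ({trainable / frozen:.2%} added)")
```

At r = 16 the adapters amount to well under one percent of the weights they sit on, which is why LoRA fits on GPUs that could never hold optimizer state for the full model.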

tokenizer is the tokenizer:

Tokenizer

tokenizer

Output

LlamaTokenizerFast(name_or_path='./DeepSeek-R1-Distill-Qwen-7B', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<｜begin▁of▁sentence｜>', 'eos_token': '<｜end▁of▁sentence｜>', 'pad_token': '<|vision_pad|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={

151643: AddedToken("<｜end▁of▁sentence｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),

151644: AddedToken("<｜User｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151645: AddedToken("<｜Assistant｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),

151646: AddedToken("<｜begin▁of▁sentence｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),

151647: AddedToken("<|EOT|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151648: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151649: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151664: AddedToken("<|file_sep|>", rstrip=False, 
lstrip=False, single_word=False, normalized=False, special=False),})

Switch the model to inference mode:

Inference mode

FastLanguageModel.for_inference(model)

Now we can talk to the model:

Question string ("Who are you?")

question = "你是誰？"

First, use the tokenizer to convert the question into token IDs:

Convert to token IDs

inputs = tokenizer([question], return_tensors="pt").to("cuda")

Token ID information

inputs

Output

{'input_ids': tensor([[151646, 105043, 100165,  11319]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1]], device='cuda:0')}

Then pass inputs to the model to generate a reply:

Answer the question

outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

The reply comes back as token IDs as well:

Reply token IDs

outputs

tensor([[151646, 105043, 100165, ..., 102454, 5373, 99728]],device='cuda:0')

The tokenizer is needed again to turn them into text:

Convert to text

response = tokenizer.batch_decode(outputs)

Reply text

response

Output

['<｜begin▁of▁sentence｜>你是誰？我需要幫助你完成這個任務。8年前，你是一個剛進入職場的大學生，正在尋找工作。現在，你已經找到了一份工作，但可能需要進一步發展。你已經知道你的工作職責，但可能需要了解如何更好地完成任務。你已經知道如何處理日常事務，但可能需要學習如何更高效地完成任務。你已經知道如何與同事和客戶溝通，但可能需要學習如何更好地管理時間。你已經知道如何處理工作中的問題，但可能需要學習如何處理壓力。你已經知道如何制定工作計劃，但可能需要學習如何調整計劃以適應變化。你已經知道如何使用辦公軟體，但可能需要學習如何更好地利用這些工具。你已經知道如何處理緊急情況，但可能需要學習如何更好地預防和化解潛在風險。你已經知道如何進行溝通，但可能需要學習如何更有效地傳達資訊。你已經知道如何建立關係，但可能需要學習如何更好地維護和擴充套件這些關係。你已經知道如何完成任務，但可能需要學習如何更高效地完成任務。你已經知道如何處理壓力，但可能需要學習如何更好地管理壓力。你已經知道如何制定計劃，但可能需要學習如何調整計劃以適應變化。你已經知道如何使用工具，但可能需要學習如何更好地利用這些工具。你已經知道如何處理緊急情況，但可能需要學習如何更好地預防和化解潛在風險。你已經知道如何進行溝通，但可能需要學習如何更有效地傳達資訊。你已經知道如何建立關係，但可能需要學習如何更好地維護和擴充套件這些關係。\n\n好吧，現在我需要幫助使用者完成這個任務。首先，我需要理解使用者的需求。使用者希望從一個剛入職的大學生的角度，逐步提升自己的職場技能，包括效率、時間管理、溝通、關係維護、壓力管理、計劃調整、工具使用、緊急情況處理和資訊傳達等。\n\n接下來，我需要考慮使用者可能的身份。他可能是一個正在找工作或剛進入職場的大學生，對職場生活充滿好奇和挑戰。他可能對如何有效地完成工作感到困惑，或者想進一步提升自己的職業能力。\n\n使用者可能沒有明確說出的深層需求包括：他可能希望找到一個系統化的學習路徑，幫助他逐步成長；他可能需要具體的建議或資源，比如培訓課程、書籍、工具等；他可能希望瞭解如何平衡工作與生活，緩解壓力，提升幸福感。\n\n因此，我應該提供一個結構化的計劃，涵蓋效率提升、時間管理、溝通技巧、壓力管理、計劃調整、工具使用、緊急情況處理、資訊傳達和關係維護等方面。同時，建議他利用學習資源和實踐來持續成長。\n\n最後，我需要用鼓勵和支援的語氣，幫助他建立信心，相信自己能夠透過努力實現職業目標。\n</think>\n\n好的，我將按照您的要求，幫助您逐步提升職場技能。以下是一個結構化的學習計劃，涵蓋您提到的各個方面：\n\n### 1. 效率提升\n- **學習工具使用**：掌握常用辦公軟體（如Excel、Word、Teams等）的高階功能，如自動化工具（Power Automate）、模板、快捷鍵等。\n- **時間管理**：\n  - 使用時間管理方法（如番茄工作法、GTD）來提高工作效率。\n  - 學習日計劃表和周計劃表的製作與使用，確保任務按計劃完成。\n  - 避免多工處理，專注於一項任務直到完成。\n\n### 2. 時間管理\n- **設定優先順序**：學會使用ABC分析法，確定任務的優先順序。\n- **建立緩衝時間**：避免過度壓縮時間，留出緩衝時間以應對突發情況。\n- **定期覆盤**：每週覆盤一週的工作，分析效率提升的空間。\n\n### 3. 溝通技巧\n- **有效溝通**：學習如何清晰、簡潔地表達自己的觀點，傾聽他人的意見。\n- **非語言溝通**：觀察和學習領導和同事的非語言溝通方式，如眼神交流、肢體語言等。\n- **團隊協作**：參與團隊專案，鍛鍊合作能力，學會妥協與協作。\n\n### 4. 壓力管理\n- **壓力識別**：學會識別壓力源，並學會區分短期和長期的壓力。\n- **放鬆技巧**：學習深呼吸、冥想等放鬆技巧，以應對壓力。\n- **尋求支援**：建立支援網路，如朋友、家人或同事，分享壓力和困難。\n\n### 5. 制計劃與調整\n- **靈活計劃**：學習根據實際情況調整計劃的能力。\n- **定期評估計劃**：每週評估計劃的執行情況，及時調整。\n- **快速調整**：遇到變化時，快速調整計劃以適應新情況。\n\n### 6. 工具使用\n- **自動化工具**：學習使用Power Automate、 Zapier等工具自動化工作流程。\n- **學習新工具**：根據工作需求，學習和掌握新工具，如Jira、Trello等專案管理工具。\n- **工具比較**：比較不同工具的優缺點，選擇最適合自己的工具。\n\n### 7. 
緊急情況處理\n- **應急預案**：制定並學習如何處理緊急情況的預案。\n- **快速反應**：練習如何在緊急情況下快速反應，解決問題。\n- **學習經驗**：總結處理緊急情況的經驗教訓，提升應對能力。\n\n### 8. 資訊傳達\n- **清晰表達**：學習如何用簡潔、有力的語言傳達關鍵資訊。\n- **視覺溝通**：利用圖表、PPT等視覺工具，更有效地傳達資訊。\n- **傾聽反饋**：學習如何從同事和客戶那裡獲取反饋，改進資訊傳達方式。\n\n### 9. 關係維護\n- **建立聯絡**：主動與同事、領導']

Print it in readable form:

Formatting

print(response[0])

Output

<｜begin▁of▁sentence｜>你是誰？我需要幫助你完成這個任務。8年前，你是一個剛進入職場的大學生，正在尋找工作。現在，你已經找到了一份工作，但可能需要進一步發展。你已經知道你的工作職責，但可能需要了解如何更好地完成任務。你已經知道如何處理日常事務，但可能需要學習如何更高效地完成任務。你已經知道如何與同事和客戶溝通，但可能需要學習如何更好地管理時間。你已經知道如何處理工作中的問題，但可能需要學習如何處理壓力。你已經知道如何制定工作計劃，但可能需要學習如何調整計劃以適應變化。你已經知道如何使用辦公軟體，但可能需要學習如何更好地利用這些工具。你已經知道如何處理緊急情況，但可能需要學習如何更好地預防和化解潛在風險。你已經知道如何進行溝通，但可能需要學習如何更有效地傳達資訊。你已經知道如何建立關係，但可能需要學習如何更好地維護和擴充套件這些關係。你已經知道如何完成任務，但可能需要學習如何更高效地完成任務。你已經知道如何處理壓力，但可能需要學習如何更好地管理壓力。你已經知道如何制定計劃，但可能需要學習如何調整計劃以適應變化。你已經知道如何使用工具，但可能需要學習如何更好地利用這些工具。你已經知道如何處理緊急情況，但可能需要學習如何更好地預防和化解潛在風險。你已經知道如何進行溝通，但可能需要學習如何更有效地傳達資訊。你已經知道如何建立關係，但可能需要學習如何更好地維護和擴充套件這些關係。

好吧，現在我需要幫助使用者完成這個任務。首先，我需要理解使用者的需求。使用者希望從一個剛入職的大學生的角度，逐步提升自己的職場技能，包括效率、時間管理、溝通、關係維護、壓力管理、計劃調整、工具使用、緊急情況處理和資訊傳達等。

接下來，我需要考慮使用者可能的身份。他可能是一個正在找工作或剛進入職場的大學生，對職場生活充滿好奇和挑戰。他可能對如何有效地完成工作感到困惑，或者想進一步提升自己的職業能力。

使用者可能沒有明確說出的深層需求包括：他可能希望找到一個系統化的學習路徑，幫助他逐步成長；他可能需要具體的建議或資源，比如培訓課程、書籍、工具等；他可能希望瞭解如何平衡工作與生活，緩解壓力，提升幸福感。

因此，我應該提供一個結構化的計劃，涵蓋效率提升、時間管理、溝通技巧、壓力管理、計劃調整、工具使用、緊急情況處理、資訊傳達和關係維護等方面。同時，建議他利用學習資源和實踐來持續成長。

最後，我需要用鼓勵和支援的語氣，幫助他建立信心，相信自己能夠透過努力實現職業目標。

</think>

好的，我將按照您的要求，幫助您逐步提升職場技能。以下是一個結構化的學習計劃，涵蓋您提到的各個方面：

### 1. 效率提升

- **學習工具使用**：掌握常用辦公軟體（如Excel、Word、Teams等）的高階功能，如自動化工具（Power Automate）、模板、快捷鍵等。

- **時間管理**： - 使用時間管理方法（如番茄工作法、GTD）來提高工作效率。 - 學習日計劃表和周計劃表的製作與使用，確保任務按計劃完成。 - 避免多工處理，專注於一項任務直到完成。### 2. 時間管理- **設定優先順序**：學會使用ABC分析法，確定任務的優先順序。- **建立緩衝時間**：避免過度壓縮時間，留出緩衝時間以應對突發情況。- **定期覆盤**：每週覆盤一週的工作，分析效率提升的空間。### 3. 溝通技巧- **有效溝通**：學習如何清晰、簡潔地表達自己的觀點，傾聽他人的意見。- **非語言溝通**：觀察和學習領導和同事的非語言溝通方式，如眼神交流、肢體語言等。- **團隊協作**：參與團隊專案，鍛鍊合作能力，學會妥協與協作。### 4. 壓力管理- **壓力識別**：學會識別壓力源，並學會區分短期和長期的壓力。- **放鬆技巧**：學習深呼吸、冥想等放鬆技巧，以應對壓力。- **尋求支援**：建立支援網路，如朋友、家人或同事，分享壓力和困難。### 5. 制計劃與調整- **靈活計劃**：學習根據實際情況調整計劃的能力。- **定期評估計劃**：每週評估計劃的執行情況，及時調整。- **快速調整**：遇到變化時，快速調整計劃以適應新情況。### 6. 工具使用- **自動化工具**：學習使用Power Automate、 Zapier等工具自動化工作流程。- **學習新工具**：根據工作需求，學習和掌握新工具，如Jira、Trello等專案管理工具。- **工具比較**：比較不同工具的優缺點，選擇最適合自己的工具。### 7. 緊急情況處理- **應急預案**：制定並學習如何處理緊急情況的預案。- **快速反應**：練習如何在緊急情況下快速反應，解決問題。- **學習經驗**：總結處理緊急情況的經驗教訓，提升應對能力。### 8. 資訊傳達- **清晰表達**：學習如何用簡潔、有力的語言傳達關鍵資訊。- **視覺溝通**：利用圖表、PPT等視覺工具，更有效地傳達資訊。- **傾聽反饋**：學習如何從同事和客戶那裡獲取反饋，改進資訊傳達方式。### 9. 關係維護- **建立聯絡**：主動與同事、領導

Answering with a question-answering template

Structured input

Prompt (the template asks the model, as a helpful assistant, to write an appropriate reply that completes the current conversation)

prompt_style_chat = """請寫出一個恰當的回答來完成當前對話任務。

### Instruction:
你是一名助人為樂的助手。

### Question:
{}

### Response:
<think>{}"""

Question ("Hello, long time no see!")

question = "你好，好久不見！"

Formatted input

[prompt_style_chat.format(question, "")]

Output

['請寫出一個恰當的回答來完成當前對話任務。\n\n### Instruction:\n你是一名助人為樂的助手。\n\n### Question:\n你好，好久不見！\n\n### Response:\n<think>']

Tokenize

inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")

Reply token IDs

outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

Reply text

response = tokenizer.batch_decode(outputs)

Reply

response

Output

['<｜begin▁of▁sentence｜>請寫出一個恰當的回答來完成當前對話任務。\n\n### Instruction:\n你是一名助人為樂的助手。\n\n### Question:\n你好，好久不見！\n\n### Response:\n<think>\n嗯，使用者發來“你好，好久不見！”這句話，看起來像是一種友好的問候，帶有親切感。首先，我需要分析使用者的意圖，可能是想打招呼或者繼續之前的對話。我應該回應得友好且溫暖，同時保持專業性。\n\n考慮到使用者可能是想建立聯絡，我應該用一種既親切又正式的方式回應。比如，使用“你好！很高興見到你！今天過得怎麼樣？”這樣的回覆既表達了問候，又詢問了近況，有助於繼續對話。\n\n另外，我需要確保語言簡潔明瞭，避免使用複雜的詞彙，讓使用者感覺輕鬆愉快。同時，保持語氣友好，讓使用者感到被重視和歡迎。\n\n最後，檢查一下回復是否符合所有要求，比如是否恰當、是否符合角色設定，以及是否能夠有效傳達資訊。確認無誤後，就可以傳送這個回覆了。\n</think>\n\n你好！很高興見到你！今天過得怎麼樣？<｜end▁of▁sentence｜>']

Formatting

print(response[0].split("### Response:")[1])

Formatted reply

<think>

嗯，使用者發來“你好，好久不見！”這句話，看起來像是一種友好的問候，帶有親切感。首先，我需要分析使用者的意圖，可能是想打招呼或者繼續之前的對話。我應該回應得友好且溫暖，同時保持專業性。

考慮到使用者可能是想建立聯絡，我應該用一種既親切又正式的方式回應。比如，使用“你好！很高興見到你！今天過得怎麼樣？”這樣的回覆既表達了問候，又詢問了近況，有助於繼續對話。

另外，我需要確保語言簡潔明瞭，避免使用複雜的詞彙，讓使用者感覺輕鬆愉快。同時，保持語氣友好，讓使用者感到被重視和歡迎。

最後，檢查一下回復是否符合所有要求，比如是否恰當、是否符合角色設定，以及是否能夠有效傳達資訊。確認無誤後，就可以傳送這個回覆了。

</think>你好！很高興見到你！今天過得怎麼樣？<｜end▁of▁sentence｜>
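Since every reply carries both a reasoning part and a final answer, it is often handy to separate the two after decoding. The following is a minimal sketch in plain Python; `split_reasoning` is a hypothetical helper, and the demo string is a shortened, made-up reply in the same layout as the output above.

```python
def split_reasoning(text, eos_token="<｜end▁of▁sentence｜>"):
    """Split a decoded reply into (reasoning, answer) around the </think> tag."""
    body = text.split("### Response:", 1)[-1]    # drop the prompt if it is present
    body = body.replace(eos_token, "")
    reasoning, sep, answer = body.partition("</think>")
    if not sep:                                  # no closing tag: treat it all as the answer
        return "", body.strip()
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()

decoded = (
    "### Response:\n<think>\nThe user is greeting me.\n</think>\n\n"
    "Hello!<｜end▁of▁sentence｜>"
)
reasoning, answer = split_reasoning(decoded)
print("reasoning:", reasoning)
print("answer:   ", answer)
```

In the session above it would be called as split_reasoning(response[0]), which makes it easy to show users only the answer while keeping the reasoning for inspection.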

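3.2 Question-Answering Test on the Initial Model

3.2.1 Setting Up the Question-Answering Template

To test what fine-tuning can do, a medical dataset was chosen for the fine-tune.

First, a quick test of the model before any fine-tuning. Set up the question-answering template:

Prompt

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

The Chinese translation is as follows:

Prompt translation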

prompt_style = """以下是一個任務說明，配有提供更多背景資訊的輸入。請寫出一個恰當的回答來完成該任務。在回答之前，請仔細思考問題，並按步驟進行推理，確保回答邏輯清晰且準確。### Instruction:您是一位具有高階臨床推理、診斷和治療規劃知識的醫學專家。請回答以下醫學問題。問題：回覆："""

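3.2.2 Question-Answering Test

Next, take a few questions from the medical-o1-reasoning-SFT dataset, put them to the model, and look at how it answers in its initial state.

Question 1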

question_1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

Chinese translation

question_1 ：一位61歲的女性，有長期在咳嗽或打噴嚏等活動中發生不自主尿液流失的病史，但夜間沒有漏尿。她接受了婦科檢查和Q-tip測試。根據這些檢查結果，膀胱測量（cystometry）最可能會顯示她的殘餘尿量和逼尿肌收縮情況如何？

Question 2

question_2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"

Chinese translation

question_2 ：面對一位突發胸痛並放射至頸部和左臂的患者，其既往病史包括高膽固醇血癥和冠狀動脈疾病，同時伴有升高的肌鈣蛋白I水平和心動過速，根據這些臨床表現，最可能受累的冠狀動脈是哪一條？

Running the test

Test of question 1

inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs1 = model.generate(
    input_ids=inputs1.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
response1 = tokenizer.batch_decode(outputs1)
print(response1[0].split("### Response:")[1])

Reply

<think>

Alright, I'm trying to figure out what the cystometry would show for this 61-year-old woman. She's having involuntary urine loss during activities like coughing or sneezing but not at night. So, she's experiencing urgency, right? That's when you don't hold your pee because something upsets you.

She went through a gynecological exam and a Q-tip test. I remember that a Q-tip test is used to check for a urine stream. If she's getting a stream, the Q-tip would be inserted and they'd have to pull it back out, which is a sign of an orinary reflex. So, the fact that she had a Q-tip test suggests that they noticed this involuntary loss.

Now, the question is about what a cystometry would show. Cystometry measures the residual volume in the bladder and also assesses the detrusor muscle contractions. The detrusor is the muscle that helps push urine out when you squeeze. If the detrusor isn't contracting properly, the bladder might not empty, leaving residual urine.

Since she's losing urine during activities that cause urgency, like coughing or sneezing, that's probably because the detrusor isn't contracting strong enough. When the detrusor doesn't contract, the bladder can't empty completely, so there's residual volume. That would mean during cystometry, they'd see a low residual volume because the bladder isn't holding much. Also, the detrusor contractions would be weak, indicating that the muscle isn't working as it should to push the urine out.

I think residual volume is low here because she's continuously losing urine, so not much is left in the bladder. The detrusor contractions being weak would support that idea. So, putting it all together, the cystometry would show a low residual volume and weak detrusor contractions.

</think>

The 61-year-old woman presents with involuntary urine loss during activities like coughing or sneezing, indicative of urgency. The Q-tip test confirmed this by showing a positive orinary reflex. Upon cystometry, the findings would reveal:

1. **Low Residual Volume**: The bladder likely holds minimal urine due to the continuous loss during activities, suggesting incomplete emptying.

2. **Weak Detrusor Contractions**: The detrusor muscle's contractions are inadequate, contributing to the inability to expel urine completely, thus maintaining residual urine in the bladder.

These observations point to a functional incontinence issue, likely due to insufficient detrusor activity.<｜end▁of▁sentence｜>

Chinese translation of the reply

好的，我試著弄清楚這位61歲女性的膀胱壓力測定會顯示什麼。她在咳嗽或打噴嚏等活動中出現不自主的尿液流失，但晚上沒有。所以，她經歷了尿急，對吧？就是當你因為某些事情而無法控制排尿的時候。

她進行了婦科檢查和Q-tip測試。我記得Q-tip測試是用來檢查尿流的。如果有尿流，Q-tip就會被插入，然後需要拉出來，這表明存在排尿反射。所以，她接受Q-tip測試這一事實表明醫生注意到了這種不自主的流失。

現在，問題是膀胱壓力測定會顯示什麼。膀胱壓力測定測量膀胱內的殘餘尿量，並評估逼尿肌的收縮。逼尿肌是幫助排尿的肌肉。當逼尿肌收縮不當時，膀胱可能無法排空，留下殘餘尿液。

因為她在咳嗽或打噴嚏等引起尿急的活動中流失尿液，這可能是因為逼尿肌收縮不夠有力。當逼尿肌不收縮時，膀胱無法完全排空，因此會有殘餘尿量。這意味著在膀胱壓力測定中，他們會看到低殘餘尿量，因為膀胱中保持的尿液不多。此外，逼尿肌的收縮會很弱，表明該肌肉未能有效地排尿。

我認為這裡的殘餘尿量低是因為她持續流失尿液，所以膀胱裡剩下的不多。逼尿肌收縮弱也支援了這一觀點。因此，綜合來看，膀胱壓力測定會顯示低殘餘尿量和逼尿肌收縮弱。

這位61歲的女性在咳嗽或打噴嚏等活動中出現不自主的尿液流失，表明有尿急。Q-tip測試透過顯示陽性的排尿反射確認了這一點。在膀胱壓力測定中，結果將顯示：

低殘餘尿量：由於活動期間的持續流失，膀胱內可能保持的尿液極少，提示排空不完全。逼尿肌收縮弱：逼尿肌的收縮不足，導致無法完全排出尿液，從而使膀胱內保持殘餘尿液。這些觀察結果指向功能性尿失禁的問題，可能是由於逼尿肌活動不足所致。

Test of question 2

inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs2 = model.generate(
    input_ids=inputs2.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
response2 = tokenizer.batch_decode(outputs2)
print(response2[0].split("### Response:")[1])

Reply

<think>

Okay, so I'm trying to figure out which coronary artery is involved in this patient's chest pain. Let me start by breaking down the information given.

The patient has sudden-onset chest pain that goes to the neck and left arm. That makes me think about the possible locations of the coronary artery on the left side. The left coronary artery is typically on the left side of the neck, so pain radiating to the left arm could be coming from that area. But I shouldn't jump to conclusions yet.

They have a past medical history of hypercholesterolemia and coronary artery disease. Hypercholesterolemia is a risk factor for heart disease, so it's something to consider, but not directly indicative of which artery is affected. The key here is the tachycardia, which is an elevated heart rate. Tachycardia can be a clue for several issues, including heart valve problems or arrhythmias, but in the context of chest pain, it's often linked to coronary artery issues.

Elevated troponin I levels are a big hint. Troponin I is a marker of heart muscle injury. Elevated levels suggest an acute myocardial infarction or a significant ischemia. Myocardial infarction typically affects one of the coronary arteries, causing tissue damage.

Now, considering the presentation: the pain is radiating to the left arm. The left arm is typically served by the left anterior descending (LAD) and left circumflex (LCx) coronary arteries. The LAD supplies the left side of the chest, including the left arm, while the LCx supplies the upper arm and neck.

The tachycardia could be due to the heart muscle being ischemic, which can cause the heart to beat faster as it tries to pump blood. This is more common in LAD disease because the LAD is responsible for the left side of the heart, which includes the left ventricle and part of the left atrium. If the LAD is blocked, the left ventricle might not get enough blood, leading to a rapid heart rate.

On the other hand, LCx disease affects the right upper arm and the neck. If that's the case, the pain might radiate to the right side of the neck or the right arm, not the left. But the patient's pain is specifically on the left side, so that makes me lean towards LAD being the culprit.

I should also consider the possible causes of LAD disease. It's often due to atherosclerosis, which is more common in patients with a history of hypercholesterolemia. Since the patient has a history of coronary artery disease, this adds to the suspicion that it's LAD.

So putting it all together: the presentation points to the left arm, the elevated troponin suggests myocardial injury, and the tachycardia supports ischemia. All these factors together make the most likely coronary artery affected the left anterior descending (LAD).

</think>

The most likely coronary artery affected in this patient is the **left anterior descending (LAD)**.

**Step-by-Step Explanation:**

1. **Presentation of Pain:** The patient experiences chest pain radiating to the left arm. This suggests involvement of the left coronary artery, as the LAD supplies blood to the left side of the chest and the left arm.

2. **Elevated Troponin I Levels:** Elevated troponin I indicates myocardial injury, commonly associated with a myocardial infarction or significant coronary artery disease.

3. **Tachycardia:** Increased heart rate is often due to ischemia, which can occur when blood flow to the heart muscle is reduced, as happens with coronary artery disease.

4. **Coronary Artery Disease and History:** The patient's past history of coronary artery disease and hypercholesterolemia (a risk factor for cardiovascular disease) supports the suspicion of ischemia in one of the coronary arteries.

5. **Coronary Artery Locations:** The LAD is responsible for the left side of the heart, including the left ventricle and part of the left atrium. Blockage of this artery would cause ischemia and tachycardia, aligning with the patient's symptoms.

**Conclusion:** Based on the presentation, elevation of troponin I, and tachycardia, the most likely coronary artery affected is the left anterior descending (LAD).<｜end▁of▁sentence｜>

Chinese translation of the reply

好的，我在嘗試弄清楚這位患者的胸痛涉及哪個冠狀動脈。讓我從給定的資訊入手，逐步分析。

患者出現了突然發作的胸痛，疼痛放射到頸部和左臂。這讓我考慮到可能涉及左側的冠狀動脈。左冠狀動脈通常位於左側頸部，因此放射到左臂的疼痛可能來自該區域。但我不應該急於下結論。

患者有高膽固醇血癥和冠狀動脈疾病的病史。高膽固醇血癥是心臟病的危險因素，因此需要考慮，但並不能直接指示受累的冠狀動脈。關鍵在於心動過速，即心率升高。心動過速可能是多種問題的線索，包括心臟瓣膜問題或心律失常，但在胸痛的背景下，通常與冠狀動脈問題有關。

升高的肌鈣蛋白I水平是一個重要線索。肌鈣蛋白I是心肌損傷的標誌物，升高的水平提示急性心肌梗死或顯著的缺血。心肌梗死通常影響某條冠狀動脈，導致組織損傷。

現在考慮表現：疼痛放射到左臂。左臂通常由左前降支（LAD）和左迴旋支（LCx）供血。LAD為胸部左側供血，包括左臂，而LCx則供給上臂和頸部。

心動過速可能是由於心肌缺血引起的，這可能導致心臟在試圖泵血時加速跳動。這在LAD疾病中更為常見，因為LAD負責左側心臟的供血，包括左心室和部分左心房。如果LAD阻塞，左心室可能得不到足夠血液，導致心率加快。

另一方面，LCx疾病則會影響右上臂和頸部。如果是這種情況，疼痛可能放射到右側頸部或右臂，而不是左側。但患者的疼痛明確在左側，因此我傾向於認為是LAD出現問題。

我還應該考慮LAD疾病的可能原因。這通常與動脈粥樣硬化有關，而患有高膽固醇血癥歷史的患者更容易出現這種情況。由於患者有冠狀動脈疾病的病史，這進一步增加了對LAD的懷疑。

所以綜合考慮：疼痛指向左臂，肌鈣蛋白升高提示心肌損傷，心動過速支援缺血。這些因素結合在一起，使得最可能受累的冠狀動脈是左前降支（LAD）。

最有可能受累的冠狀動脈是左前降支（LAD）。逐步解釋：

疼痛表現：患者感到胸痛放射到左臂。這提示可能涉及左冠狀動脈，因為LAD為胸部左側和左臂供血。

升高的肌鈣蛋白I水平：升高的肌鈣蛋白I指示心肌損傷，通常與心肌梗死或顯著的冠狀動脈疾病有關。

心動過速：心率升高通常是由於缺血，當血流減少時，會發生這種情況，就像冠狀動脈疾病中一樣。

冠狀動脈疾病及病史：患者的冠狀動脈疾病和高膽固醇血癥的病史（心血管疾病的危險因素）支援對其中一條冠狀動脈缺血的懷疑。

冠狀動脈位置：LAD負責左心室和部分左心房的供血。該動脈的阻塞將導致缺血和心動過速，與患者的症狀相符。

結論：根據疼痛表現、肌鈣蛋白I升高和心動過速，最有可能受累的冠狀動脈是左前降支（LAD）。

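3.2.3 Results of the Question-Answering Test

Reference answers

Reference answer to question 1

Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.

Reference answer to question 2

Given the patient's sudden chest pain radiating to the neck and left arm, together with a history of hypercholesterolemia and coronary artery disease, elevated troponin, and tachycardia, the clinical picture strongly suggests involvement of the left anterior descending (LAD) artery. This artery is often the culprit in such presentations because it supplies a large portion of the heart. The combination of radiating pain and elevated troponin indicates myocardial damage, which makes the LAD the most likely artery involved. However, without further diagnostic tests such as an ECG, a definitive diagnosis still awaits confirmation.

The model in its original state can reason and produce a reply, but the quality is poor. The first answer is simply wrong: it predicts low residual volume and weak detrusor contractions, whereas the reference answer is a normal residual volume and no involuntary contractions. The second answer does name the LAD, as the reference answer does, but the reasoning behind it does not hold up medically (for example, it claims that the LAD supplies the left arm). In its initial state, then, the model does not handle questions from the medical-o1-reasoning-SFT dataset well.

Next, we fine-tune the model and test how it answers afterwards.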

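3.3 Minimum Viable Experiment

Now for the fine-tuning itself. With this dataset, you can fine-tune on part of the original data, or on all of it over several epochs. For most fine-tuning experiments it makes sense to start with a minimum viable experiment: fine-tune on a small amount of data first and observe the effect. If the run goes through and a fine-tuning effect is visible, then consider scaling up to more data.

3.3.1 Preparing the Dataset

Here the medical-o1-reasoning-SFT dataset is downloaded directly from ModelScope.

If an error is raised, follow the hints in the message. If that does not help, fetch the dataset from Hugging Face, switch to another dataset, or download the dataset locally and read it from disk. I tried quite a few datasets; some loaded and some did not, with all sorts of puzzling errors, and following the error hints was usually enough.

Download the dataset

from modelscope.msdatasets import MsDataset
ds = MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT')

Inspect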

ds[0]

Output

{'Question': 'A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?',

'Complex_CoT': "Okay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.",

'Response': 'Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.'}

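For a minimum viable experiment, loading 500 samples is enough to see an effect from fine-tuning:

Splitting the dataset

from modelscope.msdatasets import MsDataset

# Load only the first 500 training samples
ds = MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT', split = "train[0:500]")

3.3.2 Structuring the text

Prompt template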

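train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""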

Next, extract and set the end-of-text marker used during generation:

End-of-text marker

EOS_TOKEN = tokenizer.eos_token
tokenizer.eos_token

Output

'<｜end▁of▁sentence｜>'

Then define a function that reshapes the medical-o1-reasoning-SFT dataset: it concatenates the Complex_CoT and Response columns and appends the end-of-text marker:

Formatting function

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    # Fill the template with each question, reasoning chain and answer
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts, }
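To see what this function produces without loading the model, the following minimal sketch runs the same logic on a hand-made batch. The shortened template, the `<EOS>` placeholder string and the toy question are illustrative assumptions, not the real tokenizer values or dataset rows.

```python
# Stand-in for tokenizer.eos_token; the real value comes from the tokenizer.
EOS_TOKEN = "<EOS>"

# Shortened stand-in template with the same three placeholders.
train_prompt_style = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""


def formatting_prompts_func(examples):
    """Join question, chain of thought and answer for each row of a batch."""
    texts = []
    for question, cot, answer in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        texts.append(train_prompt_style.format(question, cot, answer) + EOS_TOKEN)
    return {"text": texts}


# With batched=True, map() passes a dict of column name -> list of values.
toy_batch = {
    "Question": ["What is 1 + 1?"],
    "Complex_CoT": ["Adding one and one gives two."],
    "Response": ["2"],
}
formatted = formatting_prompts_func(toy_batch)
print(formatted["text"][0])
```

The reasoning chain ends up between the `<think>` tags and the final answer follows it, so the model is trained to produce both in that order and then stop at the end-of-text marker.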

Reshape the dataset into the following form:

Formatting the data

ds = ds.map(formatting_prompts_func, batched = True,)

View the formatted data

ds["text"][0]

Output

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.\n</think>\nCystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.<｜end▁of▁sentence｜>"

3.3.3 Starting fine-tuning

Enable fine-tuning by putting the model into fine-tuning mode:

Fine-tuning mode

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Output

Unsloth 2025.2.12 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.

Then import the required libraries:

Imports

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

Create the fine-tuning object:

Fine-tuning object

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,  # the argument name is max_steps (plural)
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Changing the parameters above is generally not recommended. The main ones are explained below:

per_device_train_batch_size = 2,

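Increase the batch size if you want to use more of the GPU's memory. A larger batch size also makes training smoother and less prone to overfitting. This is usually not recommended, however, because padding can make training slower. The usual advice is to increase gradient_accumulation_steps instead, which simply processes more batches before each weight update.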
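One plausible reading of the padding issue is that every sequence in a batch is padded to the longest one, so mixed-length batches spend compute on padding tokens. The sketch below measures that waste. The helper name `padding_waste` and the token counts are hypothetical and chosen only for illustration.

```python
def padding_waste(lengths, batch_size):
    """Return the fraction of token slots filled with padding when
    consecutive sequences are grouped into batches and each batch is
    padded to its longest sequence."""
    real = sum(lengths)
    padded = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        padded += max(batch) * len(batch)
    return 1 - real / padded


# Hypothetical token counts for eight samples (illustrative only).
lengths = [100, 900, 120, 880, 150, 850, 200, 800]

for bs in (1, 2, 8):
    print(bs, round(padding_waste(lengths, bs), 3))
```

With a batch size of 1 nothing is padded. As the batch grows, short samples are more likely to share a batch with a long one, which is why gradient accumulation is suggested as the cheaper way to enlarge the effective batch.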

gradient_accumulation_steps = 4,

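Equivalent to increasing the batch size further, but without increasing memory consumption. Increasing this value is generally recommended if you want a smoother training loss curve.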

max_steps = 60, # num_train_epochs = 1,

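Training is capped at 60 steps to keep the run short. For a full training run, which can take hours, comment out max_steps and replace it with num_train_epochs = 1. A value of 1 means one full pass over the dataset. One to three passes are generally recommended, and no more, otherwise the fine-tuned model will overfit.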
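The interaction between batch size, gradient accumulation and max_steps can be checked with a little arithmetic. The sketch below assumes a single GPU and the 500-sample subset used in this experiment. The helper name `training_coverage` is hypothetical.

```python
def training_coverage(num_samples, per_device_batch, grad_accum, max_steps, num_devices=1):
    """Return (effective batch size, samples seen, fraction of an epoch)."""
    # One optimizer step consumes this many samples.
    effective_batch = per_device_batch * grad_accum * num_devices
    samples_seen = effective_batch * max_steps
    return effective_batch, samples_seen, samples_seen / num_samples


# Values taken from the TrainingArguments in this section.
effective_batch, samples_seen, epochs = training_coverage(
    num_samples=500, per_device_batch=2, grad_accum=4, max_steps=60
)
print(effective_batch, samples_seen, epochs)
```

Under these assumptions each step covers 8 samples, so 60 steps touch 480 samples, slightly less than one pass over the 500-sample subset.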

learning_rate = 2e-4,

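Lower the learning rate if you want fine-tuning to proceed more slowly but be more likely to converge to a higher-accuracy result. Values worth trying are typically 2e-4, 1e-4, 5e-5 and 2e-5.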
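The learning rate above is only the peak value. With lr_scheduler_type="linear" and warmup_steps=5 it ramps up linearly and then decays linearly to zero. The following is a minimal sketch of that shape in plain Python, written to mirror the usual linear-with-warmup rule. The function name `linear_lr` is hypothetical and this is not the library's own code.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 30, 60):
    print(step, linear_lr(step))
```

Because the peak is only reached briefly, a short run such as 60 steps spends most of its time below the configured learning rate.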

Output from creating the trainer

Applying chat template to train dataset (num_proc=2): 100%|██████████| 500/500 [00:01<00:00, 348.86 examples/s]

Tokenizing train dataset (num_proc=2): 100%|██████████| 500/500 [00:02<00:00, 221.20 examples/s]

Tokenizing train dataset (num_proc=2): 100%|██████████| 500/500 [00:00<00:00, 920.96 examples/s]

Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.

Start fine-tuning

trainer_stats = trainer.train()

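The logged value is the training loss. The goal is to set the parameters so that it gets as close to 0.5 as possible. If fine-tuning does not bring the loss down to 1, 0.8 or 0.5, some of the values may need adjusting. A loss of 0 is probably not a good sign either.

Note that after fine-tuning finishes, unsloth automatically updates the model weights (in the cache), so the fine-tuned model can be called directly without manually merging the weights: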

FastLanguageModel.for_inference(model)

Output

PeftModelForCausalLM( (base_model): LoraModel( (model): Qwen2ForCausalLM( (model): Qwen2Model( (embed_tokens): Embedding(152064, 3584, padding_idx=151654) (layers): ModuleList( (0-27): 28 x Qwen2DecoderLayer( (self_attn): Qwen2Attention( (q_proj): lora.Linear( (base_layer): Linear(in_features=3584, out_features=3584, bias=True) (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=3584, out_features=16, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=16, out_features=3584, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() (lora_magnitude_vector): ModuleDict() ) (k_proj): lora.Linear( (base_layer): Linear(in_features=3584, out_features=512, bias=True) (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=3584, out_features=16, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=16, out_features=512, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() (lora_magnitude_vector): ModuleDict() ) (v_proj): lora.Linear( (base_layer): Linear(in_features=3584, out_features=512, bias=True) (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=3584, out_features=16, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=16, out_features=512, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() (lora_magnitude_vector): ModuleDict() ) (o_proj): lora.Linear( (base_layer): Linear(in_features=3584, out_features=3584, bias=False) (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=3584, out_features=16, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=16, out_features=3584, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() (lora_magnitude_vector): ModuleDict() ) (rotary_emb): 
LlamaRotaryEmbedding() ) (mlp): Qwen2MLP( (gate_proj): lora.Linear( (base_layer): Linear(in_features=3584, out_features=18944, bias=False) (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=3584, out_features=16, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=16, out_features=18944, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() (lora_magnitude_vector): ModuleDict() ) (up_proj): lora.Linear( (base_layer): Linear(in_features=3584, out_features=18944, bias=False) (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=3584, out_features=16, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=16, out_features=18944, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() (lora_magnitude_vector): ModuleDict() ) (down_proj): lora.Linear( (base_layer): Linear(in_features=18944, out_features=3584, bias=False) (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=18944, out_features=16, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=16, out_features=3584, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() (lora_magnitude_vector): ModuleDict() ) (act_fn): SiLU() ) (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06) (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06) ) ) (norm): Qwen2RMSNorm((3584,), eps=1e-06) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): Linear(in_features=3584, out_features=152064, bias=False) ) ))
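The shapes in this printout are enough for a back-of-the-envelope count of the LoRA parameters being trained. The sketch below copies the in_features and out_features values from the printout and assumes that only the seven listed projections carry adapters, as configured with target_modules.

```python
R = 16           # LoRA rank, from r=16
LORA_ALPHA = 16  # from lora_alpha=16
NUM_LAYERS = 28  # "(0-27): 28 x Qwen2DecoderLayer" in the printout

# (in_features, out_features) of each adapted projection, read from the printout.
PROJECTIONS = {
    "q_proj": (3584, 3584),
    "k_proj": (3584, 512),
    "v_proj": (3584, 512),
    "o_proj": (3584, 3584),
    "gate_proj": (3584, 18944),
    "up_proj": (3584, 18944),
    "down_proj": (18944, 3584),
}


def lora_params(in_features, out_features, r):
    # lora_A maps in_features -> r, lora_B maps r -> out_features (no bias).
    return r * in_features + out_features * r


per_layer = sum(lora_params(i, o, R) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # with use_rslora=False the update is scaled by alpha / r

print(per_layer, total, scaling)
```

This gives roughly 40 million trainable parameters, a small fraction of the 7B base model, which is what makes the run fit in modest GPU memory.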

Check the earlier question again:

Question check

inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)

<think>

Alright, so we have a 61-year-old woman here, and she's been dealing with this weird issue where she loses urine during activities like coughing or sneezing. That's interesting. Now, she doesn't have leakage at night, which is a good sign, but it's definitely happening during these activities. Hmm, I'm thinking this might be some kind of urinary incontinence, but not the usual type. Let's think about what could be causing this.

Okay, during coughing or sneezing, the bladder gets pressed. This pressure can push urine out, even if the leakage at night isn't happening. So, this sounds like it might be a case of interstitial incontinence, where the bladder empties during certain activities, but doesn't necessarily have a leakage at night.

Now, let's consider the gynecological exam and the Q-tip test. These are typically used to check for urethral strictures. A urethral stenosis, or a narrowing, can certainly cause urinary incontinence, especially during activities that press the bladder. The Q-tip test is a useful tool here to determine if there's a urethral obstruction.

If there's a urethral stenosis, it would definitely make sense that the Q-tip test might show a positive result, indicating some kind of obstruction. This would explain why she's losing urine during activities but not at night.

Now, let's think about what cystometry would reveal. Cystometry is a technique used to measure the residual volume in the bladder, which is the amount of urine left after the bladder empties. If there's a urethral stenosis, you'd expect to see a significant residual volume because the bladder doesn't empty completely, even though it's pressed during activities.

And what about detrusor contractions? In this scenario, with urethral stenosis, the detrusor muscles, which are responsible for bladder contraction, might not be fully contracting. This would lead to the bladder not emptying completely during the activities that press it, hence the residual volume.

So, putting this all together, if cystometry is done, it's likely to show a significant residual volume in the bladder and possibly reduced detrusor contractions due to the urethral obstruction. This would align with the symptoms she's experiencing during coughing or sneezing.

</think>

Based on the symptoms and the findings from the gynecological exam and Q-tip test, cystometry would likely reveal a significant residual volume in the bladder and reduced detrusor contractions. The presence of a urethral stenosis, inferred from these findings, would explain the urinary incontinence during activities like coughing or sneezing, while the absence of leakage at night suggests that the bladder doesn't empty completely due to the obstruction. The significant residual volume and reduced detrusor contractions are consistent with this type of incontinence.<｜end▁of▁sentence｜>
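When comparing answers like this one against the reference Response column, it helps to separate the reasoning from the final answer. The following is a minimal sketch that assumes the decoded text follows the template used in this article, with a "### Response:" marker and `<think>` tags. The helper name `split_reasoning` and the sample string are hypothetical.

```python
def split_reasoning(decoded, eos_token="<｜end▁of▁sentence｜>"):
    """Split decoded model text into (reasoning, answer)."""
    # Keep only the part generated after the prompt's response marker.
    generated = decoded.split("### Response:")[-1]
    generated = generated.replace(eos_token, "")
    if "</think>" not in generated:
        # No closed reasoning block: treat everything as the answer.
        return "", generated.strip()
    reasoning, answer = generated.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    return reasoning, answer.strip()


# Toy string in the same layout as the decoded output (illustrative only).
sample = (
    "### Question:\nQ?\n\n### Response:\n<think>\nstep one\n</think>\n"
    "final answer<｜end▁of▁sentence｜>"
)
reasoning, answer = split_reasoning(sample)
print(repr(reasoning))
print(repr(answer))
```

In the notebook this would be applied to an element of `response`, for example `split_reasoning(response[0])`.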

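As shown, the answer now follows a more consistent format, but it may still contain errors. This suggests continuing with larger-scale fine-tuning.

3.3.4 Merging the model

At this point the locally saved model weights are in the outputs folder:

The model weights can then be merged with the following code:

Merging weights

new_model_local = "DeepSeek-R1-Medical"
# Save the adapter weights and the tokenizer
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)
# Save the merged model in 16-bit precision
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)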

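3.4 Fine-tuning parameters explained

SFTTrainer performs supervised fine-tuning (SFT) and works for fine-tuning models in the transformers and Unsloth ecosystems:

1. Related libraries

●SFTTrainer (from the trl library):

○trl (Transformer Reinforcement Learning) is a Hugging Face library that provides supervised fine-tuning (SFT) and reinforcement learning (RLHF) functionality.

○SFTTrainer is mainly used for supervised fine-tuning and works with low-rank adaptation methods such as LoRA.

●TrainingArguments (from the transformers library):

○This class defines the training hyperparameters, such as batch size, learning rate, optimizer and number of training steps.

●is_bfloat16_supported (from unsloth):

○This function checks whether the current GPU supports bfloat16 (BF16) and returns True if it does, False otherwise.

○bfloat16 is a more efficient numeric format that performs better on newer GPUs such as the NVIDIA A100/H100.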
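Part of the appeal of bfloat16 over float16 is its range: it keeps the 8 exponent bits of float32 and shortens the mantissa to 7 bits, while float16 has 5 exponent bits and 10 mantissa bits. The sketch below illustrates this with the standard library only. It emulates bfloat16 by truncating a float32, which is a simplification (hardware normally rounds), and the helper names are hypothetical.

```python
import struct


def to_bfloat16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def fits_float16(x):
    """Return True if x can be packed as an IEEE half-precision float."""
    try:
        struct.pack(">e", x)
        return True
    except OverflowError:
        return False


print(to_bfloat16(1e30))   # large values survive, with reduced precision
print(fits_float16(1e5))   # beyond the float16 maximum of 65504
print(to_bfloat16(1.001))  # small differences are lost to the short mantissa
```

This is why the trainer is configured with bf16 when the GPU supports it and falls back to fp16 otherwise: bfloat16 trades precision for a range that is much less prone to overflow.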

2. Initializing SFTTrainer

SFTTrainer section

TrainingArguments section

3.5 Full efficient fine-tuning experiment

Next, run efficient fine-tuning on the full dataset to improve the fine-tuning result.

Full code:

from unsloth import FastLanguageModel
from modelscope.msdatasets import MsDataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
import wandb  # required by the wandb.init() call below

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

max_seq_length = 2048
dtype = None
load_in_4bit = False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./DeepSeek-R1-Distill-Qwen-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts,}

# Load the full training split this time
ds = MsDataset.load('AI-ModelScope/medical-o1-reasoning-SFT', split = "train")
dataset = ds.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs = 3,
        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

wandb.init()
trainer_stats = trainer.train()

new_model_local = "DeepSeek-R1-Medical"
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

A full run may take quite a long time: