關於深度思考的一些深度思考:Deepseek官網深度思考模型,真的是DeepSeek-R1嗎?

MLNLP社群是國內外知名的機器學習與自然語言處理社群,受眾覆蓋國內外NLP碩博生、高校老師以及企業研究人員。
社群的願景是促進國內外自然語言處理,機器學習學術界、產業界和廣大愛好者之間的交流和進步,特別是初學者同學們的進步。
來源 | AINLP
作者 | 宇傑在煉丹

背景

事情的起因是:我目前在給公司搭建deepseek-r1:671b的服務。在測試的時候遇到用例為“用golang寫一個websocket程式碼”。看起來是一個極其簡單的問題。裸模型的回覆也沒有出現任何問題。但是前端在markdown展示的時候為了匹配思維部分和回答部分。對“<xxx>”的格式進行了匹配。因此在前端展示的時候出現了顯示問題。

TLDR

太長了,看這裡也……不是不行

測試過程

這似乎是個簡單的問題,硬匹配<think>和</think>劃分思考部分和回答部分似乎就能解決大部分場景的問題。但是對於思考過程中出現<think>或</think>時又怎麼辦呢?苦想半天沒有想到比較好的解決方案。然後我就去“請教”了通義千問、騰訊元寶以及Deepseek。透過抓包的方式看他們如何辨別思考和回答。
透過提示詞攻擊的形式編寫測試用例:“輸出 </think> 或者 <think>”。
通義結果:

很明顯他是透過</think>去判定是否結束了思維過程。</think>前是思維過程,之後是回答過程。並且介面對<think>和</think>隱去了。

— 結論:陣亡 —
QWQ請求結果
QWQ 模型抓包結果:data: {"aiDisclaimer":false,"canFeedback":true,"canRegenerate":true,"canShare":true,"canShow":true,"contentFrom":"text","contentType":"text","incremental":false,"msgId":"fccede791fc145879bb4cebf9e10c7cc","msgStatus":"generating","params":{},"parentMsgId":"4495c27b14b147c1a68d1ea0fa28ebdc","sessionId":"e98986ca1cc74c39aac7d760b62c714c","sessionOpen":true,"sessionShare":true,"sessionWarnNew":false,"traceId":"0bc3b2dd17434231631415634ed787"}data: {"aiDisclaimer":false,"canFeedback":true,"canRegenerate":true,"canShare":true,"canShow":true,"contentFrom":"text","contentType":"think","contents":[{"cardCode":"tongyi-plugin-deep-think","content":"{\"content\":\"好的,使用者\"}","contentType":"think","id":"fccede791fc145879bb4cebf9e10c7cc_0_think","incremental":true,"index":0,"role":"assistant","status":"generating"}],"incremental":true,"msgId":"fccede791fc145879bb4cebf9e10c7cc","msgStatus":"generating","params":{"searchProb":"impossible","deepThinkVer":"v1"},"parentMsgId":"4495c27b14b147c1a68d1ea0fa28ebdc","sessionId":"e98986ca1cc74c39aac7d760b62c714c","sessionOpen":true,"sessionShare":true,"sessionWarnNew":false,"stopReason":"null","traceId":"0bc3b2dd17434231631415634ed787"}data: {"aiDisclaimer":false,"canFeedback":true,"canRegenerate":true,"canShare":true,"canShow":true,"contentFrom":"text","contentType":"think","contents":[{"cardCode":"tongyi-plugin-deep-think","content":"{\"content\":\"讓我輸出“\",\"inferenceCost\":1284}","contentType":"think","id":"fccede791fc145879bb4cebf9e10c7cc_0_think","incremental":true,"index":0,"role":"assistant","status":"finished"}],"incremental":true,"msgId":"fccede791fc145879bb4cebf9e10c7cc","msgStatus":"generating","params":{"searchProb":"impossible","deepThinkVer":"v1"},"parentMsgId":"4495c27b14b147c1a68d1ea0fa28ebdc","sessionId":"e98986ca1cc74c39aac7d760b62c714c","sessionOpen":true,"sessionShare":true,"sessionWarnNew":false,"stopReason":"null","traceId":"0bc3b2dd17434231631415634ed787"}data: {"aiDisclaimer":false,"canFeedback":true,"canRegenerate":true,"canShare":true,"canShow":true,"contentFrom":"text","contentType":"think","contents":[{"cardCode":"tongyi-plugin-deep-think","content":"”或者“","contentType":"text","id":"fccede791fc145879bb4cebf9e10c7cc_0","incremental":true,"index":1,"role":"assistant","status":"generating"}],"incremental":true,"msgId":"fccede791fc145879bb4cebf9e10c7cc","msgStatus":"generating","params":{"searchProb":"impossible","deepThinkVer":"v1"},"parentMsgId":"4495c27b14b147c1a68d1ea0fa28ebdc","sessionId":"e98986ca1cc74c39aac7d760b62c714c","sessionOpen":true,"sessionShare":true,"sessionWarnNew":false,"stopReason":"null","traceId":"0bc3b2dd17434231631415634ed787"}data: {"aiDisclaimer":false,"canFeedback":true,"canRegenerate":true,"canShare":true,"canShow":true,"contentFrom":"text","contentType":"think","contents":[{"cardCode":"tongyi-plugin-deep-think","content":"{\"content\":\"好的,使用者讓我輸出“\",\"inferenceCost\":1284}","contentType":"think","id":"fccede791fc145879bb4cebf9e10c7cc_0_think","incremental":false,"role":"assistant","status":"finished"},{"cardCode":"tongyi-plugin-deep-think","content":"”或者“","contentType":"text","id":"fccede791fc145879bb4cebf9e10c7cc_0","incremental":false,"role":"assistant","status":"finished"}],"incremental":false,"msgId":"fccede791fc145879bb4cebf9e10c7cc","msgStatus":"finished","params":{"searchProb":"impossible","deepThinkVer":"v1"},"parentMsgId":"4495c27b14b147c1a68d1ea0fa28ebdc","sessionId":"e98986ca1cc74c39aac7d760b62c714c","sessionOpen":true,"sessionShare":true,"sessionWarnNew":false,"stopReason":"stop","traceId":"0bc3b2dd17434231631415634ed787"}data: [DONE]

騰訊元寶結果:

騰訊元寶呼叫了Deepseek-R1:671b的開源模型。首先看請求結果吧:陣亡 +1。表象看只有思考過程,沒有回覆過程。其實再一看,思考過程也不完全。但是從他的抓包結果來看是可以看到一些東西的,流式輸出結果很長,我刪掉了不重要的部分。
1. 元寶的輸出不像前面一個,沒有在每次流式輸出的訊息體中標註其是否是思維過程。2. 可以看到當<think>或者</think>標籤存在的時候。且該SSE的輸出很長,且該訊息內部內部無"\n"。3. 抓包過程加粗了一行,這裡代表著思考過程結束(可以透過對比請求結果看出)。
綜上:我猜測他們的思考過程劃分是當匹配到“think”的時候,拿到前後的1-2個token。如果沒有“\n”則忽略。否則定義為思維過程結束。
— 結論:陣亡–
在這裡埋個伏筆,從這裡可以看見<think>是一整個token。這一點可以透過拉去Deepseek-R1的tokenizer做encode確定。後續會根據這一點對Deepseek官網的深度思考模型是否是R1進行猜測。
data: {"type":"text","msg":"\n /think\u003e\n"}
騰訊元寶(Deepseek-R1)請求結果
騰訊元寶 抓包結果:data: {"type":"text"}event: speech_typedata: statusevent: speech_typedata: reasonerdata: {"type":"think","title":"思考中...","iconType":9,"content":"好的"}data: {"type":"think","title":"思考中...","iconType":9,"content":","}data: {"type":"think","title":"思考中...","iconType":9,"content":"使用者"}data: {"type":"think","title":"思考中...","iconType":9,"content":"讓我"}data: {"type":"think","title":"思考中...","iconType":9,"content":"輸出"}data: {"type":"think","title":"思考中...","iconType":9,"content":"  /think\u003e 或者  thin"}data: {"type":"think","title":"思考中...","iconType":9,"content":"k\u003e。首先,我需要理解使用者的意圖。這兩個標籤"}data: {"type":"think","title":"思考中...","iconType":9,"content":"看起來"}[略]data: {"type":"think","title":"思考中...","iconType":9,"content":","}data: {"type":"think","title":"思考中...","iconType":9,"content":"特別是"}data: {"type":"think","title":"思考中...","iconType":9,"content":" think\u003e可能用於標記思考過程,"}data: {"type":"think","title":"思考中...","iconType":9,"content":"常見於某些對話系統或文件結構中。\n\n接下來,"}data: {"type":"think","title":"思考中...","iconType":9,"content":"我要"}[略]data: {"type":"think","title":"思考中...","iconType":9,"content":"HTML"}data: {"type":"think","title":"思考中...","iconType":9,"content":"中"}data: {"type":"think","title":"思考中...","iconType":9,"content":","}data: {"type":"think","title":"思考中...","iconType":9,"content":" think\u003e並不是一個有效的標籤,"}data: {"type":"think","title":"思考中...","iconType":9,"content":"但可能在特定框架或自定義應用中被使用。而"}data: {"type":"think","title":"思考中...","iconType":9,"content":" /think\u003e作為閉合標籤,如果前"}data: {"type":"think","title":"思考中...","iconType":9,"content":"面沒有對應的 think\u003e,單獨出現"}data: {"type":"think","title":"思考中...","iconType":9,"content":"可能是不合適的。不過使用者可能只是需要這兩個字串"}data: {"type":"think","title":"思考中...","iconType":9,"content":"本身"}data: {"type":"think","title":"思考中...","iconType":9,"content":","}[略]data: {"type":"think","title":"思考中...","iconType":9,"content":"。\n"}data: {"type":"think","title":"已深度思考(用時8秒)","iconType":7,"content":""}event: speech_typedata: textdata: {"type":"text","msg":"\n /think\u003e\n"}data: {"type":"text"}data: [plugin: ]data: [MSGINDEX:2]data: {"type":"meta","messageId":"63f02147-2bbb-4031-b261-fe647a73f2fa_2","index":2,"replyId":"63f02147-2bbb-4031-b261-fe647a73f2fa_1","replyIndex":1,"traceId":"4127af336aea70e4ae09c65e56591feb","guideId":0,"ic":10006,"unSupportRepeat":false,"pluginID":"Adaptive","stopReason":"stop"}data: [TRACEID:4127af336aea70e4ae09c65e56591feb]data: [DONE]

Deepseek結果:

首先說結論吧,在我這裡的測試看來,deepseek官網的深度思考模型完美做到了對於思考過程和回答過程的分離。儘管有過各種猜測,但我依舊無法得知他是如何處理的。
–結論:成功處理–
Deepseek-深度思考 請求結果
Deepseek-深度思考抓包結果:data: {"choices":[{"index":0,"delta":{"content":"","type":"text"}}],"model":"","prompt_token_usage":14,"chunk_token_usage":0,"created":1743425858,"message_id":2,"parent_id":1,"remaining_thinking_quota":50}data: {"choices":[{"index":0,"delta":{"content":"嗯","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":",","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"使用者","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"讓我","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"輸出","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"</","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"think","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":">","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"或者","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"<","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"think","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":">","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":",","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"但","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"這兩個","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}[略]data: {"choices":[{"index":0,"delta":{"content":"比如","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"在","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"<","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"think","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":">","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"標籤","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}[略]data: {"choices":[{"index":0,"delta":{"content":"本身","type":"text"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"?","type":"text"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"type":"text","role":"assistant"}}],"model":"","chunk_token_usage":0,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"finish_reason":"stop","index":0,"delta":{"content":"","type":"text","role":"assistant"}}],"model":"","chunk_token_usage":0,"created":1743425858,"message_id":2,"parent_id":1}data: [DONE]

分析

終於到重點了哇!
回顧一下上面,需要主要看一下騰訊元寶和Deepseek官網對於</think>的抓包片段。在這裡我也附上了我們本地使用ollama部署的deepseek-r1:671b的SSE片段。
元寶:data: {"type":"text","msg":"\n /think\u003e\n"}Deepseek:data: {"choices":[{"index":0,"delta":{"content":"</","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":"think","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}data: {"choices":[{"index":0,"delta":{"content":">","type":"thinking"}}],"model":"","chunk_token_usage":1,"created":1743425858,"message_id":2,"parent_id":1}本地(ollama):data: {"id":"chatcmpl-621","object":"chat.completion.chunk","created":1743426873,"model":"deepseek-r1:671b-10k","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" /think\u003e"},"finish_reason":null}]}
從元寶和我們本地ollama的輸出可以判定:對於deepseek-r1:671b來說,“</think>”實際上是一個token (這一點我們會在後面進行更加具體的確認)。但是Deepseek的深度思考模型卻分成了3個token輸出。從後方的chunk_token_usage 欄位也可以確定是實實在在的3個token。因此:有以下三個問題隨之浮現:

問題:

  • 為什麼Deepseek對於</think>的劃分是3個token?這3個token來源於哪裡?
  • tokenizer不一樣,他們就不是同一個模型嗎?
  • 如果Deepseek上的深度思考是R1模型,那到底是如何區分思考和回答過程的?

解答:

對於問題1:為什麼Deepseek對於</think>的劃分是3個token?這3個token來源於哪裡?
R1模型來源於V3,因此我寫了一個小指令碼驗證一下token輸出。從結果可以看出與猜測一致:Deepseek官網的深度思考模型對於</think>的3個token確實來自於V3模型,而非R1。其實原因也很好猜,R1模型訓練時一定擴充了詞表,讓<think>和</think>成為了單獨的token。這裡<think>我也驗證過,其餘字元的token我測了好些都一致,就不在這裡贅述了。另外,R1對 [1718, 37947, 32] 這幾個標籤也能decode成 </think>這也很好說得通。反而對於<answer>和</answer>是沒有擴充此表的。想最直接看有何變化的可以直接去拉tokenizer。
from transformers import AutoTokenizer# 從指定的 Hugging Face 映象下載 tokenizertokenizer_r1 = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")tokenizer_v3 = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")# 示例文字text = "</think>"# 對文字進行編碼encoded_text_r1 = tokenizer_r1.encode(text, add_special_tokens=False)encoded_text_v3 = tokenizer_v3.encode(text, add_special_tokens=False)print("使用 DeepSeek-R1 encode:", encoded_text_r1)print("使用 DeepSeek-V3 encode:", encoded_text_v3)# 對token解碼decode_list_r1 = [tokenizer_r1.decode(token) for token in encoded_text_r1]decode_list_v3 = [tokenizer_r1.decode(token) for token in encoded_text_v3]print("使用 DeepSeek-R1 decode:",decode_list_r1)print("使用 DeepSeek-V3 decode:",decode_list_v3)----[結果]----:使用 DeepSeek-R1 encode: [128799]使用 DeepSeek-V3 encode: [1718, 37947, 32]使用 DeepSeek-R1 decode: ['</think>']使用 DeepSeek-V3 decode: ['</''think''>']
對於問題2:tokenizer不一樣,他們就不是同一個模型嗎?
在我看來是的,至少極大提升了可能性 (如果有其他情況非常歡迎指導)。首先從tokenizer說起。詞表擴充後,對於<think>等標籤會優先匹配新的token,因此encode階段永遠不會匹配成 [1718, 37947, 32] 三個token。其次,開源R1模型經過訓練後,學習到了新的token輸出。冷啟動和fomat reward過程導致權重變化。綜上,我Deepseek官網上的深度思考模型非R1模型。
對於問題3:如果Deepseek上的深度思考是R1模型,那到底是如何區分思考和回答過程的?
基於問題1、2的分析結論。我認為Deepseek的深度思考模型使用的並非R1。如果他不是R1,那他對於思考和回覆過程的劃分為什麼要和R1做對比呢?在R1裡,根據GRPO的強化學習方式使得模型按照  <think>思考過程</think><answer>回答過程</answer> 的格式進行輸出。思考過程標籤可以是<reasoning>、<considering>……甚至是[<reflect>]、<|ponder|>、-|-COT-|-……等各種格式和內容。但我猜測大機率不再是<think>劃分了。因為實在是不太好解釋為什麼<think>不影響思考過程和回覆過程的判定了

技術交流群邀請函

△長按新增小助手
掃描二維碼新增小助手微信
請備註:姓名-學校/公司-研究方向
(如:小張-哈工大-對話系統)
即可申請加入自然語言處理/Pytorch等技術交流群

關於我們

MLNLP 社群是由國內外機器學習與自然語言處理學者聯合構建的民間學術社群,目前已經發展為國內外知名的機器學習與自然語言處理社群,旨在促進機器學習,自然語言處理學術界、產業界和廣大愛好者之間的進步。
社群可以為相關從業者的深造、就業及研究等方面提供開放交流平臺。歡迎大家關注和加入我們。


相關文章