
Reposted from | Zhihu
Author | Glan格藍
A few caveats up front: the "Chinese" field gives the most common Chinese name for each term, or a literal translation if none exists; the "Related" field lists terms of the same family, not necessarily exhaustively; the "Origin" field is whatever an internet search turned up, and may not be accurate; the "Hot take" field is one person's opinion, read it for fun.
MoE
First proposed in the 1991 paper [1], MoE caught fire after GPT-4's release in March 2023, when rumors through the grapevine claimed GPT-4 used an MoE architecture. In December 2023, Mistral AI released the first open-source MoE model, Mixtral-8x7B [2], and in January 2024 DeepSeek followed with the first open-source MoE model out of China, DeepSeekMoE [3].
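To make the idea concrete, here is a minimal sketch of a top-k routed MoE feed-forward layer in PyTorch. This is not the actual Mixtral or DeepSeekMoE code; all names, sizes, and the routing scheme shown are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k gated mixture-of-experts feed-forward layer (sketch)."""

    def __init__(self, d_model, d_hidden, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.gate(x)                           # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

The point of the architecture: only k of the n experts run per token, so parameter count scales up while per-token compute stays roughly constant.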
Agentic
Sora
GraphRAG
GPT-4o
o1
Next come the buzzwords around o1; after all, everyone spent the second half of the year studying o1.
ORM;PRM
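The difference in one hedged sketch: an Outcome Reward Model scores only the final solution, while a Process Reward Model (as in [11]) scores every intermediate step. Here `score_fn` is a stand-in for a trained reward model, and taking the minimum step score is just one common aggregation choice:

```python
def orm_score(score_fn, question, solution):
    # Outcome Reward Model: one scalar for the complete, final solution.
    return score_fn(question, solution)

def prm_score(score_fn, question, steps):
    # Process Reward Model: a scalar for every intermediate reasoning step;
    # aggregate step scores, e.g. by taking the minimum (one common choice).
    step_scores = [score_fn(question, "\n".join(steps[: i + 1]))
                   for i in range(len(steps))]
    return min(step_scores)
```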
train-time compute;test-time compute
Inference Scaling Laws/Test-Time Scaling
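A simple way to see test-time scaling in action is best-of-N sampling: spend more inference compute, usually get better answers. A minimal sketch, where `generate` and `score` are stand-ins for a sampler and a verifier/reward model:

```python
def best_of_n(generate, score, prompt, n=16):
    # More samples = more test-time compute = (typically) better answers:
    # draw n candidates and keep the one the verifier scores highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```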

MCTS
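At the heart of MCTS is the UCT selection rule [14], which trades off exploiting high-value children against exploring rarely visited ones. A minimal sketch, assuming a tree node that exposes `children`, `visits`, and `value` (all names illustrative):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    value: float = 0.0          # accumulated reward from rollouts
    children: list = field(default_factory=list)

def uct_select(node: Node, c: float = 1.41) -> Node:
    # UCT: pick the child maximizing mean value (exploitation)
    # plus a visit-count bonus (exploration).
    return max(
        node.children,
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )
```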
Speculated o1 reasoning paradigms: SA, MR, DC, SR, CI, EC
Systematic Analysis (SA); Method Reuse (MR); Divide and Conquer (DC); Self-Refinement (SR); Context Identification (CI); Emphasizing Constraints (EC)
Next come a few "self-" terms.
Self-Play
Self-Rewarding
Self-Correct/Correction
Self-Refine
Self-Reflection
Self-Consistency
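Of these, self-consistency [23] is the easiest to pin down in code: sample several diverse reasoning chains and majority-vote on the final answer. A hedged sketch, with `sample_chain` and `extract_answer` as stand-ins for the model call and the answer parser:

```python
from collections import Counter

def self_consistent_answer(sample_chain, extract_answer, prompt, k=20):
    # Sample k diverse reasoning chains (temperature > 0), reduce each
    # to its final answer, and return the majority answer.
    answers = [extract_answer(sample_chain(prompt)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```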
RFT
Today, we're excited to introduce a new way of model customization for our O1 series: reinforcement fine-tuning, or RFT for short.
ReFT
Below are a few "O"s.
PPO
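The core of PPO is the clipped surrogate objective from the paper [27]. A minimal sketch, with per-action log-probs and advantages assumed precomputed:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Probability ratio r = pi_new / pi_old, computed in log space.
    ratio = torch.exp(logp_new - logp_old)
    # Clipped surrogate: take the pessimistic (min) of the unclipped
    # and clipped objectives, so updates stay close to the old policy.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negated for minimization
```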
DPO
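DPO [28] turns preference pairs directly into a loss over policy-vs-reference log-ratios, with no reward model or sampling loop. A minimal sketch, with per-sequence log-probs assumed precomputed:

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit rewards are policy-vs-reference log-ratios; push the
    # chosen completion's ratio above the rejected one's.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()
```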
GRPO
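GRPO (from DeepSeekMath [29]) drops PPO's critic network and instead normalizes rewards within a group of completions sampled for the same prompt. The advantage computation in one hedged sketch:

```python
import torch

def grpo_advantages(rewards):
    # rewards: shape (G,), one reward per completion sampled for the
    # same prompt. No critic: the group itself supplies the baseline.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)
```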
And a few more common "O"s that already have implementations available.
ORPO
KTO
SimPO
RLOO
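As a taste of how these differ, RLOO [33] also skips the critic, but baselines each sample's reward with the mean of the other k-1 samples (leave-one-out). A minimal sketch:

```python
import torch

def rloo_advantages(rewards):
    # rewards: shape (k,). Each sample's baseline is the mean reward
    # of the other k-1 samples (leave-one-out), again with no critic.
    k = rewards.numel()
    baseline = (rewards.sum() - rewards) / (k - 1)
    return rewards - baseline
```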
References
[1] https://www.cs.toronto.edu/~hinton/absps/jjnh91.pdf
[2] https://arxiv.org/pdf/2401.04088
[3] https://arxiv.org/pdf/2401.06066
[4] https://arxiv.org/pdf/2412.19437
[5] https://huggingface.co/deepseek-ai/DeepSeek-V3
[6] https://lilianweng.github.io/posts/2023-06-23-agent/
[7] https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf
[8] https://www.anthropic.com/research/building-effective-agents
[9] https://arxiv.org/pdf/2404.16130
[10] https://arxiv.org/abs/2005.11401
[11] https://arxiv.org/pdf/2305.20050
[12] https://openai.com/index/learning-to-reason-with-llms/
[13] https://arxiv.org/pdf/2408.00724
[14] http://ggp.stanford.edu/readings/uct.pdf
[15] https://arxiv.org/pdf/2410.13639
[16] https://arxiv.org/pdf/2408.01072
[17] https://arxiv.org/pdf/2401.10020
[18] https://arxiv.org/pdf/2409.12917
[19] https://arxiv.org/pdf/2303.17651
[20] https://arxiv.org/pdf/2405.06682
[21] https://arxiv.org/pdf/2310.11511
[22] https://arxiv.org/pdf/2310.06271
[23] https://arxiv.org/pdf/2203.11171
[24] https://www.youtube.com/watch?v=yCIYS9fx56U
[25] https://openai.com/form/rft-research-program/
[26] https://arxiv.org/pdf/2401.08967
[27] https://arxiv.org/pdf/1707.06347
[28] https://arxiv.org/pdf/2305.18290
[29] https://arxiv.org/pdf/2402.03300
[30] https://arxiv.org/pdf/2403.07691
[31] https://arxiv.org/pdf/2402.01306
[32] https://arxiv.org/pdf/2405.14734
[33] https://arxiv.org/pdf/2402.14740

